tags:

views:

634

answers:

8

[To any mod considering deleting this question - please don't. Either my or Steve Jessop's code may well be helpful to reference from questions regarding reading text files using C.]

This question has generated the usual amount of sound and fury re the relative merits of C and C++. But something that never seems to get introduced into these discussions is actual code. I thought I would change that, so I set myself a small test:

Create a command line utility that reads zero or more files into memory and then outputs them in a random order. Each line must be output only once. The program to be written in C++98 or C89, using only their standard library functions.

The result is the two programs I present as answers below. I'd be grateful if you would vote on the one you would rather be tasked with maintaining.

Some notes:

  • The code is written in my own idiomatic style. The only difference being that there are no comments and less vertical whitespace than I would normally use.

  • The random number generator is naive, but is the same in both cases. so it dodn't seem worth improving at this point.

  • The C program reads only lines up to a fixed length - I couldn't be bothered implementting a true variable length file reading function. It would probably have added about 10 more lines to the code.

  • There is no check for memory exhaustion in either version.


Ah well, I thought it would get closed fairly quickly. Still, I enjoyed writing the code and seeng other people's code too - it seems to me that SO is lacking in that regard. Thanks to all who took part.

+3  A: 

This is the C version - vote if you would like to maintain this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define START_LINES     100
#define MAX_LINE_LEN    1000

typedef struct {
    unsigned int mUsed, mSize;
    char ** mLines;
}  LineList;

LineList * MakeLineList( unsigned int nlines ) {
    LineList * ll = malloc( sizeof( LineList ) );
    ll->mUsed = 0;
    ll->mSize = nlines;
    ll->mLines = malloc( sizeof( char* ) * nlines );
    return ll;
}

void FreeLineList( LineList * ll ) {
    unsigned int i;
    for ( i = 0; i < ll->mUsed; i++ ) {
        free( ll->mLines[i] );
    }
    free( ll->mLines );
    free( ll );
}

void GrowLineList( LineList ** ll ) {
    LineList * newll = MakeLineList( (*ll)->mSize * 2 );
    unsigned int i;
    for ( i = 0; i < (*ll)->mUsed; i++ ) {
        newll->mLines[ newll->mUsed++ ] = (*ll)->mLines[ i ];
        (*ll)->mLines[ i ] = NULL;
    }
    FreeLineList( * ll );
    * ll = newll;
}

void AddLine( LineList ** ll, char * line  ) {
    if ( (*ll)->mUsed >= (*ll)->mSize ) {
        GrowLineList( ll );
    }
    (*ll)->mLines[ (*ll)->mUsed++ ] = line;
}

char * ReadLine( FILE * f ) {
    char tmp[ MAX_LINE_LEN ];
    if ( fgets( tmp, MAX_LINE_LEN, f ) == NULL ) {
        return NULL;
    }
    else {
        char * line = malloc( strlen( tmp ) + 1 );
        strcpy( line, tmp );
        return line;
    }
}

void AddFileContents( LineList ** ll, char * filename ) {
    FILE * f = fopen( filename, "r" );
    char * line;
    if ( f == NULL ) {
        fprintf( stderr, "file open failed: %s\n", filename );
        exit( 1 );
    }
    while( (line = ReadLine( f )) != NULL ) {
        AddLine( ll, line );
    }
    fclose( f );
}


int GetRand( int n ) {
    return rand() % n;
}

void Swap( char ** a, char ** b ) {
    char * tmp = * a;
    * a = * b;
    * b = tmp;
}

void Deal( LineList * ll ) {
    unsigned int i;
    for (i = 0; i < ll->mUsed; i++ ) {
        unsigned int r = GetRand( ll->mUsed - i );
        fprintf( stdout, "%s", ll->mLines[r] );
        Swap( &(ll->mLines[r]), &(ll->mLines[ (ll->mUsed - i) - 1 ]) );
    }
}

int main( int argc, char *argv[] ) {
    int i;
    LineList * ll = MakeLineList( START_LINES );
    for ( i = 1; i < argc; i++ ) {
        AddFileContents( &ll, argv[i] );
    }
    srand( time(0) );
    Deal( ll );
    FreeLineList( ll );
    return 0;
}
anon
Is this really a fair comparison? Because this looks like C written by someone who'd rather be writing C++. The unusual naming conventions are a dead giveaway. I mean, why arbitrarily choose C and C++ for this comparison. I'd much rather maintain code that does this sort of thing in a language like Ruby or Perl. And if this were C code I encountered on the job, I'd wonder why we weren't using some sort of library like glib to provide typical data structures like linked lists.
mcl
I agree with mcl. However, I am voting the C++ for obvious reasons, but there's beauty in the C as well. What you are missing in the C is a division of the LineList related stuff into an independent file, which would provide what, in the end, resembles a class. While the C++ code _hides_ all this stuff and makes it easy for you behind a framework for OO programming, it also produces much more complex code to debug and less clear error messages.With C you always know what's going on. With C++, not so much.
Stefano Borini
Firstly, it's a very normal naming style (vide the Windows API, for example) and one I've used since I started C programming in 1984. Secondly, did you read the question? I am particularly interested in comparing C and C++. And thirdly, anyone that blindly uses linked lists wants their head examining - I've fulminated more on this in my blog at http://punchlet.wordpress.com/
anon
+1 this is my preference, I like to see the details of the program as i work with it
Fire Crow
@Fire Crow Don't you miss seeing the inner workings of malloc() and free() then?
anon
mcl
@mcl: the naming convention Neil uses in his C++ code is also Windows-style, and does not match the C++ standard library's approach to naming functions and types. I don't think the names should affect the answer: they're either both right or both wrong, as you prefer.
Steve Jessop
If you insist on writing everything from scratch, then I'd prefer x86 assembly. The reason to use C (and not a better language) is that there are a billion libraries available. Can you post a version using Glib for us to vote on?
Ken
I don't see why you say "someone who'd rather be writing C++". Just because he uses well-named types and functions to create and operate on them? This is basic encapsulation and abstraction. This is good C code.
Mike Weller
+19  A: 

This is the C++ version - vote if you would like to maintain this:

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;

typedef vector<string> LineList;

void AddFileContents( LineList & lines, const string & filename ) {
    ifstream ifs( filename.c_str() );
    if ( ! ifs.is_open() ) {
        cerr << "Cannot open file: " << filename << endl;
        exit(1);
    }
    string line;
    while( getline( ifs, line ) ) {
        lines.push_back( line );
    }
}

int GetRand( int n ) {
    return rand() % n;
}

void Deal( LineList & lines ) {
    for ( unsigned int i = 0; i < lines.size(); i++ ) {
        unsigned int r = GetRand( lines.size() - i );
        cout << lines[r] << "\n";
        swap( lines[r], lines[ (lines.size() - i) - 1 ]);
    }
}

int main( int argc, char *argv[] ) {
    LineList lines;
    for ( int i = 1; i < argc; i++ ) {
        AddFileContents( lines, argv[i] );
    }
    srand( time(0) );
    Deal( lines );
}
anon
+28  A: 

Here's what I'd prefer (one might add error checking for file opening, but I don't think this is strictly necessary, if you don't care):

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
#include <ctime>
#include <cstdlib>

int main(int argc, char* argv[])
{
    std::srand(std::time(0));
    std::vector<std::string> lines;
    for (int i = 1; i != argc; ++i) {
        std::ifstream fin(argv[i]);
        std::string line;
        while (std::getline(fin, line)) {
            lines.push_back(line);
        }
    }
    std::random_shuffle(lines.begin(), lines.end());
    std::copy(lines.begin(), lines.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
}
UncleBens
Yes, I was going to use random_shuffle, but I thought it made me look too much like a smart arse :-) The point of the exercise is not to demonstrate standard library cleverness, but produce some code that the average programmer migght produce.
anon
@Neil: I don't think using library functions would make you a smart arse. From a maintainence point of view I'd much prefer this - the call to `random_shuffle` is clearly correct (assuming a decent library implementation), while your manual shuffle needs to be read and tested quite carefully.
Mike Seymour
Excellent solution, but little bit unfair for general comparison between languages :o) (what if assignement changes so there's no std algorithm?).
MaR
+2  A: 

One thing I struggle a lot with is trying to figure out how much of the code we consider difficult is actually difficult, or is simply different than what we are used to. At this writing, there are 19,954 questions tagged c++, and 8,432 questions tagged c, indicating that C++ has gotten a lot more activity than C, so I expect a lot more familiarity with it, and a lot of the positive responses (I fully expect the C++ version will "win") will be due more to that familiarity than to better actual maintainability.

My preference is for the C version, which I'd justify by saying I prefer the implementations of the libraries to be more obvious, so that if bugs do show up I'll be able to very quickly drill down to where the bug is. But, really, it's probably because I work with a lot more C than C++.

Aidan Cully
A lot of the questions tagged C++ are actually about C. The number of people using C++ to it's full potential is probably quite small. Regarding bugs, I'd rather not have them in the first place, which is where C++ shines - I actually spent an hour debugging the C version - the C++ version worked first time.
anon
I think this will largely be because C++ has a much bigger standard library than C, and the STL generally works. I don't know of a good C equivalent to the STL, but I'd be curious how well your effort would have gone had you not used the STL, but tried to implement the functionality necessary for this problem yourself? Also, if you work a lot more with C++ than C, that would also make you more productive in that language.
Aidan Cully
You can't separate the library from the language - if you did, C programmers would all have to write their own versions of malloc() and free(). As for familiarity, I don't write much C code these days, but I've had to maintain a lot. Over the years, I've probably been paid to write/maintain about as much C as I have C++.
anon
You absolutely can separate library and language, it just isn't done very often. There are a large number of malloc() / free() implementations with different semantics than the standard's out there, which don't get much use because the existing popularity of the standard is self-perpetuating due to network effects. The Jane Street guys kind-of get away with producing their own version of OCaml's core library because OCaml is such a niche language, that the standard isn't popular enough to affect them.
Aidan Cully
I'll also argue that, if you want to compare languages, you have to compare languages, not libraries. Haskell is totally sweet not, primarily, because it has a totally sweet standard library, but because the language lends itself so well to the production of a totally sweet standard library.
Aidan Cully
Nothing to do with this thread, but my pet peeve with Haskell (learning it is one of my long-term projects) is the f*cking syntax!
anon
+3  A: 

These language comparisons based on specific code for a given problem are totally useless. For any specific problem there are a gazillion possible solutions in one language alone -- beginning with the choice of algorithm and not ending with code and library layout, variable names, etc. There are all kind of solutions, from abysmal ugly and inefficient in space and time to godly excellent and lightning fast.

What will the result of this survey tell you? Are you sure if you're comparing a good C solution with a bad C++ solution, the other way around, or something average? Have you even compiled and tested both solutions, are they working as expected? Do they at least implement the exact same algorithm?

Secure
To answer your questions a) what the general feel of the SO C/C++ community is, b) Pretty sure - I'd rate myself as a fairly expert programmer in both languages, c) Yes, of course d) More or less, yes.
anon
I would think a prerequisite for "implementing the same algorithm" would be to return the same output given the same input, but unless I'm missing something, they don't even do the same thing: I don't see MAX_LINE_LEN anywhere in the C++ version.
Ken
@Ken Read the notes in the question.
anon
Neil: I did ... was there something in there about the programs doing different things that I missed?
Ken
@Ken: "The C program reads only lines up to a fixed length - I couldn't be bothered implementting a true variable length file reading function. It would probably have added about 10 more lines to the code."
Roger Pate
+4  A: 

I would by far prefer to maintain the C++ version. But then again, I am a C++ programmer. And so this question is missing an important point.

The C version is verbose and very explicit about everything that happens. That means there's more code to maintain, and more work for those of us who can read both versions.

But what if you know neither language to begin with? It doesn't take many minutes of tutoring before you can understand the C code because it is so verbose and simple and explicit.

The C++ version can potentially become a lot shorter and more concise, making it easier to maintain for a C++ programmer, but also filling it with gotchas and things that aren't immediately obvious if you're not a C++ programmer.

So for some purposes, the C version is more maintainable. That's why I argued against you in the other question. There's no doubt that you (Neil) and I, and all the usual answerers of C++ questions here on SO would by far prefer to maintain the C++ versions. But people who are new to C++ wouldn't necessarily agree. Short and concise code is nice if you already understand the language. But it also presents fewer hooks for understanding what's actually happening under the hood. The C version is much simpler, in that everything that happens is clearly spelled out when it happens. After you allocate memory with malloc, you initialize it, one field at a time, right there, for all to see. In C++, you just new something, and hey, somehow the right constructor (because of course, there's often more than one in a class) magically takes care of it.

jalf
It seems our minds will never meet on this. From your point of view, the BASIC program 10 INPUT A$ 20 Print "hello " + A$ is more complex than the equivalent C version. But surely it is much, much simpler to learn to write (and maintain) simple BASIC programs than it is to learn to write the equivalent code in C.
anon
+1. Another point is that the size and complexity of the C++ standard library makes it a more difficult language to learn than C. Once you do learn it, though, you're starting from a much higher base than does a C programmer.
Aidan Cully
@Neil: No, because in the above BASIC program, you never really need to understand what goes on under the hood. It's not part of the language. The language simply has some nice convenient input/print primitives. In C++, we can't rely on language-provided magic primitives. All the complexities **exist**, just like in the C program. They're just packed away a bit inside classes, constructors, assignment operators, destructors and all the other fluff. But to maintain the program, you have to understand these things.
jalf
In C you also have to understand these things, but they're there for everyone to see, because they're spelled out explicitly all over the place. The BASIC program can be maintained without understanding how INPUT is implemented in the first place
jalf
but keep in mind that I'm not universally saying that "C is more maintainable than C++". Far from it. I'm just pointing out that there are cases where the straightforwardness of C is an advantage. Take the Linux kernel as an example. Because it is all very plain C code, some maintainers have pretty much picked up the language by looking at the Linux source code. That'd be hard to do if it had all been written as modern C++. The code would be much leaner and simpler *for you or me*, but people wouldn't be able to learn it just by reading the source code. They'd need to learn the language first
jalf
@jalf I disagree VERY STRONGLY. I personally have NO knowledge of what (for example) the string, stream, vector etc. classes do "under the hood" - I've never looked, I really, really don't care. They are just as magic to me as INPUT is to a BASIC programmer. All I go on is their specification.
anon
That's exactly my point. There is a specification that you have to be familiar with. You don't need to look at the source code for `vector`, but to use it safely you **do** have to know the semantics for it. You have to know what requirements it places on objects stored in it for example (such as the CopyConstructible requirement). You need to know when and how it copies elements, when and how it calls the destructor (take everyone's favorite beginners question, "does a `vector<int*>` call `delete` on the pointers?" for example.
jalf
@jalf How is that different from free()'ing an array in C? In both cases you have to know what you are doing, but in C++ you need to know (and write) less.
anon
I'd say the difference is that in C you have to understand quite a simple interface (malloc/free). In C++ you have to understand quite a complex, and much more powerful interface (`std::vector<T>`, or for real power-users `std::vector<T,Alloc>`). Obviously I understand both fairly well, but I'd be less surprised to encounter a property of vector that I was unaware of, than a property of malloc I was unaware of, because the specification for vector is something like 3 pages long (plus the defintions of Sequence and so on). Learning vector and learning malloc are tasks of different magnitude.
Steve Jessop
@Neil: The difference is that in C, you know nothing gets freed unless you do it yourself. All the `free` calls are right there for you to see. There's virtually nothing left implicit. In C++, almost everything is implicit. We *know* that when the vector goes out of scope, such and such happens. When it is copied, such and such happen. When `push_back` is called, it behaves like this. The copy constructors for contained elements are called in these situations. You need a lot of prior knowledge to work with C++ code. In C, for better or worse, everything is spelled out in detail in the code.
jalf
A: 

To answer in a more general way: I'd prefer maintaining clean C++ code over clean C code, but if the C++ code was written by some "design patterns lover", I'll take the C code any day. Well, unless it abuses macros.

Nemanja Trifunovic
What have you got against design patterns? I agree that you shouldn't use them if you don't understand how they work and how to use them appropriately, but that doesn't mean that they're intrinsically bad.
Robert Harvey
Have you ever seen the C code you get from overapplying design patterns in C? heck, have you seen the C code you get with reasonable use of design patterns? Really, your hesitation about overzealous use of design patterns is in no way a reflection on the C++ language.
MSalters
@Robert Harvey. They are maybe not intrinsically bad in i.e. SmallTalk, but in C++ I have never seen a good use of design patterns. C++ code samples in the GoF book are definitelly not good C++.
Nemanja Trifunovic
@ MSalters: No I have never seen any C code using design patterns (as defined by GoF), which is exactly why I said I would rather maintain C than "design pattern based" C++.Having said that, I do prefer C++ to C. In fact, even when I write procedural code I use C++ "as a better C".
Nemanja Trifunovic
+2  A: 

Alternative C version: I think Neil's LineList is unnecessarily complex. There's still no checking for out of memory, which is something I'd never leave in practice, but I handle arbitrary length lines. I do prefer UncleBens' C++ to this, and the difference is almost entirely down to C++'s superior standard library. In general, C++ has language-based advantages handling resources, but in this case freeing what was allocated isn't really a big deal.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define START_LINES     100

typedef struct {
    unsigned int mUsed, mSize;
    char ** mLines;
} StringList;

void initList(StringList *list, unsigned int nlines ) {
    list->mLines = malloc( sizeof( char* ) * nlines );
    list->mUsed = 0;
    list->mSize = nlines;
}

void freeList(StringList *list) {
    free(list->mLines);
}

void freeContents(StringList *list) {
    unsigned int i;
    for (i = 0; i < list->mUsed; ++i) {
        free(list->mLines[i]);
    }
}

void growList(StringList *list) {
    list->mSize *= 2;
    /* not checking for failure sure makes realloc easy to use... */
    list->mList = realloc(list->mlist, sizeof(char *) * list->mSize);
}

void addString(StringList *list, char *line) {
    if ( list->mUsed >= list->mSize ) {
        growList(list);
    }
    list->mLines[list->mUsed++] = line;
}

char *readFile(FILE *f, StringList *lines) {
    long size;
    char *result, *tmp;
    int redd;

    fseek(f, 0, SEEK_END);
    size = ftell(f);
    fseek(f, 0, SEEK_SET);

    result = malloc(size+1);
    tmp = result;
    while (size > 0) {
        redd = fread(tmp, 1, size, f);
        /* error-checking required */
        tmp += redd;
        size -= redd;
    }
    {
        long i;
        size = tmp - result;
        tmp = result;
        for (i = 0; i < size; ++i) {
            if (result[i] == `\n`) {
                result[i] = 0;
                addLine(lines, tmp);
                tmp = result + i + 1;
            } else if (result[i] == `\r`) {
                result[i] = 0;
            }
        }
        /* nasty, without testing there's a high chance this is wrong */
        if (tmp != result + size) {
            result[size] = 0;
            addLine(lines, tmp);
        }
     }
     return result;
}

void AddFileContents( StringList *linelist, StringList *filelist, char * filename ) {
    FILE * f = fopen( filename, "r" );
    char * line, *file;
    if ( f == NULL ) {
        fprintf( stderr, "file open failed: %s\n", filename );
        exit( 1 );
    }
    addLine(filelist, readFile(f, linelist))
    fclose( f );
}

int GetRand( int n ) {
    return rand() % n;
}

void Swap( char ** a, char ** b ) {
    char * tmp = * a;
    * a = * b;
    * b = tmp;
}

void Deal( StringList * ll ) {
    unsigned int i;
    for (i = 0; i < ll->mUsed; i++ ) {
        unsigned int r = GetRand( ll->mUsed - i );
        fprintf( stdout, "%s\n", ll->mLines[r] );
        Swap( &(ll->mLines[r]), &(ll->mLines[ (ll->mUsed - i) - 1 ]) );
    }
}

int main( int argc, char *argv[] ) {
    int i;
    StringList lines;
    StringList files;

    initList(&lines, START_LINES);
    initList(&files, argc);

    for ( i = 1; i < argc; i++ ) {
        AddFileContents( &lines, &files, argv[i] );
    }
    srand( time(0) );
    Deal( &lines );

    freeContents(&files);
    freeList(&files);
    freeList( &lines );
    return 0;
}
Steve Jessop