views:

1337

answers:

9

( if you get easily bored reading long posts, you can focus on the bold parts )

Hello all! This is not exactly a technical question, since I know C kind of enough to do the things I need to (I mean, in terms of not 'letting the language get in your way'), so this question is basically a 'what direction to take' question.

Situation is: I am currently taking an advanced algorithms course, and for the sake of 'growing up as programmers', I am required to use pure C to implement the practical assignments (it works well: pretty much any small mistake you make actually forces you to understand completely what you're doing in order to fix it). In the course of implementing, I obviously run into the problem of having to implement the 'basic' data structures from the ground up: actually not only linked lists, but also stacks, trees, et cetera.

I am focusing on lists in this topic because it's typically a structure I end up using a lot in the program, either as a 'main' structure or as a 'helper' structure for other bigger ones (for example, a hash tree that resolves conflicts by using a linked list).

This requires that the list stores elements of lots of different types. I am assuming here as a premise that I don't want to re-code the list for every type. So, I can come up with these alternatives:

  • Making a list of void pointers (kinda inelegant; harder to debug)
  • Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)
  • Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB, 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)
  • Your idea/solution

To make the question clear: which one of the above is best?

PS: Since I am basically in an academic context, I am also very interested in the view of people working with pure C out there in the industry. I understand that most pure C programmers are in the embedded devices area, where I don't think this kind of problem I am facing is common. However, if anyone out there knows how it's done 'in the real world', I would be very interested in your opinion.

Thanks in advance and sorry for the long long boring post =)

+5  A: 

My $.002:

  • Making a list of void pointers (kinda diselegant; harder to debug)

This isn't such a bad choice, IMHO, if you must write in C. You might add API methods to allow the application to supply a print() method for ease of debugging. Similar methods could be invoked when (e.g.) items get added to or removed from the list. (For linked lists, this is usually not necessary, but for more complex data structures -- hash tables, for example) -- it can sometimes be a lifesaver.)

  • Making only one list, but having a union as 'element type', containing all element types I will use in the program (easier to debug; wastes space if elements are not all the same size)

I would avoid this like the plague. (Well, you did ask.) Having a manually-configured, compile-time dependency from the data structure to its contained types is the worst of all worlds. Again, IMHO.

  • Using a preprocessor macro to regenerate the code for every type, in the style of SGLIB (sglib.sourceforge.net), 'imitating' C++'s STL (creative solution; doesn't waste space; elements have the explicit type they actually are when they are returned; any change in list code can be really dramatic)

Intriguing idea, but since I don't know SGLIB, I can't say much more than that.

  • Your idea/solution

I'd go with the first choice.

Dan Breslau
Yeah, I am also a little resistant to use the 'union' approach, but actually the compile-time dependency isn't a big big problem, since once the assignment is done the code won't have to be manteined.
Rafael Almeida
Also, the preprocessor 'trick' simply uses a macro with parameters that is the whole code of the list definition and operations. At compile time, the preprocessor replicates the code, changing the types where appropriate.
Rafael Almeida
PS: SGLIB's author, who seems to have some academic authority in generic programming, says that C++ templates are implemented with preprocessing, and that's where he got the idea from. Thanks for your contribution!!
Rafael Almeida
The 'void pointer' approach is what I used for a doubly-linked list system I wrote years ago (mostly 1988-9, updated 2002, 2008). There were complementary sets of cover functions for the 3 types that were relevant to me. So, the application used the type-specific cover functions, ...continued...
Jonathan Leffler
...continued...but underneath they all used the same handling code. This was partly necessary because the cover functions also provided an inter-language glue layer. But the approach meant I only had to debug the 'char pointer' (void pointers were not always available back in 1988) version once.
Jonathan Leffler
+3  A: 

I've done this in the past, in our code (which has since been converted to C++), and at the time, decided on the void* approach. I just did this for flexibility - we were almost always storing a pointer in the list anyways, and the simplicity of the solution, and usability of it outweighed (for me) the downsides to the other approaches.

That being said, there was one time where it caused some nasty bug that was difficult to debug, so it's definitely not a perfect solution. I think it's still the one I'd take, though, if I was doing this again now.

Reed Copsey
Thank you for your comment! The mere fact that someone uses this approach in production code is a very strong 'pro-void' argument.
Rafael Almeida
I'm pretty happy with the implementation I worked up, when I had to do this. It worked well for quite a few years, too. My main deciding factor was that I could do this with void* in <100 lines of code - and it was very clear to follow. Everything else seemed too complicated/unmaintainable.
Reed Copsey
Somebody must really hate void* :P Gotta love the neg. votes!
Reed Copsey
@Reed: yes, I frankly don't understand the point of voting down an answer that does exactly what I asked it to be in my second last paragraph - an example of how it's done in industry.
Rafael Almeida
Upvote to cancel an indefensible down. The polymorphic approach is a bookkeeping hassle, but it work and is very flexible.
dmckee
+17  A: 

A void * is a bit of a pain in a linked list since you have to manage it's allocation separately to the list itself. One approach I've used in the past is to have a 'variable sized' structure like:

typedef struct _tNode {
    struct _tNode *prev;
    struct _tNode *next;
    int payloadType;
    char payload[1];
} tNode;

Now I realize that doesn't look variable sized but let's allocate a structure thus:

typedef struct {
    char Name[30];
    char Addr[50];
} tPerson;
tNode *node = malloc (sizeof (tNode) - 1 + sizeof (tPerson));

Now you have a node that, for all intents and purposes, looks like this:

typedef struct _tNode {
    struct _tNode *prev;
    struct _tNode *next;
    int payloadType;
    char Name[30];
    char Addr[50];
} tNode;

or, in graphical form (where [n] means n bytes):

+----------------+
|    prev[4]     |
+----------------+
|    next[4]     |
+----------------+
| payloadType[4] |                
+----------------+                +----------+
|   payload[1]   | <- overlap ->  | Name[30] |
+----------------+                +----------+
                                  | Addr[50] |
                                  +----------+

That is, assuming you know how to address the payload correctly. This can be done as follows:

node->prev = NULL;
node->next = NULL;
node->payloadType = PLTYP_PERSON;
tPerson *person = &(node->payload); // cast for easy changes to payload.
strcpy (person->Name, "Bob Smith");
strcpy (person->Addr, "7 Station St");

That cast line simply casts the address of the payload character (in the tNode type) to be an address of the actual tPerson payload type.

Using this method, you can carry any payload type you want in a node, even different payload types in each node, without the wasted space of a union. This wastage can be seen with the following:

union {
    int x;
    char y[100];
} u;

where 96 bytes are wasted every time you store an integer type in the list (for a 4-byte integer).

The payload type in the tNode allows you to easily detect what type of payload this node is carrying, so your code can decide how to process it. You can use something along the lines of:

#define PAYLOAD_UNKNOWN     0
#define PAYLOAD_MANAGER     1
#define PAYLOAD_EMPLOYEE    2
#define PAYLOAD_CONTRACTOR  3

or (probably better):

typedef enum {
    PAYLOAD_UNKNOWN,
    PAYLOAD_MANAGER,
    PAYLOAD_EMPLOYEE,
    PAYLOAD_CONTRACTOR
} tPayLoad;
paxdiablo
No idea why this is downvoted three tines, gotta love SO timetimes. This is actually the best answer, tho add macro's to make allocating/accessing the data totally generic (e.g. tPerson* person = LIST_GET_DATA(pNode, tPerson)
Andrew Grant
omg - 4 downvotes?
Andrew Grant
Hmm. This is a very very interesting idea. But I'm seeing a problem: It seems I have to know the payload type size at allocation-time.
Rafael Almeida
I can't think of any way to write an 'insert in list' method that accomplishes (a) work for all types and (b) know the size of the type added. If there's a way to do it, I'd love to know =)
Rafael Almeida
@Rafael, you have to know it at some point, allocation time is the absolute latest time you can. All other solutions will have this 'problem' unless you specify a maximum size (which still has to be known) and that approach causes wasted memory.
paxdiablo
By the way I (obviously - low rep) didn't vote your solution down and I never would. In fact, I love it! Even though I'm not sure yet if I'll use it (our assignments are documented, and I'd have to explain this all), I'll keep this approach in mind for future use. Thanks a lot!
Rafael Almeida
And the code inserting the item does know the size (since it must have a type).
paxdiablo
No probs, Rafael. Glad I could help.
paxdiablo
@Rafael good luck developing a solution that allows you to allocate memory for something you don't know the size of....
Andrew Grant
@Andrew, Pax: actually I figured out how dumb the size issue was right after I posted lol. My 'insert' method could easily have the size as a parameter, I just can't access the size to the main structure directly from the pointer. Guess i'm on a OO-mindset still LOL
Rafael Almeida
Again with the indefensible down votes. This is another pain in the butt, but also a working solution. Nice idiom for the allocation, too.
dmckee
Accepting this answer for creativity. Thanks to everybody in the discussion though!
Rafael Almeida
Allocating and using extra space after a final array-of-1 field is colloquially known as the Struct Hack, and causes Undefined Behaviour (although it "works" on many implementations). Also, you have no guarantee from the language that 'payload[]' is correctly aligned to store anything but a 'char' - more Undefined Behaviour.
mlp
I disagree with your first point, @mip, the behavior is not undefined as per the standard - malloc *will* allocate the right amount of memory and the cast *will* work. You're correct on the alignment issue but it doesn't matter for the example (which has char[] as the payload) and you can change the "pre-type" of the payload to int[1] (or some other suitable type) and fix the subtraction in the malloc, if you want to ensure alignment is okay.
paxdiablo
Actually, the real problem is that`_tNode::payload` isn't guaranteed to be at `sizeof(_tNode)-1`, because the compiler can add padding at the end of the struct to make alignment work. What you could do instead is not have a payload pointer at all, and just use a macro to always refer to the payload as being at `pointer + sizeof(_tNode) bytes` (which is to say `(payload_type*) (pointer + 1)` where `pointer` is of type `_tNode*`)
Ken Bloom
+2  A: 

I haven't coded C in years but GLib claims to provide "a large set of utility functions for strings and common data structures", among which are linked lists.

Sean McSomething
A great tip. Actually I wouldn't use on this right project, simply because I won't have the opportunity to talk to the professor in person before the deadline. However, it seems to be a very very useful library, and I'll keep it bookmarked for reference. Vote up!
Rafael Almeida
+1  A: 

This is a good problem. There are two solutions I like:

  • Dave Hanson's C Interfaces and Implementations uses a list of void * pointers, which is good enough for me.

  • For my students, I wrote an awk script to generate type-specific list functions. Compared to preprocessor macros, it requires an extra build step, but the operation of the system is much more transparent to programmers without a lot of experience. And it really helps make the case for parametric polymorphism, which they see later in their curriculum.

    Here's what one set of functions looks like:

    int      lengthEL (Explist *l);
    Exp*     nthEL    (Explist *l, unsigned n);
    Explist *mkEL     (Exp *hd, Explist *tl);
    

    The awk script is a 150-line horror; it searches C code for typedefs and generates a set of list functions for each one. It's very old; I could probably do better now :-)

I wouldn't give a list of unions the time of day (or space on my hard drive). It's not safe, and it's not extensible, so you may as well just use void * and be done with it.

Norman Ramsey
Thanks a lot for the information. A book implementation is certainly a quotable, verifiable argument. Also, using awk to create the specific functions is an interesting approach, which I haven't though of mainly because I'm not familiar with heavy data processing.
Rafael Almeida
Actually, I am a bit resistant to use this approach (no matter using cpp or awk, or whatever), because it creates a bit of a naming hassle, since C has no namespaces. How do you approach this issue?
Rafael Almeida
I tackle namespaces the same way Hanson does: by programming convention. In this case each list function incorporates the name of the element type or an abbreviation for that name. I've added an example.
Norman Ramsey
"The awk script is a 150-line horror" - most 150-line awk scripts are :-)
paxdiablo
+1  A: 

Although It's tempting to think about solving this kind of problem using the techniques of another language, say, generics, in practice it's rarely a win. There are probably some canned solutions that get it right most of the time (and tell you in their documentation when they get it wrong), using that might miss the point of the assignment, So i'd think twice about it. For a very few number of cases, It might be feasable to roll your own, but for a project of any reasonable size, Its not likely to be worth the debugging effort.

Rather, When programming in language x, you should use the idioms of language x. Don't write java when you're using python. Don't write C when you're using scheme. Don't write C++ when you're using C99.

Myself, I'd probably end up using something like Pax's suggestion, but actually use a union of char[1] and void* and int, to make the common cases convenient (and an enumed type flag)

(I'd also probably end up implementing a fibonacci tree, just cause that sounds neat, and you can only implement RB Trees so many times before it loses it's flavor, even if that is better for the common cases it'd be used for.)

edit: based on your comment, it looks like you've got a pretty good case for using a canned solution. If your instructor allows it, and the syntax it offers feels comfortable, give it a whirl.

TokenMacGuy
While advice to use a more expressive language is all well and good, this assignment teaches what those other languages are doing under the hood. That's the point.
dmckee
@dmckee I agree. I don't think my response advises that way. The first two paragraphs are sort of a long-winded 'you can write FORTRAN in any language' in reverse.
TokenMacGuy
Ummm...Yeah. I retract. Sorry.
dmckee
@tokenmacguy: I completely agree with you that 'imitating' aspects of another language is not a good approach at all. However, please notice that I only mention C++ when talking about the SGLIB, and just because its author mentions it.
Rafael Almeida
@dmckee: Actually, I'd like to clarify that the purpose of the assignment is NOT 'implementing a multi-purpose linked list'. It's actually a organic search query processor that has to efficiently indicate groups of terms that occur together often.
Rafael Almeida
Later in the course, I'll be seeing graph problems, heuristics, approximate algorithms and so on. I only mention this because it shows that I'll have lots to worry about, and this list/container structure discussion is meant to 'get it out of my way', so I can worry about 'the big thing'. =)
Rafael Almeida
A: 

One improvement over making it a list of void* would be making it a list of structs that contain a void* and some meta-data about what the void* points to, including its type, size, etc.

Other ideas: embed a Perl or Lisp interpreter.

Or go halfway: link with the Perl library and make it a list of Perl SVs or something.

skiphoppy
Yes, the idea about putting the void inside a struct is nice, and actually 'softens' a little bit of the 'roughness' of using void*. Embedding an interpreter just to make container structures more generic sounds like overkill to me. (continued...)
Rafael Almeida
Plus, I don't know if there are restrictions to use code you didn't create: this applies even for libraries, let say for a full interpreter. Anyway, we're advised to be as simple as possible, and using an external interpreter is simply out of proportions.
Rafael Almeida
Yeah, I was just trying to get creative. :)
skiphoppy
A: 

I'd probably go with the void* approach myself, but it occurred to me that you could store your data as XML. Then the list can just have a char* for data (which you would parse on demand for whatever sub elements you need)....

dicroce
It is a valid idea, but I don't think it suits because the data I will be storing with some of the listed types may not be easily expressed with XML. Anyway, it's kind of oversofisticated to the problem.
Rafael Almeida
+2  A: 

Using a preprocessor macro is the best option. The Linux kernel linked list is a excellent a eficient implementation of a circularly-linked list in C. Is very portable and easy to use. Here a standalone version of linux kernel 2.6.29 list.h header.

The FreeBSD/OpenBSD sys/queue is another good option for a generic macro based linked list

Lear
A "list-in-structure" like the linux kernel version (instead of "structure-in-list" as is mostly taught) is the correct solution in the long run. No other solution offers the same level of generality.
eloj