In his essay The Cathedral and the Bazaar, Eric Raymond states this oft-cited principle:

Smart data structures and dumb code works a lot better than the other way around.

Could anyone offer a strong example of this in either a procedural or object oriented language?

+4  A: 

The C++ STL is a good example of this.

Algorithms are built with a minimal set of requirements that containers (and the values they contain) must meet. How they meet those requirements is mostly unimportant to the algorithms, allowing the data structures to be as smart as they need to be in order to meet those requirements. From there the algorithm (dumbly) churns away, and it can be used in a wide array of cases over literally countless types of data structures.
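
For instance (a minimal sketch of my own, not taken from the answer): std::count only requires input iterators and an equality-comparable value, so the same "dumb" call works unchanged over very different containers.

#include <algorithm>
#include <iostream>
#include <list>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 2, 3};
    std::list<int>   l{2, 2, 2, 5};

    // The algorithm neither knows nor cares how each container stores its
    // elements; it only needs iterators that meet its requirements.
    std::cout << std::count(v.begin(), v.end(), 2) << '\n';  // prints 2
    std::cout << std::count(l.begin(), l.end(), 2) << '\n';  // prints 3
}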

fbrereto
While the question did explicitly mention object oriented languages, I don't think encapsulation is what the asker was aiming for.
Jurily
@Jurily: They were asking for examples of smart data structures and dumb code (which I interpreted to mean algorithms) for which I believe the STL is a good fit.
fbrereto
I think it's a very good point that classic interface and algorithm design is an example of this principle.
metatheorem
+1  A: 

The classic example is the remove operation on a doubly linked versus a singly linked list (not that we have much cause to worry about it any more, but it illustrates the principle).

In a doubly linked list (with a dummy node so that the list always has at least one node) you can remove a node with just a reference to that node:

node.Prev.Next = node.Next;   // point the previous node past the node being removed
node.Next.Prev = node.Prev;   // point the following node back past it

Contrast this with the same operation in a singly linked list, which requires walking the list from the head just to find the node's predecessor.
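
Here is a hedged C++ sketch of that singly linked removal (the Node struct and names are illustrative, and it assumes a head node that precedes the node being removed):

struct Node { int value; Node* next; };

// Remove 'node' from a singly linked list: with no back pointer we must
// walk from the head to find the node that precedes it.
void Remove(Node* head, Node* node)
{
    Node* prev = head;
    while (prev->next != node)   // O(n) traversal just to find the predecessor
        prev = prev->next;
    prev->next = node->next;     // splice the node out
}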

This illustrates the idea that algorithms almost always mirror the complexity of the data structures. It is even more true in database apps, where complex schemas create complex code and vice versa. I think most experienced programmers have uttered more curses over lousy data models than over lousy code (although that also has to do with how much harder data is to refactor than code).

From Wikipedia:

Double-linked lists require more space per node (unless one uses xor-linking), and their elementary operations are more expensive; but they are often easier to manipulate because they allow sequential access to the list in both directions. In particular, one can insert or delete a node in a constant number of operations given only that node's address. To do the same in a singly-linked list, one must have the previous node's address. Some algorithms require access in both directions. On the other hand, they do not allow tail-sharing, and cannot be used as persistent data structures.

Rob
+2  A: 

This probably isn't what you mean but here's an idea.

If the data structures are quality tools, they make the rest of the code easier and cheaper to write; high-quality data structures reduce the effort required for everything built on top of them.

The example that comes to mind is map-reduce. By having a small team of super-geniuses who understand massive parallelism build the abstraction, your entire code base benefits from the power of not iterating. The rest of your not-quite-super-genius programmers write better code just by virtue of using the smart data structures.

http://www.joelonsoftware.com/items/2006/08/01.html
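
As a loose illustration (my own sketch using the C++ standard library, not an actual MapReduce framework): once someone has packaged the iteration behind map- and reduce-style operations, callers only supply the small pieces.

#include <algorithm>
#include <iostream>
#include <numeric>
#include <vector>

int main()
{
    std::vector<int> data{1, 2, 3, 4};
    std::vector<int> squared(data.size());

    // "map": apply a function to every element; the caller never writes the loop
    std::transform(data.begin(), data.end(), squared.begin(),
                   [](int x) { return x * x; });

    // "reduce": fold the results into a single value
    int sum = std::accumulate(squared.begin(), squared.end(), 0);
    std::cout << sum << '\n';  // prints 30
}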

Daniel
A: 

@Rob's on the right track here for a good example. Take a look at the collections objects found in C# or Java.

I think I can expand a bit to make it clearer why this is a good example: there are many ways to implement a list, but a few basic operations are common to all of them. How those operations work varies with the specific implementation, but the caller shouldn't care. All the caller should care about is that the list exposes certain functions and knows how to carry them out.

For instance, let's say we had a robot that took our groceries and put them all away for us.

while there are items in the bag's list
     remove the next item from the bag
     put away the item

Or, to put it in something more code-ish:

Item item;
while ((item = bag.NextItem()) != null)   // assume NextItem() returns null once the bag is empty
{
   bag.Remove(item);
   robot.PutAway(item);
}

Now, do you care how the list of items in the bag is implemented? No, not at all. You just care that you can find the next item and take it out of the bag. Whether the bag is a doubly linked list, a singly linked list, or something else entirely, it doesn't matter.

So, the developer writing that hideous PutAway algorithm doesn't care how the items are found; they just care that they get an item. The bag doesn't care what happens to the items; it just cares that it can hold them for a while until they get taken away. And that developer gets to go home for the weekend. :)

Of course, this is a painfully simple example, and there are plenty of holes you can poke in it.

Jim Leonardo
+2  A: 

Here's one for you from MaNGOS:

#if defined( __GNUC__ )
#pragma pack(1)
#else
#pragma pack(push,1)
#endif

typedef struct AUTH_LOGON_CHALLENGE_C
{
    uint8   cmd;
    uint8   error;
    uint16  size;
    uint8   gamename[4];
    uint8   version1;
    uint8   version2;
    uint8   version3;
    uint16  build;
    uint8   platform[4];
    uint8   os[4];
    uint8   country[4];
    uint32  timezone_bias;
    uint32  ip;
    uint8   I_len;
    uint8   I[1];
} sAuthLogonChallenge_C;

This is a TCP packet. Just looking at it tells you all the algorithms you'll need for sending and receiving well-formed packets. You'll still need to know the valid values for each field, of course, but those are defined similarly in the same file.
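
To make that concrete, here is a hedged sketch of a receive path (the buffer handling and the OnPacket and HandleLogonChallenge names are mine, not MaNGOS code): because the struct is packed to mirror the wire format, decoding collapses to a bounds check and a cast.

// Hypothetical decoder sketch; assumes the typedefs and struct above,
// and ignores endianness and partial reads.
void HandleLogonChallenge(const sAuthLogonChallenge_C* pkt);  // hypothetical handler

void OnPacket(const uint8* buffer, size_t len)
{
    if (len < sizeof(sAuthLogonChallenge_C))
        return;  // too short to be a well-formed challenge

    // The fields line up with the bytes on the wire, so "decoding"
    // is little more than a reinterpretation of the buffer.
    const sAuthLogonChallenge_C* pkt =
        reinterpret_cast<const sAuthLogonChallenge_C*>(buffer);

    HandleLogonChallenge(pkt);
}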

Now imagine if the same logic were expressed only inside a decoder function, and you had to write a client for the protocol by reading nothing but that code.

Jurily
A: 

The sentence you quote comes immediately after a good example:

The first serious change I made was to add IMAP support. I did this by reorganizing the protocol machines into a generic driver and three method tables (for POP2, POP3, and IMAP).
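
A hedged sketch of what that kind of reorganization looks like (the names here are illustrative, not fetchmail's actual code): each protocol is reduced to a table of operations, and one generic driver runs against whichever table it is handed.

#include <iostream>
#include <string>

// Hypothetical local delivery step (stubbed out for the sketch).
void Deliver(const std::string& msg) { std::cout << msg << '\n'; }

// One "method table" per protocol; the generic driver never knows which one it has.
struct ProtocolMethods
{
    const char* name;
    bool (*login)(const std::string& user, const std::string& pass);
    int  (*countMessages)();
    bool (*fetchMessage)(int index, std::string& out);
};

// The generic driver: "dumb" code that works unchanged for POP2, POP3, IMAP, ...
void RunSession(const ProtocolMethods& proto,
                const std::string& user, const std::string& pass)
{
    if (!proto.login(user, pass))
        return;

    int count = proto.countMessages();
    for (int i = 0; i < count; ++i)
    {
        std::string msg;
        if (proto.fetchMessage(i, msg))
            Deliver(msg);
    }
}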

Ken