Do people still write their own data structures and algorithms?

views:

770

answers:

+19 Q:

Do people still write their own data structures and algorithms?

Instead of the STL and similar libraries in other languages?

As a newbie, how much should I delve into this part of software development? Breadth first or depth?

Is only a conceptual understanding necessary these days? Or should I be able to implement a doubly linked list blindfolded?

+2 A:

All depends on what you need / want to do.

If you have a gun to your head and need to get something out the door, probably understanding which structure / algorithm will get you to your end fastest is probably enough.

If you're finding a bottleneck in your code and can trace it to an algorithm / structure, then you may need to go deeper.

If you're a student, then sure, learn as much about them as you want!

John at CashCommons 2010-02-22 23:37:22

I agree completely... Well almost. I'd change "learn as much as you want..." to "learn as much as you **can**"

Jason D 2010-02-23 03:10:56

+16 A:

While no one really rolls their own stacks or queues anymore, it -is- very important to understand how and why they are different. So no, to use simple data structures effectively, it's not 100% necessary to be able to do all the proper error checking for loops/null tail/concurrency/etc in a linked list while blindfolded.

However, while the simplest data structures aren't rewritten over and over, trees and graphs are often still custom rolled, and you probably won't be able to do anything with those without understanding more basic data structures.

Also, these often fall under 'interview questions', so they're worth knowing how to do even if you don't actually rewrite a doubly linked list in live code.

Tanzelax 2010-02-22 23:41:26

I had to roll my own queue recently. It's much more common in industries that have aggressive real-time performance or predictability requirements.

thebretness 2010-02-23 00:09:44

I had to write a custom queue just this morning.

Crashworks 2010-02-23 00:09:45

For example, EA wasn't happy with the way STL does allocation, so it rolled its own: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html.

thebretness 2010-02-23 00:11:21

Ah, I stand corrected. Though, he did mention 'as a newbie', which kind of precludes aggressive real-time performance requirements... still, thanks for the link, was an interesting read. :)

Tanzelax 2010-02-23 00:17:04

+4 A:

Just because a lot of languages/SDK's etc. provide this stuff for you already doesn't mean it's not important to still understand how they work and the best way to understand the algorithms is to write them yourself.

Especially if you find yourself working on time critical code then the cost of everything you use is important and unless you know the difference in implementation between various data structures you might find yourself using the less efficient options.

And to answer the question in the subject line, yes - a lot of people still write their own implementations when speed/space/platform is restricted and they need to know exactly what's going on inside their functions. I know that in the video game industry we often write our own fast and memory efficient container classes that are optimized for each target platform.

Fraser Graham 2010-02-22 23:42:25

+4 A:

I've been known to code a Merge Sort when working with data too big to fit in memory, just for one example. So yes, it's very helpful to know the limits of the standard tools and when you might be able to do better with something specialized.

Mark Ransom 2010-02-22 23:46:23

I would think that if you are having memory limitations that this would be one of the least-chosen sorting options. Even if you aren't making copies of the indexed elements and you aren't making recursive calls, you still need to maintain a stack to keep track of where you are and what you're merging. Please enlighten me :)

San Jacinto 2010-02-23 00:07:37

Merge sort can be implemented with linear external storage, in my case a file on hard disk. In each merge pass you double the size of the sorted subsections, and all you need to track is the size of the subsection.

Mark Ransom 2010-02-23 00:50:20

+3 A:

Personally I feel these things fall under the Law of Leaky Abstractions.

Sure you could spend your days writing C# code with nice string type, but at some point you're going to have to understand the difference between "\r\n" and "\n" and why svn seems to import fine from windows but screws up on your linux machine, and thats when it helps to have implemented string functions.

As someone who has been coding for the last decade: no, I don't rewrite doubly linked lists, but thats because I've written them the first hundred times over. So as a new programmer, do it a couple times, then grab a nice API. Preferably a well document one...

tzenes 2010-02-22 23:47:27

+1 A:

From a professional development point of view, you should be able to implement whatever you might need; some day you will probably have to implement something that does not exist in a library, and you will also probably have to invent new algorithms from time to time.

However, to become productive quicker, learn how to evaluate a library or template data structure first; what is its time cost, memory cost, maintainability, etc. That will have you writing better code sooner. But do not stop there, learn how to implement them as well. Study the implementations of open-source libraries so you know how they work, not just what they do.

Andrew McGregor 2010-02-23 00:07:25

+3 A:

I say everyone should know how to implement the basic data structures, such as doubly linked-lists. Without such an understanding, how can you say you understand pointers and other relevant things. I believe that to a decent programmer, most basic data structures should be trivial to implement naively, and take not more than a day to implement decently.

Earlz 2010-02-23 00:29:35

+2 A:

Many times in Embedded Systems, data structures are rewritten. The STL can contain algorithms and data structures that are overkill for smaller platforms. The STL algorithms and data structures are generalized. The generalizations take up code and memory space that could be used for other functionality.

There are other data structures that are commonly rewritten that are not part of the STL. One example is a ring buffer or circular queue. Some shops perfer to rewrite the code than mess around with licensing fees or copyright laws when using off the shelf libraries.

Thomas Matthews 2010-02-23 00:37:07

+2 A:

…Instead of the STL and similar libraries in other languages?

Sometimes you want something that isn't in the library. I use circularly singly linked lists a lot. They aren't in the STL, they don't support STL sequences, and the implementation is so simple that rolling my own is simpler than downloading.

As a newbie, how much should I delve into this part of software development? Breadth first or depth?

Don't spend too much time. If you don't need it immediately, it's theoretical knowledge, and theory is useless without depth. Work through a good data structures book and skip whatever you find impossibly dull. If you know you'll be taking a data structures course later, pick up its book ahead of time.

(Although I tried just that and ended up with a useless book. Then I went to another school's bookstore, found a better book, and obtained proficiency credit without taking my school's course!)

Is only a conceptual understanding necessary these days? Or should I be able to implement a doubly linked list blindfolded?

Take the middle ground. You need to know the properties of structures to be able to find bugs resulting from use of the wrong structure. But don't bore yourself implementing red-black trees, and certainly don't make a habit of coding structures you could get pre-made.

Potatoswatter 2010-02-23 01:17:40

Adding STL iterators to a circular list is pretty trivial. begin() == end() and you're done.

Zan Lynx 2010-02-23 01:22:43

I'm working through Teach yourself Data Structures and Algorithms in 24 Hours by Robert Lafore. Despite its non-academic credentials, it comes across as a sensible introduction to the subject, instead of being immersed in mathematical rigour and "over" analysis.

devilcrack 2010-02-23 01:53:32

@Zan: You might be done sooner than you expect, in a `for` loop ;v)

Potatoswatter 2010-02-23 03:05:41

@Potatoswatter: Good point. I guess you'd need a bool to indicate an increment has been done. Then begin() == [listptr, false] and end() == [listptr, true].

Zan Lynx 2010-02-23 03:20:27

@Zan: Yeah, I thought of that, and it sounds like it should work practically, but it seems like too much of a gamble and too much potential loss in efficiency. It would be more like `iterator::begin_seq() { m_has_incremented = false; }` and you have to remember to call `begin_seq` because initialization and assignment can't clear the bit—that would violate the iterator requirements.

Potatoswatter 2010-02-23 04:32:16

@Potatoswatter: I don't think you need a special function. circlist.begin() == circlist::iterator(listptr, false) and circlist::iterator::operator++ = { curr=curr->next; incremented = true; } Don't worry about efficiency, the dominating factor in linked lists is cache misses.

Zan Lynx 2010-02-24 23:15:20

@Zan: the other problem is that containers don't fit into the circlist paradigm. A container object generally stores (at least) a `begin` ptr. One advantage of circlists is that any pointer into a list also points to the whole list, doing the job of a container. Another advantage is that lists can be split and joined by `swap`'ing links, interfering with container semantics. So, if I had a container, its `begin` method would solve the problem, but instead I would need either add a method to the iterator or equivalently define an "imaginary" container initialized with a single iterator.

Potatoswatter 2010-02-26 03:25:24

+1 A:

You should be able to write your own data structures. Actually doing it for a job should be an unusual circumstance. The C++ STL or Java collections or .NET provided data structures should be good for 99% of circumstances.

I wrote some custom data structures a year ago for a work project because we were able to take advantage of our data's unique properties to use an on-disk memory map, compress it, store most of it in the indexes, use bits of the offset pointers to indicate the type of the next data object and make it obfuscated enough to deter most people from trying to read out our database.

That opportunity only happened to me once in ten years though.

Zan Lynx 2010-02-23 01:29:16

+1 A:

you must have heard about the 80/20 rule. 80% of the developers dont have hands on (real-world) experinece on implementing data structures in their language of choice. So to be one among that 20%, try to learn it and gain that practical experience. That said, in reality 80% of the work that you would do wouldnt require implementing your own DS. Incase of language like Java and C#, sometimes writing your own DS would invite criticism. You have packages/librabries for most of your needs. But while implementing these DS in your langauage, you will spruce up your programming skills. So go ahead and start with your custom stack and queue. I m sure the first that you would learn (incase you are using java/c# as your language of choice) is memory leaks :)

Cshah 2010-02-23 15:00:36

I rolled out my own custom bubble sort(lol) this morning. Now I'm going to change it into an insert sort. Perhaps tonight I'll have a look into stacks and queues(although I already possess a good understanding of the concepts of these).

devilcrack 2010-02-23 15:59:44

ansaurus

tags:

views:

answers:

Do people still write their own data structures and algorithms?

related questions