tags:

views:

770

answers:

11

Instead of the STL and similar libraries in other languages?

As a newbie, how much should I delve into this part of software development? Breadth first or depth?

Is only a conceptual understanding necessary these days? Or should I be able to implement a doubly linked list blindfolded?

+2  A: 

All depends on what you need / want to do.

If you have a gun to your head and need to get something out the door, probably understanding which structure / algorithm will get you to your end fastest is probably enough.

If you're finding a bottleneck in your code and can trace it to an algorithm / structure, then you may need to go deeper.

If you're a student, then sure, learn as much about them as you want!

John at CashCommons
I agree completely... Well almost. I'd change "learn as much as you want..." to "learn as much as you **can**"
Jason D
+16  A: 

While no one really rolls their own stacks or queues anymore, it -is- very important to understand how and why they are different. So no, to use simple data structures effectively, it's not 100% necessary to be able to do all the proper error checking for loops/null tail/concurrency/etc in a linked list while blindfolded.

However, while the simplest data structures aren't rewritten over and over, trees and graphs are often still custom rolled, and you probably won't be able to do anything with those without understanding more basic data structures.

Also, these often fall under 'interview questions', so they're worth knowing how to do even if you don't actually rewrite a doubly linked list in live code.

Tanzelax
I had to roll my own queue recently. It's much more common in industries that have aggressive real-time performance or predictability requirements.
thebretness
I had to write a custom queue just this morning.
Crashworks
For example, EA wasn't happy with the way STL does allocation, so it rolled its own: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html.
thebretness
Ah, I stand corrected. Though, he did mention 'as a newbie', which kind of precludes aggressive real-time performance requirements... still, thanks for the link, was an interesting read. :)
Tanzelax
+4  A: 

Just because a lot of languages/SDK's etc. provide this stuff for you already doesn't mean it's not important to still understand how they work and the best way to understand the algorithms is to write them yourself.

Especially if you find yourself working on time critical code then the cost of everything you use is important and unless you know the difference in implementation between various data structures you might find yourself using the less efficient options.

And to answer the question in the subject line, yes - a lot of people still write their own implementations when speed/space/platform is restricted and they need to know exactly what's going on inside their functions. I know that in the video game industry we often write our own fast and memory efficient container classes that are optimized for each target platform.

Fraser Graham
+4  A: 

I've been known to code a Merge Sort when working with data too big to fit in memory, just for one example. So yes, it's very helpful to know the limits of the standard tools and when you might be able to do better with something specialized.

Mark Ransom
I would think that if you are having memory limitations that this would be one of the least-chosen sorting options. Even if you aren't making copies of the indexed elements and you aren't making recursive calls, you still need to maintain a stack to keep track of where you are and what you're merging. Please enlighten me :)
San Jacinto
Merge sort can be implemented with linear external storage, in my case a file on hard disk. In each merge pass you double the size of the sorted subsections, and all you need to track is the size of the subsection.
Mark Ransom
+3  A: 

Personally I feel these things fall under the Law of Leaky Abstractions.

Sure you could spend your days writing C# code with nice string type, but at some point you're going to have to understand the difference between "\r\n" and "\n" and why svn seems to import fine from windows but screws up on your linux machine, and thats when it helps to have implemented string functions.

As someone who has been coding for the last decade: no, I don't rewrite doubly linked lists, but thats because I've written them the first hundred times over. So as a new programmer, do it a couple times, then grab a nice API. Preferably a well document one...

tzenes
+1  A: 

From a professional development point of view, you should be able to implement whatever you might need; some day you will probably have to implement something that does not exist in a library, and you will also probably have to invent new algorithms from time to time.

However, to become productive quicker, learn how to evaluate a library or template data structure first; what is its time cost, memory cost, maintainability, etc. That will have you writing better code sooner. But do not stop there, learn how to implement them as well. Study the implementations of open-source libraries so you know how they work, not just what they do.

Andrew McGregor
+3  A: 

I say everyone should know how to implement the basic data structures, such as doubly linked-lists. Without such an understanding, how can you say you understand pointers and other relevant things. I believe that to a decent programmer, most basic data structures should be trivial to implement naively, and take not more than a day to implement decently.

Earlz
+2  A: 

Many times in Embedded Systems, data structures are rewritten. The STL can contain algorithms and data structures that are overkill for smaller platforms. The STL algorithms and data structures are generalized. The generalizations take up code and memory space that could be used for other functionality.

There are other data structures that are commonly rewritten that are not part of the STL. One example is a ring buffer or circular queue. Some shops perfer to rewrite the code than mess around with licensing fees or copyright laws when using off the shelf libraries.

Thomas Matthews
+2  A: 

…Instead of the STL and similar libraries in other languages?

Sometimes you want something that isn't in the library. I use circularly singly linked lists a lot. They aren't in the STL, they don't support STL sequences, and the implementation is so simple that rolling my own is simpler than downloading.

As a newbie, how much should I delve into this part of software development? Breadth first or depth?

Don't spend too much time. If you don't need it immediately, it's theoretical knowledge, and theory is useless without depth. Work through a good data structures book and skip whatever you find impossibly dull. If you know you'll be taking a data structures course later, pick up its book ahead of time.

(Although I tried just that and ended up with a useless book. Then I went to another school's bookstore, found a better book, and obtained proficiency credit without taking my school's course!)

Is only a conceptual understanding necessary these days? Or should I be able to implement a doubly linked list blindfolded?

Take the middle ground. You need to know the properties of structures to be able to find bugs resulting from use of the wrong structure. But don't bore yourself implementing red-black trees, and certainly don't make a habit of coding structures you could get pre-made.

Potatoswatter
Adding STL iterators to a circular list is pretty trivial. begin() == end() and you're done.
Zan Lynx
I'm working through Teach yourself Data Structures and Algorithms in 24 Hours by Robert Lafore. Despite its non-academic credentials, it comes across as a sensible introduction to the subject, instead of being immersed in mathematical rigour and "over" analysis.
devilcrack
@Zan: You might be done sooner than you expect, in a `for` loop ;v)
Potatoswatter
@Potatoswatter: Good point. I guess you'd need a bool to indicate an increment has been done. Then begin() == [listptr, false] and end() == [listptr, true].
Zan Lynx
@Zan: Yeah, I thought of that, and it sounds like it should work practically, but it seems like too much of a gamble and too much potential loss in efficiency. It would be more like `iterator::begin_seq() { m_has_incremented = false; }` and you have to remember to call `begin_seq` because initialization and assignment can't clear the bit—that would violate the iterator requirements.
Potatoswatter
@Potatoswatter: I don't think you need a special function. circlist.begin() == circlist::iterator(listptr, false) and circlist::iterator::operator++ = { curr=curr->next; incremented = true; } Don't worry about efficiency, the dominating factor in linked lists is cache misses.
Zan Lynx
@Zan: the other problem is that containers don't fit into the circlist paradigm. A container object generally stores (at least) a `begin` ptr. One advantage of circlists is that any pointer into a list also points to the whole list, doing the job of a container. Another advantage is that lists can be split and joined by `swap`'ing links, interfering with container semantics. So, if I had a container, its `begin` method would solve the problem, but instead I would need either add a method to the iterator or equivalently define an "imaginary" container initialized with a single iterator.
Potatoswatter
+1  A: 

You should be able to write your own data structures. Actually doing it for a job should be an unusual circumstance. The C++ STL or Java collections or .NET provided data structures should be good for 99% of circumstances.

I wrote some custom data structures a year ago for a work project because we were able to take advantage of our data's unique properties to use an on-disk memory map, compress it, store most of it in the indexes, use bits of the offset pointers to indicate the type of the next data object and make it obfuscated enough to deter most people from trying to read out our database.

That opportunity only happened to me once in ten years though.

Zan Lynx
+1  A: 

you must have heard about the 80/20 rule. 80% of the developers dont have hands on (real-world) experinece on implementing data structures in their language of choice. So to be one among that 20%, try to learn it and gain that practical experience. That said, in reality 80% of the work that you would do wouldnt require implementing your own DS. Incase of language like Java and C#, sometimes writing your own DS would invite criticism. You have packages/librabries for most of your needs. But while implementing these DS in your langauage, you will spruce up your programming skills. So go ahead and start with your custom stack and queue. I m sure the first that you would learn (incase you are using java/c# as your language of choice) is memory leaks :)

Cshah
I rolled out my own custom bubble sort(lol) this morning. Now I'm going to change it into an insert sort. Perhaps tonight I'll have a look into stacks and queues(although I already possess a good understanding of the concepts of these).
devilcrack