I have a linked list of around 5000 entries (not all inserted at once), and I traverse the list looking for a particular entry on occasion (though not very often). Should I consider a hash table as a better choice for this case, replacing the linked list (which is doubly-linked and linear)? Using C on Linux.
Whether a hash table is the better choice depends on the use case, which you have not described in detail. But more importantly, make sure this part of the code is actually the performance bottleneck: if it is called only once in a while and is not on a critical path, there is no point changing it.
If you care about performance, you definitely should. If you're iterating through the list to find a certain element with any regularity, it will be worth switching to a hash table. If that is a rare case, though, and the ordinary use of the list is not searching, then there's no reason to worry about it.
Have you measured and found a performance hit with the lookup? A hash_map or hash table should be good.
If you only traverse the collection, I don't see any advantage to using a hash map.
If you need to traverse the list in order (not as part of searching for elements, but, say, to display them), then a linked list is a good choice. If you're only storing elements so that you can look them up later, a hash table will greatly outperform a linked list (with anything but the worst possible hash function).
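As a rough sketch of what that lookup could look like in C on Linux, here is the POSIX hcreate()/hsearch() hash table from <search.h>; the key string and payload below are illustrative placeholders, not anything from the question:

    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Size the table generously up front: hsearch tables cannot grow. */
        if (hcreate(5000 * 2) == 0) {
            perror("hcreate");
            return EXIT_FAILURE;
        }

        /* ENTER stores a key/value pair (key is a string, data any pointer). */
        ENTRY item = { .key = "entry-42", .data = "some payload" };
        hsearch(item, ENTER);

        /* FIND is an expected O(1) lookup, versus O(n) for walking a list. */
        ENTRY probe = { .key = "entry-42" };
        ENTRY *found = hsearch(probe, FIND);
        if (found)
            printf("%s -> %s\n", found->key, (char *)found->data);

        hdestroy();
        return 0;
    }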
If your application calls for both types of operations, you might consider keeping both, and using whichever one is appropriate for a particular task. The memory overhead would be small, since you'd only need to keep one copy of each element in memory and have the data structures store pointers to these objects.
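A sketch of that hybrid, with each node carrying both its list links and a hash-chain link (the field and type names here are assumptions for illustration):

    #include <stddef.h>

    /* One copy of each element, reachable two ways: through the
       doubly-linked list for ordered traversal, and through a hash
       bucket chain for fast lookup. */
    struct entry {
        int key;
        struct entry *prev, *next;      /* intrusive doubly-linked list */
        struct entry *hash_next;        /* next entry in the same bucket */
    };

    #define NBUCKETS 1024

    struct table {
        struct entry *head, *tail;          /* list endpoints */
        struct entry *bucket[NBUCKETS];     /* index over the same nodes */
    };

    static size_t hash_key(int key)
    {
        return (size_t)key % NBUCKETS;      /* toy hash for illustration */
    }

    /* Expected O(1): scan only the short chain for this one bucket. */
    static struct entry *table_find(struct table *t, int key)
    {
        for (struct entry *e = t->bucket[hash_key(key)]; e; e = e->hash_next)
            if (e->key == key)
                return e;
        return NULL;
    }

Traversal in insertion order still walks head/next as before; only the lookup path changes.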
As with any optimization step that you take, make sure you measure your code to find the real bottleneck before you make any changes.
If you have not found the code to be the slow part of the application via a profiler then you shouldn't do anything about it yet.
If it is slow, but the code is tested, works, and is clear, and there are other slower areas that you can work on speeding up do those first.
If it is buggy, then you need to fix it anyway; go for the hash table, as it will be faster than the list. This assumes that the order in which the data is traversed does not matter. If you care about insertion order, stick with the list (you can keep ordering with a hash table as well, but that makes the code much trickier).
Given that you need to search the list only on occasion, the odds of this being a significant bottleneck in your code are small.
Another data structure to look at is a skip list, which basically lets you skip over a large portion of the list during a search. It requires the list to be kept sorted, however, which, depending on what you are doing, may make the code slower overall.
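To make that concrete, the search half of a skip list looks roughly like this in C; the node layout is an assumption, and a real implementation would also randomize node heights on insert:

    #include <stddef.h>

    struct skip_node {
        int key;
        struct skip_node **forward;   /* one forward pointer per level */
    };

    /* Start at the highest level of the sentinel head node, walk right
       while the next key is still too small, then drop down a level;
       expected O(log n) on a sorted list. */
    struct skip_node *skip_find(struct skip_node *head, int levels, int key)
    {
        struct skip_node *x = head;
        for (int i = levels - 1; i >= 0; i--)
            while (x->forward[i] != NULL && x->forward[i]->key < key)
                x = x->forward[i];
        x = x->forward[0];
        return (x != NULL && x->key == key) ? x : NULL;
    }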
I advise against hashes in almost all cases.
There are two reasons. First, the size of the hash table is fixed; growing it later means rehashing every element.
Second, and much more importantly, the hashing algorithm itself: how do you know you've got it right? How will it behave with real data rather than test data?
I suggest a balanced binary tree instead: always O(log n), no uncertainty regarding a hash algorithm, and no size limits.
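On Linux you don't have to write that tree yourself: the POSIX tsearch()/tfind() functions in <search.h> manage a binary search tree for you (glibc's implementation is a red-black tree, so lookups stay O(log n)). The keys below are placeholders:

    #include <search.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        void *root = NULL;
        static int keys[] = { 17, 3, 42 };

        /* tsearch inserts a key if absent; the tree holds pointers
           to our data, not copies. */
        for (size_t i = 0; i < sizeof keys / sizeof keys[0]; i++)
            tsearch(&keys[i], &root, cmp_int);

        /* tfind looks up without inserting; the result points to the
           tree slot, i.e. a pointer to our key. */
        int wanted = 42;
        void *slot = tfind(&wanted, &root, cmp_int);
        if (slot != NULL)
            printf("found %d\n", **(int **)slot);

        return 0;
    }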