views:

211

answers:

5

i am using C++

say i want to store 40 records( usrnames) i will simply use array

if i want to store 40000 records ( usrnames) . i will search it sequentially

which data structure should i use?

any thoughts?

+7  A: 

You need to specify what the insertion and removal requirements are. Do things need to be removed and inserted at random points in the sequence?

Also, why the requirement to search sequentially? Are you doing searches that aren't suitable for a hash table lookup?

At the moment I'd suggest a deque or a list. Often it's best to choose a container with the interface that makes for the simplest implementation for your algorithm and then only change the choice if the performance is inadequate and an alternative provides the necessary speedup.

A vector has two principle advantages, there is no per-object memory overhead, although vectors will over-allocate to prevent frequent copying and objects are stored contiguously so sequential access tends to be fast. These are also its disadvantages. Growing vectors require reallocation and copying, and insertion and removal from anywhere other than the end of the vector also require copying. Contiguous storage can produce problems for vectors with large numbers of objects or large objects as the contiguous storage requirements can be hard to satisfy even with only mild memory fragmentation.

A list doesn't require contigous storage but list nodes usually have a per-object overhead of two pointers (in most implementation). This can be significant in list of very small objects (e.g. in a list of pointers, each node is 3x the size of the data item). Insertion and removal from the middle of a list is very cheap though and list nodes never need to me moved in memory once created.

A deque uses chunked storage, so it has a low per-object overhead similar to a vector, but doesn't require contiguous storage over the whole container so doesn't have the same problem with fragmented memory spaces. It is often a very good choice for collections and is often overlooked.

Charles Bailey
A: 

std::vector and std::list seem good for this task. You can use an array if you know the maximum number of records beforehands.

Dmitry Risenberg
A: 

If you need only sequentially search and storage, then list is the proper container.

Also, vector wouldn't be a bad choice.

Nick D
std::list is a bad choice if only sequential access is required, since it has a greater overhead than vector or deque. list is only a good choice if insert/remove at the middle is required.
Steve Jessop
I have tested list on a hand held device. The list contained 70,000 strings. I had to `collect` and search the data progressively. The search was **very** fast. I also tested vector and map. Well, list won in my case. It's really a *solution* dependent issue.
Nick D
I meant memory overhead. Sure, if an extra allocation per item is affordable then it's not a concern, so maybe "bad choice" is over-stating the case. But I don't think it's right to call it the "proper container" either, since you're paying for something you aren't using.
Steve Jessop
I used the word `proper` only because the list data structure `invented` specifically to represent sequences in the memory. I agree with you that we pay something we don't use - c'est la vie :) Anyway, we should always balance the prons and cons of each container and select the right one for the job.
Nick D
Arrays (and vectors and deques) also specifically represent sequences in memory. In general you could even say anything with an iterator does, but there's an argument there about a "sequence" by definition needing the ability to arbitrarily set the order of elements. I think lists were invented as a particular way of storing a sequence so as to have the property of easy insertion/removal. External lists, that is: intrusive lists have additional useful properties with their own cost. Definitely I agree about weighing the pros and cons: that's what the question should be all about.
Steve Jessop
I don't like to *treat* an array as a sequence because arrays can be multi-dimensional, and a vector *feels* solid :) But yes, all these types represent sequences.
Nick D
+2  A: 

As a rule of thumb, prefer vector to list or, diety forbid, C-style array.

After the vector is filled, make sure it is properly ordered using the sort algorithm. You can then search for a particular record using either find, binary_search or lower_bound. (You don't need to sort to use find.)

avakar
+1  A: 

Seriously unless you are in a resource constrained environment (embedded platform, phone, or other). Use a std::map, save the effort of doing sorting or searching and let the container take care of everything. This will possibly be a sorted tree structure, probably balance (e.g. Red-Black), which means you will get good searching performance. Unless the size of you data is close to the size of one or two pointers, the memory overhead of whatever data structure you pick is negligable. You Graphics Card probably has more memory that you are going to use up for the data you are think about.

As others said there is very little good reason to use vanilla array, if you don't want to use a map use std::vector or std::list depending on whether you need insert/delete data (=>list) or not (=>vector)

Also consider if you really need all that data in memory, how about putting it on disk via sqlite. Or even use sqlite for in memory access. It all depends on what you need to do with your data.

Harald Scheirich