tags:

views:

796

answers:

7

hi all

I have a structure in C which resembles that of a database table record. Now when I query the table using select, I do not know how many records I will get. I want to store all the returned records from the select query in a array of my structure data type.

Which method is best?

Method 1: find array size and allocate

  1. first get the count of records by doing select count(*) from table
  2. allocate a static array
  3. run select * from table and then store each records in my structure in a loop.

Method 2: use single linked list

while ( records returned )
{
    create new node 
    store the record in node 
}

Which implementation is best?

My requirement is that when I have all the records, I will probably make copies of them or something. But I do not need random access and I will not be doing any search of a particular record.

Thanks

+1  A: 

A problem with 'select count(*)' is that the value might change between calls, so your "real" select will have a number of items different from the count you'd expect.

I think the best solution is your "2". Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord> in C++ and List<myrecord> in C#).

Bjarke Ebert
yes i thought about the problem..u mentioned that count value can change...incase someone inserts a new record ..so yea..i too feel no 2 is better..thanks
linkedlist
Certainly you put the SELECT count(*) and the "real" SELECT inside a serializable transaction? So, there is no risk that the value changes between calls.
bortzmeyer
+1  A: 

You forgot option 3, it's a little more complicated but it might be best for your particular case. This is the way it's typically done in C++ std::vector.

Allocate an array of any comfortable size. When that array is filled, allocate a new larger array of 1.5x to 2x the size of the filled one, then copy the filled array to this one. Free the original array and replace it with the new one. Lather, rinse, repeat.

Mark Ransom
i m using C only so..
linkedlist
This algorithm works fine in C, I just mentioned C++ as an example.
Mark Ransom
Stick everything in an opaque data structure, too, and access that structure only through defined methods/macros.
skiphoppy
Qt's documentation provides a smart method for allocation+growth for arrays: http://doc.trolltech.com/4.4/containers.html#growth-strategies
strager
Also, because the OP is using C, he can use realloc() instead of allocating a new array and copying. This is what makes Qt's growth strategies efficient.
strager
+3  A: 

And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or having a higher level array that keeps the pointers to the data arrays. This two-level scheme is great when you need random access, you just need to break your index into two parts.

Mark Ransom
A: 

There are a good many possible critiques that should be made.

  1. You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.

  2. You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!

  3. Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMS put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists) and allow you to specify the sequence (ORDER BY clauses) and they have the best sort systems they can afford (better than the ones you or I are likely to produce).

  4. Beware of quadratic behaviour if you allocate your array in chunks. Each time you reallocate, there's a decent chance the old memory will have to be copied to the new location. This will fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall (but it avoids the quadratic behaviour I referred to). Of course, you can (and would) abstract that away by a set of suitable functions.

  5. Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.

Jonathan Leffler
A: 

Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.

This is the way Java and other languages do this. Internally, this is even how Perl is implemented in C.

I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's more well tested than anything you or I could roll up from scratch. :)

skiphoppy
A: 
Max
If there is any other memory allocation activity involved (for example, in the database access routines), this can easily lead to quadratic behaviour as the data is copied around on reallocation - which could even be a copy every iteration.
Jonathan Leffler
Jonathan, you're absolutely right. This is proof-of-concept code and I wouldn't even be using the realloc() function in production at all.
Max
A: 

The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.

This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.

Norman Ramsey