views: 1321
answers: 4

I have an experiment streaming 1 Mb/s of numeric data which needs to be stored for later processing. Writing directly into a database seems as easy as writing to a CSV file, and I would then have the ability to easily retrieve subsets or ranges.

I have experience with sqlite2 (when it only had text fields) and it seemed pretty much as fast as raw disk access. Any opinions on the best current in-process DBMS for this application?

Sorry - should have added: this is C++, initially on Windows, but cross-platform is nice. Ideally the DB binary file format should be cross-platform.

A: 

Depends what language you are using. If it's C/C++, Tcl, or PHP, SQLite is still among the best in the single-writer scenario. If you don't need SQL access, a Berkeley DB-style key-value library might be slightly faster, such as Berkeley DB (Sleepycat) or gdbm. With multiple writers you could consider a separate client/server solution, but it doesn't sound like you need one. If you're using Java, HSQLDB or Derby (shipped with Sun's JDK under the "Java DB" branding) seem to be the solutions of choice.
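
For the single-writer C++ case, a minimal sketch using the SQLite C API (the samples table, column names, and file name are made up for illustration; error handling is abbreviated):

    #include <sqlite3.h>
    #include <cstdio>

    int main() {
        sqlite3* db = nullptr;
        if (sqlite3_open("samples.db", &db) != SQLITE_OK) {
            std::fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }
        sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS samples(t REAL, value REAL)",
                     nullptr, nullptr, nullptr);

        // Prepare once, then bind/step/reset per row -- much cheaper than
        // re-parsing the SQL text for every insert.
        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db, "INSERT INTO samples(t, value) VALUES (?, ?)",
                           -1, &stmt, nullptr);
        for (int i = 0; i < 1000; ++i) {
            sqlite3_bind_double(stmt, 1, i * 0.001);  // timestamp
            sqlite3_bind_double(stmt, 2, i * 2.5);    // measurement
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_finalize(stmt);
        sqlite3_close(db);
        return 0;
    }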

sk
+1  A: 

If all you want to do is store the numbers and be able to easily do range queries, you can just take any standard tree data structure available in the STL and serialize it to disk, roughly as sketched below. This may bite you in a cross-platform environment, especially if you are trying to go cross-architecture.
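
A minimal sketch of that approach, assuming a std::map keyed by timestamp (the field layout and file name are illustrative); the raw binary dump is exactly the part that breaks across architectures:

    #include <cstdint>
    #include <cstdio>
    #include <map>

    int main() {
        std::map<double, double> samples;  // timestamp -> value
        samples[0.001] = 2.5;
        samples[0.002] = 2.7;

        // Serialize: row count, then flat (key, value) pairs. Raw doubles
        // are not portable across endianness, hence the caveat above.
        std::FILE* f = std::fopen("samples.bin", "wb");
        std::uint64_t n = samples.size();
        std::fwrite(&n, sizeof n, 1, f);
        for (std::map<double, double>::const_iterator it = samples.begin();
             it != samples.end(); ++it) {
            std::fwrite(&it->first,  sizeof it->first,  1, f);
            std::fwrite(&it->second, sizeof it->second, 1, f);
        }
        std::fclose(f);

        // Range query on the in-memory tree, O(log n) to find the bounds.
        std::map<double, double>::const_iterator lo = samples.lower_bound(0.001);
        std::map<double, double>::const_iterator hi = samples.upper_bound(0.002);
        for (; lo != hi; ++lo)
            std::printf("%f -> %f\n", lo->first, lo->second);
        return 0;
    }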

As far as more flexible/people-friendly solutions go, sqlite3 is widely used, solid, stable, and very nice all around.

BerkeleyDB has a number of good features for which one would use it, but none of them apply in this scenario, imho.

I'd say go with sqlite3 if you can accept the license agreement.

-D

SquareCog
That's what I do at the moment (STL -> XML), but it seems a lot of overhead for very little gain. The DB gives me the chance to do something cleverer in the future without having to write more code.
Martin Beckett
+1  A: 

If you only need to read/write the data, without any checking or manipulation done in the database, then both (SQLite and Firebird) should do fine. Firebird's database file can be copied as long as the systems have the same endianness (i.e. you cannot copy the file between Intel and PPC systems, but Intel-to-Intel is fine).

However, if you ever need to do anything with the data beyond simple read/write, go with Firebird: it is a full SQL server with all the 'enterprise' features like triggers, views, stored procedures, temporary tables, etc.

BTW, if you decide to give Firebird a try, I highly recommend using the IBPP library to access it. It is a very thin C++ wrapper around Firebird's C API. It has about 10 classes that encapsulate everything and it's dead easy to use.
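
As a rough sketch of what IBPP code looks like (the connection parameters and samples table are illustrative, and the factory calls below are from memory of the IBPP docs, so check them against the version you build with):

    #include "ibpp.h"

    int main() {
        try {
            // Connect to an existing database (illustrative credentials).
            IBPP::Database db = IBPP::DatabaseFactory(
                "localhost", "C:\\data\\samples.fdb", "SYSDBA", "masterkey");
            db->Connect();

            IBPP::Transaction tr = IBPP::TransactionFactory(db);
            tr->Start();

            IBPP::Statement st = IBPP::StatementFactory(db, tr);
            st->Prepare("INSERT INTO samples(t, value) VALUES (?, ?)");
            st->Set(1, 0.001);   // timestamp
            st->Set(2, 2.5);     // measurement
            st->Execute();

            tr->Commit();
            db->Disconnect();
        } catch (IBPP::Exception&) {
            // IBPP reports errors via exceptions.
            return 1;
        }
        return 0;
    }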

Milan Babuškov
A: 

Hi

I suspect that neither database will allow you to write data at such a high speed. You can check this yourself to be sure. In my experience, SQLite failed to INSERT more than 1000 rows per second for a very simple table with a single integer primary key.

In case of a performance problem, I would write the files in CSV format and later load the data into the database (SQLite or Firebird) for further processing.

Good Luck

Creator of the SQLite Compare diff/merge utility

Liron Levi
Thanks, the high-speed application never materialised, so I never got a chance to push it.
Martin Beckett
Are you sure you used SQLite with appropriate settings? See http://www.sqlite.org/faq.html#q19
Weidenrinde
Well - I didn't turn off the sync flag. Maybe with the flag turned off the performance is better (however, you risk corrupting the database in case of power failure).
Liron Levi
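
For reference, a sketch of the two settings discussed in these comments, using the SQLite C API (the samples table is illustrative): one transaction wrapped around all the INSERTs, as FAQ #q19 suggests, plus the synchronous pragma:

    #include <sqlite3.h>

    void bulk_insert(sqlite3* db, int n) {
        // Trades durability for speed: risky on power failure, but it
        // removes the per-insert sync cost mentioned above.
        sqlite3_exec(db, "PRAGMA synchronous = OFF", nullptr, nullptr, nullptr);

        sqlite3_stmt* stmt = nullptr;
        sqlite3_prepare_v2(db, "INSERT INTO samples(value) VALUES (?)",
                           -1, &stmt, nullptr);

        // One transaction around all rows: without this, each INSERT is
        // its own transaction and the ~1000 rows/second ceiling appears.
        sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
        for (int i = 0; i < n; ++i) {
            sqlite3_bind_double(stmt, 1, i * 2.5);
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);
        sqlite3_finalize(stmt);
    }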