Hi, we have a database with a very simple schema:

 CREATE TABLE IF NOT EXISTS tblIndex(
     frame_type  INT, 
     pts  VARCHAR(5),
     ts_start  INT primary key,
     ts_end  INT)

And the application scenario is:

  1. Every second the user will insert 2 ~ 50 records, and the ts_start field of those records is always increasing. After 8 hours there are at most 1,800,000 records. With sync mode set to off, the performance seems OK so far. And because each record is only 16 bytes, we can buffer them in memory even if the insert speed is not fast enough (a batching sketch follows this list).

  2. After 8 hours, the user will ask me to delete the oldest data, giving an upper bound on ts_start, so I will do

    DELETE FROM tblIndex WHERE ts_start < upper_bound_ts_start;

    Deleting 90,000 records (half an hour's worth) out of the 1,800,000 currently takes 17 seconds, a little longer than expected. Is there any way to reduce this? We don't care whether the records are synced to the hard disk immediately. We are thinking of starting a separate thread to do the delete, to make the call asynchronous (a chunked version is sketched after this list). What I am not sure about is whether a long-running delete will hurt insert performance if the two share the same connection. Or should I use separate connections for insert and delete? But in that case, do they need to be synchronized at the application level?

  3. Search: SELECT ts_start FROM tblIndex WHERE ts_start BETWEEN ? AND ?. Since ts_start is the primary key, the performance is OK for our needs right now. I am thinking I should use a separate connection for search, right?
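
As an aside on point 1: since a second's worth of records arrives together, batching them into one transaction means the journal is written once per batch instead of once per row. A minimal sketch, assuming SQLite's C API; the struct layout is illustrative, and `ins` is assumed to be prepared from INSERT INTO tblIndex(frame_type, pts, ts_start, ts_end) VALUES(?1,?2,?3,?4):

    #include <sqlite3.h>

    /* Illustrative in-memory record; field sizes are assumptions. */
    struct rec { int frame_type; char pts[6]; int ts_start; int ts_end; };

    void insert_batch(sqlite3 *db, sqlite3_stmt *ins, const struct rec *r, int n)
    {
        int i;
        sqlite3_exec(db, "BEGIN", 0, 0, 0);    /* one journal write per batch */
        for (i = 0; i < n; i++) {
            sqlite3_bind_int (ins, 1, r[i].frame_type);
            sqlite3_bind_text(ins, 2, r[i].pts, -1, SQLITE_TRANSIENT);
            sqlite3_bind_int (ins, 3, r[i].ts_start);
            sqlite3_bind_int (ins, 4, r[i].ts_end);
            sqlite3_step(ins);                 /* error handling elided */
            sqlite3_reset(ins);
        }
        sqlite3_exec(db, "COMMIT", 0, 0, 0);
    }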
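
For the delete in point 2, one way to bound the stall, instead of running one 17-second statement, is to delete in chunks so each (implicit) transaction stays short. A sketch, assuming a build without the optional SQLITE_ENABLE_UPDATE_DELETE_LIMIT flag (hence the rowid subquery); the chunk size is a guess to tune:

    #include <sqlite3.h>

    /* Delete old rows in bounded chunks so a writer on another
     * connection is blocked only briefly between chunks. */
    int delete_old(sqlite3 *db, int upper_bound_ts_start)
    {
        const char *sql =
            "DELETE FROM tblIndex WHERE rowid IN "
            "(SELECT rowid FROM tblIndex WHERE ts_start < ?1 LIMIT 10000)";
        sqlite3_stmt *st;
        int rc = sqlite3_prepare_v2(db, sql, -1, &st, 0);
        if (rc != SQLITE_OK) return rc;
        do {
            sqlite3_bind_int(st, 1, upper_bound_ts_start);
            rc = sqlite3_step(st);             /* one chunk per step */
            sqlite3_reset(st);
            /* optionally sleep here so the inserter gets a turn */
        } while (rc == SQLITE_DONE && sqlite3_changes(db) > 0);
        sqlite3_finalize(st);
        return rc == SQLITE_DONE ? SQLITE_OK : rc;
    }

If the delete runs in its own thread, a separate connection is the usual arrangement; SQLite serializes the two writers itself, and the chunking bounds how long the inserter can be kept waiting. Note also that ts_start is declared INT PRIMARY KEY, which SQLite treats as an ordinary column with a unique index; only INTEGER PRIMARY KEY aliases the rowid, so every deleted row must be removed from both the table and that index.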

Configuration of SQLite:

hard disk database (USB interface)
cache size is 2000 pages
page size is 1024 bytes
sync mode is 0 (off)
journal_mode is TRUNCATE

Thanks for any suggestions on improving the delete performance or on the overall design.

EDIT: 350M MIPS CPU, not much memory (<2 MB) dedicated to this application.

+1  A: 

Since the data is transient -- and small -- why are you using a database?

You'd be much happier with a simple directory of flat files.

Your web query can simply read the relevant set of files and return the required results. 1,800,000 records of 16 bytes is only about 28 MB of file data. You can read the whole thing into memory, do your processing in memory, and present the results.

A separate process can delete files that are old once a day at midnight.

A third process can append 2-50 16-byte records to the working file each second (a sketch follows the notes below).

  • Write and flush so that the file is correct and complete after each I/O. If your reader handles an incomplete last record gracefully, you don't even need a lock.

  • Name each file with a sequence number based on the time. You could, for example, take the system time (in seconds), divide by 4*60*60, and truncate the answer. That's a sequence number that will advance once every 4 hours, creating a new file. 8 hours of data is 3 of these files (2 previous 4-hour files, plus the current working file).
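
A sketch of that appender, with the file-name prefix and the exact record layout as assumptions:

    #include <stdio.h>
    #include <time.h>

    /* 16-byte record; the exact layout is an assumption. */
    struct rec { int frame_type; char pts[4]; int ts_start; int ts_end; };

    /* Append n records to the current 4-hour file, then flush so the
     * file is complete after each I/O (add fsync() if the data must
     * also be forced to the disk itself). */
    void append_records(const struct rec *r, int n)
    {
        char name[32];
        long seq = (long)(time(NULL) / (4 * 60 * 60));  /* advances every 4 h */
        FILE *f;

        snprintf(name, sizeof name, "idx_%ld.dat", seq);
        f = fopen(name, "ab");              /* append, create if missing */
        if (!f) return;                     /* error handling elided */
        fwrite(r, sizeof *r, n, f);
        fflush(f);
        fclose(f);
    }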

S.Lott
Thanks, Lott. "Name each file with a sequence number based on the time" is a good suggestion; we will think about this alternative. The reason we chose a database is that we want to hand all the search, sync, and power-fail-safety concerns to the database. And we are on an embedded system, so we don't have enough memory to load all these index files into memory.
pierr
You only have 4 columns, three of which are ints. You can write your own search that will be quicker than SQL. Syncing a file is simpler, faster and more reliable than syncing a database. Power-fail safety is already part of the OS filesystem. An RDBMS is ONLY appropriate for complex data models with (a) flexibility, (b) joins, and (c) permanence. You have none of these attributes.
S.Lott
I have almost the same needs, and a flat file is definitely not the right solution, unless you can write search and group-by queries for me that are as efficient as an SGBD's...
Mose
Search and group-by queries using simple map-reduce algorithms will often be much, much faster than a relational database (RDBMS in English, SGBD in French). Remember, group-by is a reduce operation that can be implemented with a simple tree map, which will outperform anything involving a traditional RDBMS sort.
S.Lott
A: 

How about having one SQLite database for each half hour? In other words, start a new database file every half hour and just delete the old one.
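
If searches have to span the split, ATTACH lets one connection see several database files at once. A sketch, with the previous file's name as a placeholder and both files assumed to hold the same tblIndex schema:

    #include <sqlite3.h>

    /* Prepare a range search that covers the current file (main) and
     * the previous half hour's file, attached as 'prev'. */
    int search_across(sqlite3 *db, int from, int to, sqlite3_stmt **out)
    {
        sqlite3_exec(db, "ATTACH DATABASE 'idx_prev.db' AS prev", 0, 0, 0);
        const char *sql =
            "SELECT ts_start FROM main.tblIndex WHERE ts_start BETWEEN ?1 AND ?2 "
            "UNION ALL "
            "SELECT ts_start FROM prev.tblIndex WHERE ts_start BETWEEN ?1 AND ?2";
        int rc = sqlite3_prepare_v2(db, sql, -1, out, 0);
        if (rc != SQLITE_OK) return rc;
        sqlite3_bind_int(*out, 1, from);
        sqlite3_bind_int(*out, 2, to);
        return SQLITE_OK;                   /* caller steps through the rows */
    }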

Dave
And how do you perform a search or a group-by across all the tables? I don't think UNION will give good performance...
Mose
One comment above mentions using map-reduce. If the goal is reporting, then I'd periodically move the individual databases into a data warehouse of some sort. If you need this in real time, then maybe SQLite is not the right choice of database.
Dave