We're deploying an instant messenger (AJAX), and for it we're writing a comet server that will handle the communication. Now we want to store the sent messages in a DB. Which DB engine gives the best performance? We're talking about 5000 inserts/sec; I think MySQL and PostgreSQL are out of the game. Any proposals? HamsterDB, SQLite, MongoDB ...?

We don't query the data afterwards; it's only stored for legal reasons. That means the key point is insert/sec performance.

+5  A: 

If you don't need to do queries, then a database is not what you need. Use a log file.
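A minimal sketch of the idea, assuming each message carries a sender, a recipient and a text (Python, using the standard library's rotating log handler; the names are only illustrative):

import json
import logging
import time
from logging.handlers import TimedRotatingFileHandler

# One JSON document per line, rotated at midnight; old files can be shipped off-box or burned to DVD.
handler = TimedRotatingFileHandler("messages.log", when="midnight", backupCount=3650)
log = logging.getLogger("im_archive")
log.addHandler(handler)
log.setLevel(logging.INFO)

def archive_message(sender, recipient, text):
    # Appending one line to an already-open file is far cheaper than a database insert.
    log.info(json.dumps({"ts": time.time(), "from": sender, "to": recipient, "text": text}))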

jmz
I find we can handle the data more easily with a DB system. We don't query the data for our web app, but if there is a legal investigation we need to be able to deliver the requested data, so it will take less time to collect it.
Nenad
I'd go for a text-file based solution as well. To search the files you can use command-line tools like grep or simple text processing. The time you spend scaling a DBMS for this job will be much more than the time it takes to write some small scripts to analyze the logfiles, especially if you have a decently structured logfile. If it's for legal purposes: a text file on a CD/DVD will still be readable in 10 years (provided the disc itself isn't damaged); are you sure your database dumps will be?
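To illustrate, a collection script for a legal request could be as small as this (Python; it assumes the one-JSON-document-per-line format from the logging sketch above, which is only one possible way to structure the files):

import glob
import json

def find_messages(user_id, logdir="."):
    # Walk every rotated logfile and yield each message that involves the given user.
    for path in sorted(glob.glob(logdir + "/messages.log*")):
        with open(path) as f:
            for line in f:
                msg = json.loads(line)
                if user_id in (msg.get("from"), msg.get("to")):
                    yield msg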
a_horse_with_no_name
Understand the tradeoff. The last query might happen once, or not at all. How much time do you want to spend optimizing for it, considering you might not even know the exact request? It's often feasible and legally reasonable to have all necessary data, and manually query it when a police request arrives.
MSalters
@a_horse_with_no_name - I understand your point of view, but I'm ready to maintain a DBMS for the benefit of being able to easily collect data from it if needed. @MSalters - That's correct, but querying a DB based on the request is easier for me than grepping some log files.
Nenad
+9  A: 

If you are never going to query the data, then I wouldn't store it in a database at all; you will never beat the performance of just writing the messages to a flat file.

What you might want to consider are the scaling issues: what happens when it becomes too slow to write the data to a flat file? Will you invest in faster disks, or in something else?

Another thing to consider is how to scale the service so that you can add more servers without having to coordinate the logs of each server and consolidate them manually.

Edit: You wrote that you want to have it in a database; then I would also consider the security issues of having the data online. What happens when your service gets compromised? Do you want your attackers to be able to alter the history of what has been said?

It might be smarter to store it temporarily in a file, and then dump it to an off-site place that isn't accessible if your Internet-facing servers get hacked.
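A rough sketch of that idea (Python; the off-site host, path and filenames are placeholders, and it assumes rsync is available on both ends):

import glob
import os
import subprocess

def ship_rotated_logs(logdir=".", remote="archive@offsite.example.com:/srv/im-logs/"):
    # Push only the already-rotated files; the live messages.log stays local.
    for path in sorted(glob.glob(logdir + "/messages.log.*")):
        subprocess.run(["rsync", "-a", path, remote], check=True)
        # Remove the local copy so a compromised frontend has less history to tamper with.
        os.remove(path)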

Alexander Kjäll
This is one more reason for a DB system; most of them help you scale without trouble. At the moment my favorite is MongoDB, but I'm wondering if another DB system can provide more inserts/sec.
Nenad
Really, log files with log rotation are a solved art. Reliable database scaling is only solved at the high-price end of the market, and even then my personal experience suggests it's usually misconfigured and not working properly. Flat files will be massively faster, always.
Will
A: 

If money is no object, you can use TimesTen: http://www.oracle.com/timesten/index.html

A complete in-memory database, with amazing speed.

BitKFu
I forgot to mention we're on a low budget :-)
Nenad
Eh, if you want an in-memory solution then save your $$. Use something like MySQL, but specify that the tables use the MEMORY storage engine, and then set up a slave server to replicate the memory tables to an un-indexed MyISAM table. Problem solved, and $$ saved.
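A sketch of the master side of that setup (Python with the MySQL Connector/Python package; connection details, table and column names are made up, and the replication to the MyISAM slave is server configuration rather than application code):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="im", password="...", database="im")
cur = conn.cursor()

# The in-memory table that takes the fast inserts on the master; the slave would hold
# the same table as un-indexed MyISAM. max_heap_table_size caps how big MEMORY tables may grow.
cur.execute("SET max_heap_table_size = 1024 * 1024 * 1024")
cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        sender INT NOT NULL,
        recipient INT NOT NULL,
        body VARCHAR(255) NOT NULL
    ) ENGINE=MEMORY
""")
conn.commit()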
Timothy
The last time I tried to do something similar I ran into trouble with the record limit of the MEMORY table, but the biggest problem was the performance loss from locking/unlocking that table when it is used from multiple threads.
Nenad
A: 

I would use a log file for this, but if you must use a database, I highly recommend Firebird. I just tested the speed: it inserts about 10k records per second on quite average hardware (a 3-year-old desktop computer). The table has one compound index, so I guess it would work even faster without it:

milanb@kiklop:~$ fbexport -i -d test -f test.fbx -v table1 -p **
Connecting to: 'LOCALHOST'...Connected.
Creating and starting transaction...Done.
Create statement...Done.
Doing verbatim import of table: TABLE1
Importing data...
SQL: INSERT INTO TABLE1 (AKCIJA,DATUM,KORISNIK,PK,TABELA)  VALUES (?,?,?,?,?)
Prepare statement...Done.
Checkpoint at: 1000 lines.
Checkpoint at: 2000 lines.
Checkpoint at: 3000 lines.
...etc.
Checkpoint at: 20000 lines.
Checkpoint at: 21000 lines.
Checkpoint at: 22000 lines.

Start   : Thu Aug 19 10:43:12 2010
End     : Thu Aug 19 10:43:14 2010
Elapsed : 2 seconds.
22264 rows imported from test.fbx.

Firebird is open source, and completely free even for commercial projects.

Milan Babuškov
I'm not really up to date with RDBMS systems, but the last time I touched Firebird, around 4 years ago, it was the slowest RDBMS available for inserts. If I'm not wrong, MongoDB is around 5 times faster for inserts than Firebird.
Nenad
Firebird is a nice DBMS, but _if_ you go for a DBMS, I'd choose PostgreSQL over Firebird any time. The PostgreSQL community is more active than Firebird's and has predictable release cycles. The biggest drawback of Firebird is the unstructured manual: if you need to find a specific feature/function you first need to go through the InterBase manuals, and then through each(!) of the release notes since then. There is no complete and consolidated manual for the current release, which is very annoying.
a_horse_with_no_name
I'd take comments from "a horse with no name" with a grain of salt. But everyone has a right to their own opinion, of course.
Milan Babuškov
+4  A: 

it's only stored for legal reasons.

And what about the detailed requirements? You mention the NoSQL solutions, but these can't promise the data is really stored on disk. In PostgreSQL everything is transaction-safe, so you're 100% sure the data is on disk and available. (Just don't turn off fsync.)

Speed has a lot to do with your hardware, your configuration and your application. PostgreSQL can insert thousands of records per second on good hardware with a correct configuration; it can be painfully slow on the same hardware with a plain stupid configuration and/or the wrong approach in your application. A single INSERT is slow, many INSERTs in a single transaction are much faster, prepared statements are even faster, and COPY does magic when you need speed. It's up to you.
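To make the difference concrete, roughly (Python with psycopg2; table and column names are invented, and the COPY variant assumes the message text contains no tabs or newlines):

import io
import psycopg2

conn = psycopg2.connect("dbname=im user=im")
cur = conn.cursor()

# Slow: one statement and one commit per message.
def insert_one(msg):
    cur.execute("INSERT INTO messages (sender, recipient, body) VALUES (%s, %s, %s)",
                (msg["from"], msg["to"], msg["text"]))
    conn.commit()

# Much faster: many rows in a single transaction.
def insert_batch(msgs):
    cur.executemany("INSERT INTO messages (sender, recipient, body) VALUES (%s, %s, %s)",
                    [(m["from"], m["to"], m["text"]) for m in msgs])
    conn.commit()

# Fastest: COPY a whole batch in one round trip.
def copy_batch(msgs):
    buf = io.StringIO("".join("%s\t%s\t%s\n" % (m["from"], m["to"], m["text"]) for m in msgs))
    cur.copy_from(buf, "messages", columns=("sender", "recipient", "body"))
    conn.commit()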

Frank Heikens
100% sure on disk might not be necessary for legal reasons. If you can prove you had a disk crash, and specifically because of that can't comply with a particular legal request, that crash can be considered an Act of God.
MSalters
Who knows. But an act of God? It would be a nice statement in court, but there's a good chance you'd lose. Just check the requirements and then find a solution.
Frank Heikens
@Frank Heikens - The data is from the IM of a dating site; there is no need to store it transaction-safe. Sure, I hope we won't lose any data. As our budget is limited, we have one dedicated box for this comet server, which will handle the IM conversations, and on the same box we will store the data. I know the benefits of PostgreSQL, but in this scenario I think it cannot match the performance of MongoDB unless we spend many bucks on a 48-core server, an SSD array and lots of RAM.
Nenad
@MSalters - 100% right
Nenad
@Frank Heikens: Unless you're working in a regulated industry, there won't be strict requirements on log retention. In that case the legal norm can be summarized as "what reasonable people do in general". You could even point to SO for what's considered reasonable.
MSalters
And Cassandra will make sure your data is really stored on disc, on more than one host synchronously, if you ask it to.
MarkR
A: 

Depending on your system setup, MySQL can easily handle over 50,000 inserts per second.

In tests on a current system I am working on, we got to over 200k inserts per second with 100 concurrent connections on 10 tables (just inserting some values).

I'm not saying this is the best choice, since other systems like CouchDB could make replication/backups/scaling easier, but dismissing MySQL solely on the assumption that it can't handle such minor amounts of data is a little too harsh.

I guess there are better (read: cheaper, easier to administer) solutions out there.

edorian
Can you tell me the hardware specs of your current system?
Nenad
I can't tell you the exact specs (manufacturer etc.), but in general it's an 8-core, 16 GB RAM machine with attached storage running ~8-12 600 GB drives in RAID 10.
edorian
+1  A: 

Firebird can easily handle 5000 inserts/sec if the table doesn't have indices.

Andrei K.
+1  A: 

I don't know why you would rule out MySQL. It can handle high insert rates. If you really want high insert throughput, use the BLACKHOLE table type with replication. It's essentially writing to a log file that eventually gets replicated to a regular database table. You could even query the slave without affecting insert speed.
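Roughly what the master side looks like (Python with MySQL Connector/Python; connection details, table and column names are placeholders, and the slave that actually stores the rows is set up through normal replication configuration):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="im", password="...", database="im")
cur = conn.cursor()

# Inserts into a BLACKHOLE table store nothing locally; with binary logging enabled they
# still land in the binlog, which the slave replays into a real table you can query.
cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        sender INT NOT NULL,
        recipient INT NOT NULL,
        body VARCHAR(255) NOT NULL
    ) ENGINE=BLACKHOLE
""")
conn.commit()

cur.execute("INSERT INTO messages (sender, recipient, body) VALUES (%s, %s, %s)", (1, 2, "hi"))
conn.commit()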

Brent Baisley
The benchmark I have done showed me that MySQL really is a serious RDBMS.
Nenad
+2  A: 

Please ignore the benchmark above; we had a bug in it.

We inserted 1M records with the following columns: id (int), status (int), message (140 chars, random). All tests were done with the C++ driver on a desktop PC (i5, 500 GB SATA disk).
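For reference, the shape of the insert loop, shown here as a rough Python/pymongo equivalent (the real tests used the C++ driver, so this is only an illustration of the procedure, not the code that produced the numbers below):

import random
import string
import time
from pymongo import MongoClient

coll = MongoClient()["bench"]["messages"]
# coll.create_index("id")  # uncomment for the "with index on id" runs

N = 1_000_000
start = time.time()
for i in range(N):
    coll.insert_one({
        "id": i,
        "status": random.randint(0, 3),
        "message": "".join(random.choice(string.ascii_letters) for _ in range(140)),
    })
elapsed = time.time() - start
print("time: %ds, insert/s: %d" % (elapsed, N / elapsed))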

Benchmark with MongoDB:

1M Records Insert without Index

time: 23s, insert/s: 43478

1M Records Insert with Index on Id

time: 50s, insert/s: 20000

Next we added 1M more records to the same collection (with the index and 1M records already in it)

time: 78s, insert/s: 12820

All of that resulted in nearly 4 GB of files on the filesystem.

Benchmark with MySQL:

1M Records Insert without Index

time: 49s, insert/s: 20408

1M Records Insert with Index

time: 56s, insert/s: 17857

Next we added 1M more records to the same table (with the index and 1M records already in it)

time: 56s, insert/s: 17857

Exactly the same performance; no slowdown in MySQL as the table grows.

We saw that Mongo ate around 384 MB of RAM during this test and loaded 3 CPU cores, while MySQL was happy with 14 MB and loaded only 1 core.

Edorian was on the right track with his proposal; I will do some more benchmarks, and I'm sure we can reach 50K inserts/sec on a 2x quad-core server.

I think MySQL will be the right way to go.

Nenad