I'm a complete novice with databases/PC applications, so please forgive my ignorance.

I'd like to capture packets to a database in real time so that multiple applications would have the ability to monitor physical I/O data being returned via UDP packets from a PLC, and I had a few questions.

In the long run it will need to be cross-platform, but for the time being I'm using a C# packet capture library on Windows. Any suggestions on database type, MySQL vs. SQLite?

At ~1500 200-byte packets a second, is it feasible to insert a packet 1500 times a second? I've read that SQLite has some problems with concurrency; if I have an app querying the packet data in the database ~10 times a second on a 25-50ms delay, is that doable?

I expect to "only" need to store 20MB or so of data in the DB at any one time. Can the database be forced to run in memory only?

When writing the packet data, can the data packet (byte array) be written in one statement rather than iteratively inserting each byte/word? I suppose I could turn it into a string, but I expect that would make it nearly impossible to query with any speed. I don't see any mention of anything like a "byte array type" in any of the databases I briefly looked at.

FWIW, all the data is coming up to a dedicated NIC on a static IP. The packets are sequential (I know that's not guaranteed with UDP, but I've never seen one out of order yet), so I could stride through the data easily if the database supported an array type. That's good, right? No random searches?

Thanks for taking the time to read this.

Bob

A: 

EDIT: I forgot you're working in C#.

First of all, are you planning to query the database from more than one computer? If so, you would want to use MySQL. Otherwise, SQLite is probably a good choice. But note that MySQL is probably necessary if you want multiple C# apps sharing an in-memory database. If you choose MySQL, use MySQL Connector/NET. For SQLite, there is System.Data.SQLite (which I've used for a WinForms app and can recommend).

You say you need to do 1500 200-byte insertion statements each second. SQLite reports that it can do 50,000 per second. The key caveat is that this refers to raw inserts, not transactions; committing a transaction slows you down, as that usually means flushing to disk.
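
To illustrate, here is a minimal sketch of grouping many raw inserts into one transaction with System.Data.SQLite; the table matches the schema further down, while the connection string and the packet source are illustrative assumptions:

using System.Data;
using System.Data.SQLite;

// Sketch: batch many raw inserts into a single transaction so that only
// one commit (and therefore one flush to disk) is paid for the whole batch.
using (var conn = new SQLiteConnection("Data Source=packets.db"))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    using (var cmd = new SQLiteCommand(
        "INSERT INTO packets (packet) VALUES (@p)", conn, tx))
    {
        var p = cmd.Parameters.Add("@p", DbType.Binary);
        foreach (byte[] packet in capturedPackets) // hypothetical packet buffer
        {
            p.Value = packet;
            cmd.ExecuteNonQuery(); // raw insert, no per-row commit
        }
        tx.Commit(); // one flush for the whole batch
    }
}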

Both SQLite (see their In-Memory Databases) and MySQL (see their MEMORY (HEAP) Storage Engine) can use in-memory databases. However, for SQLite this may defeat your goal of letting "multiple applications" access it. With SQLite, there is an undocumented (and "not guaranteed to work in future SQLite releases") way you may be able to share in-memory databases (e.g. using shared memory). It was discussed in a prior SO question; see also the linked mail message from SQLite's main author. Note that sharing a SQLite in-memory database will probably not be possible if you stick to managed code. You can definitely have a MySQL in-memory database shared between multiple clients.
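
For reference, opening a private in-memory database with System.Data.SQLite is just a connection-string choice. A minimal sketch:

using System.Data.SQLite;

// Sketch: a private in-memory SQLite database. Each plain ":memory:"
// connection gets its own separate database, which is why sharing one
// between processes is problematic, as discussed above.
using (var conn = new SQLiteConnection("Data Source=:memory:"))
{
    conn.Open();
    new SQLiteCommand(
        "CREATE TABLE packets (id INTEGER PRIMARY KEY, packet BLOB)",
        conn).ExecuteNonQuery();
    // ... inserts and queries are visible on this connection only ...
}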

Using either C# client, you should be able to insert a whole packet in a single line with a DbParameter (i.e. SQLiteParameter or MySqlParameter). Note the Value and Size properties in particular.
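
For example, with MySQL Connector/NET a single-packet insert might look like the following sketch (conn is assumed to be an open MySqlConnection; variable names are illustrative):

using MySql.Data.MySqlClient;

// Sketch: the whole packet is bound as one binary parameter,
// so there is no per-byte or per-word insertion loop.
var cmd = new MySqlCommand("INSERT INTO packets (packet) VALUES (@p)", conn);
var p = cmd.Parameters.Add("@p", MySqlDbType.VarBinary);
p.Value = packetBytes;        // the raw byte[] from the capture library
p.Size = packetBytes.Length;  // 200 in the scenario described above
cmd.ExecuteNonQuery();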

I don't think you need any "array type". You can simply have an incrementing primary key (INTEGER PRIMARY KEY) column and a packet content column (BLOB or TEXT). I'm not sure which of BLOB or TEXT will give you the best performance for SQLite. Your SQLite schema could look like:

CREATE TABLE packets (id INTEGER PRIMARY KEY, packet BLOB);

Then, you can easily select e.g. packets within a certain range of primary keys. Of course, you could add a datetime column, but that would require indexing. For MySQL, it would be something like:

CREATE TABLE packets (id INTEGER PRIMARY KEY, packet VARBINARY(200)) ENGINE=MEMORY;
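
Either way, reading sequential packets then reduces to a primary-key range scan, e.g. (the bounds here are illustrative):

SELECT id, packet FROM packets WHERE id BETWEEN 1000 AND 1500;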

I hope this helps. Keep in mind profiling is the best way to be sure what works well for your app.

Matthew Flaschen
Matthew, again, I had typed in a lengthy response to thank you, but it seems to be gone now. The apps will run on the same machine as the database. Thanks for the help.
rackmount
In that case it probably depends on whether you decide to go the in-memory route. If you use on-disk, SQLite will probably be simpler; if in-memory, then MySQL.
Matthew Flaschen
+1  A: 

What is the perceived advantage you're looking for in a relational database for this? Since you say you're not much into databases, here is a brief overview of the usual reasons why SQL is an option; perhaps it will help you clarify your requirements and your options:

  1. Queryability. If you want to expose the data for a rich search that includes options to filter out records, to sort results, and to aggregate calculations, then indeed SQL databases offer such facilities. They do not come for free, though. To speed up searches, a database engine has to duplicate parts of the data into several indexes, which adds to the insert/update times as all those indexes have to be maintained.
  2. Recoverability. Databases can ensure that data is kept in a consistent state in case of a crash. Using either a write-ahead log or versioned updates, they write changes in a fashion that ensures the client that once a statement has returned, the changes it made are durable (I'm omitting a bunch of details for simplicity).
  3. Consistency. By isolating changes between users until they explicitly commit a group of related operations, the database always exposes a consistent state to a viewer. To achieve this, a database will have to deploy either locking or versioning.
  4. Scalability. Databases can take care of maintaining very large sets of data, much larger than a process's viable address space. They'll use a buffer pool to keep hot pages cached, manage the underlying file-offset-to-memory-address mapping, and handle all the needed I/O to read from disk and write back changes. They will also present multiple files as a unified storage area, thus surpassing OS file size limitations, if any.
  5. Interoperability. Other processes can use standard libraries (i.e. ODBC, ADO, etc.) and languages (SQL) to operate on the data, so there is no need to develop a custom library/access API.

Now, are any of these needed in your scenario? Is there something else I omitted? I'm asking these questions because what you want to achieve is not trivial. You can achieve 1500 inserts per second with relative ease, but it is much harder to do that and offer decent read performance at the same time. Also, it seems that much of what relational databases offer (consistency, recoverability, scalability) is not a goal for you. There are a number of products tuned specifically for the in-memory niche that are much faster than what you'd get from a typical disk-oriented relational database.

Remus Rusanu
I agree that he may not actually need a relational database. But as you say, there are benefits to the model (such as recoverability and interoperability) that definitely seem relevant.
Matthew Flaschen
Matthew has already given you a couple of pointers on MySQL and SQLite. You could also take a look at SQL CE, since it integrates very well with the CLR (http://msdn.microsoft.com/en-us/library/ms174461.aspx), and even SQL Express. Not that there is anything wrong with MySQL or SQLite ;)
Remus Rusanu
A: 

libpcap, Wireshark round-robin (ring buffer) capture files

Look around, play with Wireshark, and look at how it achieves results similar to yours.
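
For instance, Wireshark's dumpcap tool can capture into a fixed-size ring ("round robin") of files; the interface number and sizes here are illustrative:

# Capture on interface 1 into a ring of at most 10 files of ~1 MB each;
# the oldest file is overwritten once the limit is reached.
dumpcap -i 1 -w capture.pcap -b filesize:1024 -b files:10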

caskey
How does this answer the question in any way?
Matthew Flaschen
Thanks. I've been using Wireshark to watch and decipher the packet data and headers. It works great, but I was flailing with the Perl scripting trying to get it to capture to a DB. The troubles I had were dependency related. I found a couple of good tutorials with ppm links but kept failing on Net::Ethernet missing; the closest package I could find was something like Address::Ethernet (I forget). I was doing everything in Windows using ActiveState 5.10?, so maybe the package names didn't directly correlate. I finally gave up.
rackmount
Round robin??? Like a circular buffer?
rackmount