What would be a suitable database for the following? I am especially interested in your experiences with non-relational NoSQL systems. Are they any good for this kind of usage, which system have you used and would you recommend, or should I go with a normal relational database (DB2)?

I need to gather audit trail/logging information from a bunch of sources into a centralized server, where I could generate reports efficiently and examine what is happening in the system.

A typical audit/logging event would always consist of some mandatory fields, for example:

  • globally unique id (generated somehow by the program that produced the event)
  • timestamp
  • event type (e.g. user logged in, error happened, etc.)
  • some information about the source (server1, server2)

Additionally, the event could contain 0-N key-value pairs, where a value might be up to a few kilobytes of text.
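The event structure described above could be modeled roughly as follows. This is a hypothetical sketch: the class and field names are illustrative, and using a random UUID for the "somehow generated" global id is an assumption, not something the question specifies.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    """Hypothetical event record: mandatory fields plus free-form key-value pairs."""
    event_type: str  # e.g. "user_logged_in", "error"
    source: str      # e.g. "server1"
    fields: dict[str, str] = field(default_factory=dict)  # the 0-N extra pairs
    # Assumption: the producing program uses a random UUID as the global id.
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


login = AuditEvent("user_logged_in", "server1", {"user": "alice"})
error = AuditEvent("error", "server2", {"message": "disk full", "code": "28"})
print(login.event_id != error.event_id)  # True: ids are generated independently
```

Because `fields` is an open-ended dict, a producer can add new keys without any shared schema changing, which is the flexibility the requirements below ask for.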

  • It must run on a Linux server.
  • It should work with large amounts of data (100 GB, for example).
  • It should support some kind of efficient full-text search.
  • It should allow concurrent reading and writing.
  • It should be flexible enough to add new event types and to add/remove key-value pairs in new events. Flexible means that no changes should be required to the database schema; the application generating the events can just add new event types/new fields as needed.
  • It should be efficient to run queries against the database, for reporting and exploring what happened. For example:
    • How many events with type=X occurred in some time period?
    • Get all events where field A has value Y.
    • Get all events with type X where field A has value 1, field B is not 2, and the event occurred in the last 24h.
A: 

The two I've seen used successfully are MongoDB and Cassandra.

Doobi
Did you mean MongoDB?
Juha Syrjälä
MongoDB is Fantastic for Logging: http://blog.mongodb.org/post/172254834/mongodb-is-fantastic-for-logging
kristina
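To illustrate why a document store fits the "no schema changes" requirement, here is a sketch of the question's third example query written as a MongoDB filter document. It assumes a hypothetical layout where each event is one document with the mandatory fields at the top level and the extra pairs in an embedded `fields` sub-document; the answers above do not specify a layout. Building the filter needs no server; with PyMongo it would be passed to `collection.find()` or `collection.count_documents()`.

```python
from datetime import datetime, timedelta, timezone


def recent_type_x_filter(now: datetime) -> dict:
    """Filter for: type X, field A == 1, field B != 2, occurred in the last 24h.

    Assumed (hypothetical) document layout:
      {"_id": ..., "ts": datetime, "type": "X", "source": "server1",
       "fields": {"A": 1, "B": 3, ...}}
    """
    return {
        "type": "X",
        "fields.A": 1,           # dot notation reaches into the embedded document
        "fields.B": {"$ne": 2},  # note: $ne also matches documents with no B at all
        "ts": {"$gte": now - timedelta(hours=24)},
    }


now = datetime(2010, 5, 1, 12, 0, tzinfo=timezone.utc)
query = recent_type_x_filter(now)
# With PyMongo, e.g.:
#   db.events.create_index([("type", 1), ("ts", -1)])
#   db.events.count_documents(query)
```

New keys under `fields` need no migration; an index on the fields you query often keeps these filters fast.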
A: 

RDM Embedded from Raima is a great choice. It is fast and has an efficient C API, plus circular tables, which use the FIFO principle to clear out old records while inserting new ones. You set the number of records to keep in the table, and it maintains that amount automatically without any performance hit. You can download the free RDM Embedded SDK at http://www.raima.com/products/rdm-embedded/sdk-download/.

embeddedDB
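The "circular table" behaviour described in this answer (a fixed record count, with the oldest records evicted FIFO as new ones arrive) can be pictured with Python's `collections.deque` and its `maxlen` argument. This only illustrates the principle; it is not RDM Embedded's C API, and the capacity of 3 is arbitrary.

```python
from collections import deque

# A bounded buffer that keeps only the newest `maxlen` records.
# Appending to a full deque silently drops the oldest item (FIFO eviction).
recent_events = deque(maxlen=3)

for event_id in range(1, 6):  # insert events 1..5
    recent_events.append({"id": event_id, "type": "user_logged_in"})

print([e["id"] for e in recent_events])  # [3, 4, 5]
```

The trade-off for an audit trail is that anything older than the last N records is gone, so this suits rolling logs better than long-term reporting.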
A: 

We used Redis for all the centralized logging from our app servers at mflow.com. It is very fast: based on these benchmarks, it does about 110,000 SETs per second and about 81,000 GETs per second. It has a VM implementation which, if your dataset exceeds available memory, swaps infrequently used values out to disk.

It's an advanced data-structures server that can store any binary-safe data, with native support for strings, lists, sets, sorted sets and hashes. Based on discussions on the mailing list, it is widely used to store analytics.

mythz
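One possible way to lay audit events out in Redis, using the hashes and sorted sets mentioned in this answer: each event becomes a hash, and a per-type sorted set scored by Unix timestamp answers "how many events of type X in a period" with a single `ZCOUNT`. This is a sketch, not how mflow.com did it. The key names (`event:<id>`, `events:type:<type>`) are hypothetical, and the functions assume a redis-py client (`hset(..., mapping=...)` needs redis-py 3.5 or later). A tiny in-memory stand-in is included only so the sketch runs without a Redis server.

```python
class FakeRedis:
    """Hypothetical stand-in for a redis-py client (only the calls used below)."""

    def __init__(self):
        self.hashes, self.zsets = {}, {}

    def hset(self, name, mapping):
        self.hashes.setdefault(name, {}).update(mapping)

    def zadd(self, name, mapping):
        self.zsets.setdefault(name, {}).update(mapping)

    def zcount(self, name, lo, hi):
        return sum(lo <= score <= hi for score in self.zsets.get(name, {}).values())


def log_event(r, event_id, ts, event_type, source, fields):
    """Store one event as a hash and index it by type and timestamp."""
    # Prefix the free-form pairs so they cannot clobber the mandatory fields.
    extra = {f"field:{k}": v for k, v in fields.items()}
    r.hset(f"event:{event_id}",
           mapping={"ts": ts, "type": event_type, "source": source, **extra})
    # Sorted-set score = Unix timestamp, so time ranges become score ranges.
    r.zadd(f"events:type:{event_type}", {event_id: ts})


def count_events(r, event_type, start_ts, end_ts):
    """Count events of one type with start_ts <= ts <= end_ts."""
    return r.zcount(f"events:type:{event_type}", start_ts, end_ts)


r = FakeRedis()  # in real use: r = redis.Redis(host="localhost", port=6379)
log_event(r, "e1", 1000, "error", "server1", {"msg": "disk full"})
log_event(r, "e2", 2000, "error", "server2", {})
log_event(r, "e3", 2500, "user_logged_in", "server1", {"user": "alice"})
print(count_events(r, "error", 0, 1500))  # 1
```

Ad-hoc queries such as "field A has value Y" would need further hand-maintained index sets, which is the main cost of this approach compared with a query language.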
A: 

should I go with normal relational database (DB2)?

Yes, you should! If you just want to store stuff and scan it, you might as well write to a file. Very fast, no overhead! But the minute you want to summarize data over time (last 24h, or between time t and t+1), and the more you care about the data as something other than lines of text, the more a proper RDBMS is, without question, your friend.

James K. Lowden
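To make the relational option concrete, here is a sketch of one common way to meet the "no schema changes" requirement in an RDBMS: a fixed `events` table for the mandatory fields plus a generic key-value `event_fields` table (the entity-attribute-value pattern). It uses Python's built-in `sqlite3` only so it runs anywhere; the table and column names are hypothetical, and on DB2 the column types, index options and full-text search setup would differ. The three queries are the examples from the question; reading "field B is not 2" as also matching events without a B is an assumption.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE events (
    id     TEXT PRIMARY KEY,   -- globally unique id from the producer
    ts     INTEGER NOT NULL,   -- Unix timestamp
    type   TEXT NOT NULL,
    source TEXT NOT NULL
);
-- 0-N extra pairs per event; new keys need no schema change.
CREATE TABLE event_fields (
    event_id TEXT NOT NULL REFERENCES events(id),
    key      TEXT NOT NULL,
    value    TEXT,
    PRIMARY KEY (event_id, key)
);
CREATE INDEX events_type_ts ON events(type, ts);
CREATE INDEX fields_key_value ON event_fields(key, value);
""")

db.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    ("e1", 1000, "X", "server1"),
    ("e2", 2000, "X", "server2"),
    ("e3", 3000, "Y", "server1"),
])
db.executemany("INSERT INTO event_fields VALUES (?, ?, ?)", [
    ("e1", "A", "1"), ("e1", "B", "2"),
    ("e2", "A", "1"), ("e2", "B", "3"),
    ("e3", "A", "Y"),
])

# 1. How many events with type = X occurred in a time period?
(n_x,) = db.execute(
    "SELECT COUNT(*) FROM events WHERE type = ? AND ts BETWEEN ? AND ?",
    ("X", 0, 1500)).fetchone()

# 2. All events where field A has value Y.
a_is_y = db.execute("""
    SELECT e.id FROM events e
    JOIN event_fields f ON f.event_id = e.id
    WHERE f.key = 'A' AND f.value = ?""", ("Y",)).fetchall()

# 3. Type X, A = 1, B not 2 (or absent), within 24h (86400 s) before `now`.
now = 2500
recent = db.execute("""
    SELECT e.id FROM events e
    WHERE e.type = 'X' AND e.ts >= ? - 86400
      AND EXISTS (SELECT 1 FROM event_fields f
                  WHERE f.event_id = e.id AND f.key = 'A' AND f.value = '1')
      AND NOT EXISTS (SELECT 1 FROM event_fields f
                      WHERE f.event_id = e.id AND f.key = 'B' AND f.value = '2')
    ORDER BY e.id""", (now,)).fetchall()

print(n_x, a_is_y, recent)  # 1 [('e3',)] [('e2',)]
```

The time-range and per-type summaries the answer mentions become plain SQL over the `events` table, while the key-value table absorbs new fields; the price is extra joins or `EXISTS` subqueries for every field you filter on.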