We have developed a PaaS solution for PHP. As part of it, we let developers view their Apache error_log and access_log files through our API.

Currently we write the logs to files on disk, separated per deployment (vhost).

Since this doesn't scale well with a growing number of nodes and deployments, even though the files are on a distributed filesystem (GlusterFS), we would like to switch to something better.

Especially for billing and statistics, we would prefer not to have to parse the log files every time.
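One way to avoid re-parsing is to parse each access_log line once, at ingest time, into a structured record that can be stored as a document or row. A minimal sketch for Apache's standard "combined" LogFormat; the `vhost` argument is a hypothetical parameter supplied by the caller (e.g. derived from the per-deployment log file), not something the format itself contains.

```python
import re
from datetime import datetime

# Apache "combined" LogFormat:
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_access_line(line, vhost):
    """Turn one access_log line into a dict ready to store.

    `vhost` is hypothetical: supplied by the caller, e.g. from the
    per-deployment log file name."""
    m = COMBINED.match(line)
    if m is None:
        return None  # malformed line; the caller decides what to do
    rec = m.groupdict()
    rec["vhost"] = vhost
    rec["status"] = int(rec["status"])
    # %b logs "-" when no response body was sent
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    return rec
```

Typed fields such as `status`, `bytes` and `time` are what make later billing queries cheap, since they never have to touch the raw text again.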

MongoDB's capped collections look awesome for logging, so we wanted to go with that. But it turns out they don't seem to work with auto-sharding, which kind of spoils the point for us, since we expect many more writes than reads.
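To make the appeal concrete, here is a minimal in-memory model of what a capped collection gives you: a fixed upper bound, insertion order preserved, and the oldest entries dropped automatically. This is only an illustration of the semantics; real capped collections live on the server and are bounded by a byte size (with an optional document limit), whereas this sketch bounds by document count only.

```python
from collections import deque

class CappedLog:
    """Model of capped-collection semantics: bounded size, insertion
    order preserved, oldest documents silently evicted when full."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)  # deque drops the oldest item when full

    def tail(self, n):
        """Return the n most recent documents, oldest first."""
        if n <= 0:
            return []
        return list(self._docs)[-n:]
```

With the real driver, the equivalent is creating the collection with the `capped` and `size` options (e.g. pymongo's `create_collection(name, capped=True, size=...)`).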

The other option was Cassandra, which I like for its every-node-is-equal approach, but it doesn't have anything like capped collections.
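Cassandra's every-node-is-equal design rests on a token ring: each row key is hashed to a token, and each node owns the range of tokens ending at its own token, so any node can work out where a key lives. A simplified sketch of that lookup, using MD5 tokens in the spirit of Cassandra's RandomPartitioner; replication to neighboring nodes is left out, and the node names and tokens are hypothetical.

```python
import bisect
import hashlib

class TokenRing:
    """Simplified token ring: a key belongs to the first node whose
    token is >= the key's token, wrapping around at the end."""

    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; hypothetical, assigned by hand
        self._ring = sorted((tok, node) for node, tok in node_tokens.items())
        self._tokens = [tok for tok, _ in self._ring]

    @staticmethod
    def token_for(key):
        # MD5-based token, similar in spirit to RandomPartitioner
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect_left(self._tokens, self.token_for(key))
        # past the highest token, wrap around to the first node
        return self._ring[i % len(self._ring)][1]
```

Because routing depends only on the hash and the token list, there is no master or config server to consult, which is the property the question finds attractive.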

Turns out neither of the two solutions offers a distinct feature that helps me make a decision, or I don't see it.

So what I'd like to know is: has anybody used either of these systems for logging before? What were your experiences, and can you give me some tips? Or are there other solutions that fit our needs better?

+2  A: 

You can check out this article from Cloudkick if you are considering Cassandra: "4 Months with Cassandra, a love story".

They use Cassandra to store various metrics for their system, which is somewhat similar to storing log files.
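Since Cassandra has no capped collections, a common approach for append-heavy, time-ordered data such as metrics or log lines is to store entries as time-ordered columns inside rows bucketed by source and period, and to expire old data with per-column TTLs or by deleting old buckets. The sketch below models that in memory; the `vhost:YYYYMMDD` row-key scheme and the TTL value are hypothetical choices, not Cloudkick's actual schema.

```python
from datetime import datetime, timezone

def row_key(vhost, ts):
    """Hypothetical row key: one row per vhost per UTC day."""
    return f"{vhost}:{ts.astimezone(timezone.utc):%Y%m%d}"

class BucketedLogStore:
    """In-memory stand-in for a column family: row key -> {column: value}.
    Column names are timestamps, so a row holds one day's lines in order.
    Real schemas typically use TimeUUID column names so that identical
    timestamps don't overwrite each other; this sketch does not."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.rows = {}

    def write(self, vhost, ts, line):
        expires = ts.timestamp() + self.ttl  # emulates a per-column TTL
        self.rows.setdefault(row_key(vhost, ts), {})[ts.timestamp()] = (line, expires)

    def read_day(self, vhost, day, now):
        """Return the day's unexpired lines, oldest first."""
        row = self.rows.get(row_key(vhost, day), {})
        return [line for _col, (line, exp) in sorted(row.items())
                if exp > now.timestamp()]
```

Bucketing keeps any single row from growing without bound, and dropping a whole expired bucket is the closest analogue to a capped collection's automatic eviction.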

EDIT:

If you haven't yet decided what to use, here's a great solution using MongoDB as a backend:

Graylog2 is an open source syslog implementation that stores your logs in MongoDB. It consists of a server written in Java that accepts your syslog messages via TCP or UDP and stores them in the database. The second part is a Ruby on Rails web interface that lets you view the log messages.
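Because Graylog2 ingests plain syslog, any syslog client can feed it. A minimal sketch using Python's standard `logging.handlers.SysLogHandler`, which sends over UDP by default; the logger name and the `local_roundtrip` helper are hypothetical, and in a real setup `host`/`port` would point at the Graylog2 server.

```python
import logging
import logging.handlers
import socket

def send_to_syslog(message, host, port):
    """Send one message as a UDP syslog datagram to host:port."""
    logger = logging.getLogger("paas.apache")  # hypothetical logger name
    handler = logging.handlers.SysLogHandler(address=(host, port))  # UDP by default
    logger.addHandler(handler)
    try:
        logger.error(message)
    finally:
        logger.removeHandler(handler)
        handler.close()

def local_roundtrip(message):
    """Bind a throwaway UDP socket on localhost, send one message to it,
    and return the raw syslog datagram that arrived."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        sock.settimeout(2.0)
        send_to_syslog(message, "127.0.0.1", sock.getsockname()[1])
        data, _addr = sock.recvfrom(4096)
        return data
```

The datagram starts with a `<PRI>` header encoding facility and severity (here the default "user" facility at "error" level), followed by the message text.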

the_void
Thanks for your answer. I read that, and also http://blog.boxedice.com/2009/07/25/choosing-a-non-relational-database-why-we-migrated-from-mysql-to-mongodb/ which is about a server monitoring solution that uses MongoDB and seems happy with it. But I thought there might be other opinions and/or solutions apart from that.
pst
The best advice would be to *play* with both and see which works out for you. Both are pretty easy to set up, so you can see for yourself which suits you best.
the_void
You might also be interested in this question: http://stackoverflow.com/questions/2892729/mongodb-vs-cassandra
the_void
+3  A: 

Turns out neither of the two solutions offers a distinct feature that helps me make a decision, or I don't see it.

Honestly, we're going through this exact test right now with some serious log data (and by "right now" I mean a few of us were up late last night running these tests).

To me, these are the two distinguishing features: ease of use and proven scaling.

Ease of use

  • MongoDB was easy. In a couple of hours I went from blank computer to an active Mongo instance with imported data from MySQL and a few completed map-reduces.
  • In the same amount of time, team Cassandra sat around recompiling Java files, trying to get Hadoop configured to run over an existing Cassandra installation so that they could run map-reduces at all.
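The map-reduces mentioned above ran inside the databases' own tooling (MongoDB's map/reduce takes JavaScript functions; the Cassandra side needed Hadoop). Stripped of that infrastructure, a billing-style job like the one the question wants has this shape: map each log record to a (vhost, bytes) pair, then reduce by summing per vhost. A pure-Python sketch with the grouping step folded into a dictionary; the record fields `vhost` and `bytes` are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    """Emit (vhost, bytes) pairs, like a map function over log documents."""
    for rec in records:
        yield rec["vhost"], rec["bytes"]

def reduce_phase(pairs):
    """Group by vhost and sum the bytes, like a reduce keyed by vhost."""
    totals = defaultdict(int)
    for vhost, nbytes in pairs:
        totals[vhost] += nbytes
    return dict(totals)

def bytes_per_vhost(records):
    return reduce_phase(map_phase(records))
```

Summation is associative, which is what lets a real map-reduce engine run the reduce step in parallel across shards or nodes and combine the partial totals afterwards.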

Proven Scaling

  • MongoDB sharding is still in beta. It's slated for launch in the next few weeks. That's pretty tight.
  • Cassandra sharding is proven on some very large instances.

So I think the answer really comes down to your personal taste. I honestly think Cassandra may be the more stable and proven product, but I also know from experience that its learning and setup curve is a lot steeper. So it might be worth trying a bit of both.

Gates VP
I agree with you. MongoDB is really easy to set up, but auto-sharding is in beta and, as I said above, it doesn't seem to work with capped collections. Cassandra sharding should work, as it seems to be in use at some big companies. But the setup is a PITA, and I hate XML config files with a passion. But that's personal taste. Thanks for your input; I'll let you know how this works out for us. Currently we are testing MongoDB. We have to test one after the other because I can't split up into teams. :)
pst