
I have a system that collects real-time Apache log data from about 90-100 web servers. I have also defined some URL patterns.

Now I want to build another system that updates the number of occurrences of each pattern based on those logs.

I had thought about using MySQL to store the statistics and update them with a statement like `UPDATE table SET count = count + 1 WHERE ...`, but I'm afraid MySQL will be slow with data from so many servers. Moreover, I'm looking for a database/storage solution that is more scalable and simpler (as an RDBMS, MySQL supports many things I don't need in this situation). Do you have any ideas?
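A minimal sketch of the counting step itself, assuming the logs are in Apache's Common or Combined Log Format. The pattern names and regular expressions (`product_page`, `search`) are hypothetical placeholders, since the question doesn't list the real patterns.

```python
import re
from collections import Counter

# Hypothetical URL patterns; the real ones are not given in the question.
URL_PATTERNS = {
    "product_page": re.compile(r"^/products/\d+$"),
    "search": re.compile(r"^/search\b"),
}

# Matches the quoted request line of a Common/Combined Log Format entry, e.g.
# 127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /products/42 HTTP/1.0" 200 2326
# The path stops at a space, '?' or quote, so query strings are ignored.
REQUEST_RE = re.compile(r'"[A-Z]+ (?P<path>[^ ?"]+)[^"]*"')


def count_patterns(lines):
    """Return a Counter mapping pattern name -> number of matching requests."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m is None:
            continue  # skip malformed lines or lines without a request
        path = m.group("path")
        for name, pattern in URL_PATTERNS.items():
            if pattern.search(path):
                counts[name] += 1
    return counts


sample = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /products/42 HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:55:37 -0700] "GET /search?q=shoes HTTP/1.1" 200 512',
    '127.0.0.1 - - [10/Oct/2000:13:55:38 -0700] "GET /about HTTP/1.1" 200 128',
]
counts = count_patterns(sample)
# counts["product_page"] == 1 and counts["search"] == 1; /about matches nothing
```

Counting a batch of lines locally like this means the store only has to apply one `count = count + n` per pattern per batch instead of one update per request, which is one possible way to reduce the write load the question worries about.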

A:

Apache Cassandra is a high-performance column-family store and can scale extremely well. The learning curve is a bit steep, but it will have no problem handling large amounts of data.

A simpler solution would be a key-value store like Redis, which is easier to understand than Cassandra. Redis only seems to support master-slave replication as a way to scale, so the write performance of your master server could become a bottleneck. Riak has a decentralized architecture without any central nodes. It has no single point of failure and no central bottleneck, so it's easier to scale out.
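To illustrate what "no central nodes" can mean for scaling out, here is a generic consistent-hashing sketch, the key-placement idea behind Dynamo-style stores such as Riak (Riak's actual ring management differs in detail). The node names and the virtual-node count are hypothetical. Every client can compute a key's owner on its own, so no coordinator sits on the write path, and adding a node only moves the keys the new node takes over.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hashing ring: a key belongs to the first node
    position at or after the key's hash, wrapping around at the end."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node gets `vnodes` positions for a more even spread.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        # md5 is used only to spread keys evenly, not for security.
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        idx = bisect.bisect_left(self._positions, self._hash(key))
        if idx == len(self._positions):
            idx = 0  # wrap around to the start of the ring
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
bigger = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"pattern:{i}" for i in range(1000)]
moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
# Keys that moved all landed on the new node; every other key stayed put.
print(len(moved), all(bigger.node_for(k) == "node-d" for k in moved))
```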

Niels van der Rest
Redis would be the perfect solution, as it provides atomic increment operations that protect against race conditions. Cassandra and Riak do not support this feature, which will make it hard to update the data correctly.
Tobias P.
A: 

A key-value store seems to be an appropriate solution for my system. After taking a quick look at those stores, I'm concerned about race conditions, as there will be a lot of clients trying to do these steps on the same key:

  1. count = storage.get(key)
  2. storage.set(key,count+1)
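The race in the two steps above can be reproduced deterministically. The sketch below uses a hypothetical in-memory `NaiveStore` that only offers `get`/`set`, and interleaves two clients' steps in the unlucky order by hand, so one of the two increments is lost.

```python
class NaiveStore:
    """Hypothetical in-memory stand-in for a store with only get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key, 0)

    def set(self, key, value):
        self._data[key] = value


store = NaiveStore()
key = "pattern:search"  # hypothetical key name

# Two clients each want to add 1, but their steps interleave:
count_a = store.get(key)     # client A, step 1: reads 0
count_b = store.get(key)     # client B, step 1: also reads 0
store.set(key, count_a + 1)  # client A, step 2: writes 1
store.set(key, count_b + 1)  # client B, step 2: writes 1, overwriting A's update

print(store.get(key))  # 1, not 2: one increment was lost
```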

I have worked with Tokyo Cabinet before, and it has an `addint` method which perfectly matches my case. I wonder whether other stores have a similar feature? I didn't choose Tokyo Cabinet/Tyrant because I had experienced some issues with its scalability and data stability (e.g. repairing corrupted data, ...).

Huy Phan
Redis supports this with the [INCR command](http://code.google.com/p/redis/wiki/IncrCommand); Riak doesn't. MongoDB is another alternative if you need to atomically increment a value, as it has the [$inc operator](http://www.mongodb.org/display/DOCS/Updating#Updating-%24inc). On a different note: on Stack Overflow you should use comments to ask questions, instead of answers :) (see FAQ)
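What an atomic increment buys you can be sketched without a server. The hypothetical `AtomicCounterStore` below performs the read-modify-write under a lock, so it behaves as one indivisible step, similar in spirit to Redis INCR or MongoDB `$inc` (those do the increment on the server side; this is only an in-memory stand-in). With the real clients, the equivalents would be `r.incr(key)` in redis-py or `collection.update_one(filter, {"$inc": {"count": 1}}, upsert=True)` in PyMongo.

```python
import threading


class AtomicCounterStore:
    """Hypothetical in-memory store whose incr() is atomic: the lock makes
    read, add and write happen as a single step, so no update is lost."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, amount=1):
        with self._lock:
            value = self._data.get(key, 0) + amount
            self._data[key] = value
            return value

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def hammer(store, key, times):
    # Each worker plays the role of one log-processing client.
    for _ in range(times):
        store.incr(key)


store = AtomicCounterStore()
workers = [
    threading.Thread(target=hammer, args=(store, "pattern:search", 10_000))
    for _ in range(8)
]
for t in workers:
    t.start()
for t in workers:
    t.join()

print(store.get("pattern:search"))  # 80000: every increment was kept
```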
Niels van der Rest
Thanks Niels, I tried to use a comment, but Stack Overflow doesn't allow newlines in comments. That's why I used an answer; I hope SO will support it soon. MongoDB looks good, and there's also a big scalability change (sharding) in version 1.6. I will give it a try.
Huy Phan