We have an app that writes text log files of the requests made to it. The contents are pretty typical and space-delimited: date, time, URL, HTTP status code, IP, user agent, etc.
Currently, we are generating around 500k entries in the text log files per day.
We're currently doing a lot of analysis on the text files with sed/awk/grep (a rough example is below the list). However, that isn't really going to scale, especially as we want to start reporting across multiple days, e.g.:
- How many times did this IP address hit this URL in the last 5 days?
- What % of requests resulted in 500s for specific URLs?
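
For context, the kind of one-liner we run today looks something along these lines (field positions, file names, and values are just illustrative):

    # Count hits from one IP to one URL across several days of logs.
    # $5 = IP and $3 = URL are assumptions about the field order.
    awk '$5 == "10.0.0.1" && $3 == "/some/url"' access.log.2024-06-0[1-5] | wc -l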
It's easy enough to do regular imports into a MySQL database and pull this type of data with selects/group-bys. However, even with a few hundred thousand rows, the queries are relatively slow.
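
For reference, this is the sort of schema and queries I mean (table and column names are just made up for illustration, and the indexes are what I'd guess these two reports would need):

    -- Hypothetical table; real column names/types would match our log format.
    CREATE TABLE request_log (
        id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        logged_at  DATETIME NOT NULL,
        url        VARCHAR(2048) NOT NULL,
        http_code  SMALLINT NOT NULL,
        ip         VARCHAR(45) NOT NULL,
        user_agent VARCHAR(1024),
        KEY idx_ip_url_time (ip, url(191), logged_at),
        KEY idx_url_code (url(191), http_code)
    );

    -- How many times did this IP hit this URL in the last 5 days?
    SELECT COUNT(*)
    FROM request_log
    WHERE ip = '10.0.0.1'
      AND url = '/some/url'
      AND logged_at >= NOW() - INTERVAL 5 DAY;

    -- What % of requests resulted in 500s for a specific URL?
    SELECT 100 * SUM(http_code = 500) / COUNT(*) AS pct_500
    FROM request_log
    WHERE url = '/some/url';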
I'm a n00b when it comes to some of the newer NoSQL databases out there (Cassandra, Dynamo, BigTable), but would any of them be well suited for this? I'm continuing to read up on them, but maybe this crew has some recommendations.
Thanks!