views:

54

answers:

2

Hi, I'm planning to log my Squid instances to MongoDB, but the problem is that we have heavy traffic to log, and every access is authenticated with a username/password. Eventually we have to produce reports based on the logs. I was thinking of storing the logs bucketed by month and by user, so a document in my collection would look like this:

{month: 'april', users: [{user: 'loop0', logs: [{timestamp: 12345678.9, url: 'http://stackoverflow.com/question/ask', ... }]}]}

So if I want to generate my reports for April, I just have to fetch the right month's document instead of scanning zillions of lines for those whose timestamp falls between April 1 and April 30.
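
To illustrate, this is roughly what the two read paths would look like in pymongo (the database name and the collection names monthly_logs and raw_logs are just placeholders I made up):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient().squid  # placeholder database name

    # Bucketed schema: the whole month comes back as a single document.
    april = db.monthly_logs.find_one({"month": "april"})

    # Flat schema: a range scan over individual log lines instead.
    start = datetime(2012, 4, 1, tzinfo=timezone.utc).timestamp()
    end = datetime(2012, 5, 1, tzinfo=timezone.utc).timestamp()
    lines = db.raw_logs.find({"timestamp": {"$gte": start, "$lt": end}})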

Of course this kind of insert will be slower than just inserting the log line directly, as sketched below. So my question is: is there a better way to do this?
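
To make the write cost concrete, here is a sketch of what each approach would involve per log line (same placeholder names as above; the nested push needs the positional $ operator, plus a fallback upsert for when the user subdocument doesn't exist yet):

    from pymongo import MongoClient

    db = MongoClient().squid  # placeholder database name
    entry = {"timestamp": 12345678.9, "url": "http://stackoverflow.com/question/ask"}

    # Flat schema: one cheap append per log line.
    db.raw_logs.insert_one(dict(entry, user="loop0"))

    # Bucketed schema: try to push into the existing user's array first...
    result = db.monthly_logs.update_one(
        {"month": "april", "users.user": "loop0"},
        {"$push": {"users.$.logs": entry}},
    )
    # ...then fall back to creating the user entry (or the month document).
    if result.matched_count == 0:
        db.monthly_logs.update_one(
            {"month": "april"},
            {"$push": {"users": {"user": "loop0", "logs": [entry]}}},
            upsert=True,
        )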

Nowadays we log around 12 million lines per day.

A: 

It's hard to tell without knowing the details, but I'd say it's likely you're worrying about the wrong problem: you're thinking about insert speed rather than report calculation speed.

Mongo has all day to store those 12 million entries, but you may want the report, spanning maybe half a billion entries (roughly one month's worth of data), to render in near real time (seconds, maybe up to a minute). From that perspective, it's probably advisable to optimize for reading rather than for writing.
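
For instance (just a sketch, assuming a flat collection of raw log lines and a recent enough MongoDB/pymongo for the aggregation framework), you can keep writes dead simple and let an index carry the monthly report:

    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient().squid  # placeholder database name

    # One cheap insert per line; the compound index serves the report query.
    db.raw_logs.create_index([("timestamp", 1), ("user", 1)])

    start = datetime(2012, 4, 1, tzinfo=timezone.utc).timestamp()
    end = datetime(2012, 5, 1, tzinfo=timezone.utc).timestamp()

    # Monthly per-user report: filter April, then count hits per user.
    report = db.raw_logs.aggregate([
        {"$match": {"timestamp": {"$gte": start, "$lt": end}}},
        {"$group": {"_id": "$user", "hits": {"$sum": 1}}},
    ])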

Tomislav Nakic-Alfirevic
A: 

You could also create a new collection every month. Or store the data twice. Disk space is cheap.
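
For example (just a sketch; the naming scheme is made up):

    from datetime import datetime, timezone
    from pymongo import MongoClient

    db = MongoClient().squid  # placeholder database name

    def monthly_collection(ts):
        # Route each line to a collection named after its month, e.g. logs_2012_04.
        d = datetime.fromtimestamp(ts, timezone.utc)
        return db["logs_%d_%02d" % (d.year, d.month)]

    monthly_collection(12345678.9).insert_one(
        {"timestamp": 12345678.9, "user": "loop0",
         "url": "http://stackoverflow.com/question/ask"}
    )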

Theo