tags:
views: 157
answers: 8

I want to log visits to my website, which has a high visit rate, to a file. How many writes to the log file can I perform per second?

+4  A: 

Don't do that, use Google Analytics instead. You'd end up running into many problems trying to open files, write to them, close them, so on and so forth. Problems would arise when you overwrite data that hasn't yet been committed, etc.

If you need your own local solution (within a private network, etc.) you can look into an option like AWStats, which works by parsing your log files.

Jonathan Sampson
thanks, but i need my own logging system
Ilya
The richness of the information that is provided by Analytics is wonderful.
Adam Crossland
i know, but i need some other functionality
Ilya
you just think you need your own logging system. if you really had a high-volume site, you would know you don't want to log to the file system; you want to log to syslog on another machine. I recommend Google Analytics as well, way better than anything you can cook up yourself
fuzzy lollipop
What *other* functionality could you possibly need that isn't already available in Analytics? Have you used Analytics before? It's enormous :)
Jonathan Sampson
yes, read my comments please
Ilya
Ilya, then look into a log parser that can run through cron or something and insert the data into a MySQL database for you.
Jonathan Sampson
thanks, i think i'll do so
Ilya
+4  A: 

Or just analyze the Apache access log files. For example with AWStats.

EarthMind
+8  A: 

If you can't use Analytics, why wouldn't you use your webserver's existing logging system? If you are using a real webserver, it almost certainly has a logging mechanism that is already optimized for maximum throughput.

Your question is impossible to answer in all other respects. The number of possible writes is governed by hardware, operating system and contention from other running software.

Adam Crossland
Yes, just parse the server logs.
Rob
i think that parsing is a hard and slow process
Ilya
@Ilya: Not nearly as hard as trying to optimize file writes. Perl exists primarily to do things like log parsing; there are other, more targeted options available as well, such as splunk. Don't go nuts reinventing this wheel.
Mike DeSimone
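To illustrate, a minimal sketch of such a parser in Python (the regex targets Apache's "common" log format and would need adjusting to whatever CustomLog format the server actually uses):

    import re

    # Hypothetical pattern for Apache's "common" log format; adjust it
    # to match the LogFormat your server is configured with.
    LINE_RE = re.compile(
        r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
        r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    )

    def parse_line(line):
        """Return a dict of fields, or None if the line doesn't match."""
        m = LINE_RE.match(line)
        return m.groupdict() if m else None

    print(parse_line('127.0.0.1 - - [10/Oct/2010:13:55:36 -0700] '
                     '"GET /index.html HTTP/1.0" 200 2326'))
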
You can write custom logs (e.g. only for the URLs you're interested in, or in a specific format that leaves out fields you don't need, such as User-Agent); this can save a lot of parsing time. See CustomLog and friends in the Apache docs.
Wim
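For example, a pair of hypothetical directives along those lines (format string and path made up for illustration):

    # Log only the time, request line and status; skip Referer, User-Agent, etc.
    LogFormat "%t \"%r\" %>s" shortfmt
    CustomLog "logs/short_access_log" shortfmt
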
ok, but in some cases i need to record that a banner was shown and in other cases not, while the log will contain all cases
Ilya
Apache logging can be conditional, so you should be able to configure it such that a log entry is written only when the banner was shown (see http://httpd.apache.org/docs/2.2/mod/mod_log_config.html). How to do this exactly will depend on the conditions of showing the banner, but better create a new question if you need help with that since this one is supposed to be about file-writing performance.
Wim
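A hypothetical sketch of that, assuming banner impressions are served from URLs under /banner/ (mod_setenvif sets a flag and CustomLog only writes an entry when it is present):

    SetEnvIf Request_URI "^/banner/" banner_shown
    CustomLog "logs/banner_log" common env=banner_shown
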
There are hundreds of existing log parsers and analyzers, why not use one of those?
Vilx-
+1  A: 

If your hard disk drive can write 40 MB/s and your log file lines are approx. 300 bytes in length, you can write on the order of 140,000 HTTP requests per second to your logfile (40 MB/s ÷ ~300 bytes per line), provided you keep the file open.

Anyway, you should not do that on your own, since most web servers already write logfiles and they know very well how to do that, how to roll the files when a maximum size is reached, and how to format the log lines according to well-known patterns.

mhaller
+1  A: 

File access is very expensive, especially when doing writes. I would recommend collecting the entries in RAM (using whatever caching method suits you best) and periodically writing the results to disk.
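
A minimal sketch of that idea in Python (class and parameter names are hypothetical; note that entries still sitting in the buffer are lost if the process dies):

    import threading

    class BufferedLogger:
        # Collect log lines in memory and write them out in batches,
        # so most requests never touch the filesystem.
        def __init__(self, path, flush_every=1000):
            self.path = path
            self.flush_every = flush_every
            self.buffer = []
            self.lock = threading.Lock()

        def log(self, line):
            with self.lock:
                self.buffer.append(line)
                if len(self.buffer) >= self.flush_every:
                    self._flush_locked()

        def flush(self):  # call this periodically, e.g. from a timer thread
            with self.lock:
                self._flush_locked()

        def _flush_locked(self):
            if self.buffer:
                with open(self.path, "a") as f:
                    f.write("\n".join(self.buffer) + "\n")
                self.buffer = []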

You could also use a database for this. Something like:

UPDATE stats SET hits = hits + 1

Try out a couple different solutions, benchmark the performance, and implement whichever works fast enough with minimal resource usage.

Colin O'Dell
I think that using a relational database for this purpose would be overkill and would likely result in some serious performance degradation.
Adam Crossland
see my comments, i don't want to update the database on each request
Ilya
+1  A: 

If using Apache, I'd recommend using the rotatelogs utility supplied as a part of the standard kit.

We use this to rotate the server logs daily without having to stop and restart the server. N.B. Use the new "||" syntax when declaring the log directive.
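
For instance, a hypothetical piped-log directive (paths vary per install; the leading "||" makes Apache start rotatelogs directly rather than through a shell):

    CustomLog "||/usr/local/apache2/bin/rotatelogs /var/log/apache2/access_log.%Y-%m-%d 86400" combined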

The site I'm involved with is one of the largest on the Internet with hit rates peaking in the millions per second for extended periods of time.

Edit: I forgot to say that the site uses standard Apache logging directives and we have not needed to customise the Apache logging code at all.

Edit: BTW Unless you really need it, don't log bytes served as this causes all sorts of issues around the midnight boundary.

Rob Wells
how can i parse it then?
Ilya
@Ilya, see @EarthMind's suggestion about awstats for an initial starting point. We do all sorts of analysis on the logfiles on a daily basis using a custom stats analyser which is run on a dedicated suite of machines, e.g. Sun5240's. The analyser is implemented in a mixture of executables written in C and Perl scripts. This analysis process takes at least ten hours per day to run!
Rob Wells
+2  A: 

File writes are not expensive until you actually flush the data to disk. Usually your operating system will cache things aggressively so you can have very good write performance if you don't try to fsync() your data manually (but of course you might lose the latest log entries if there's a crash).

Another problem however is that file I/O is not necessarily thread-safe, and writing to the same file from multiple threads or processes (which will probably happen if we're talking about a Web app) might produce the wrong results: missing or duplicate or intermingled log lines, for example.
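
As a rough illustration of both points, a sketch in Python (file name hypothetical): each entry is appended with a single write() on a descriptor opened with O_APPEND, which keeps concurrent processes from overwriting each other at a stale offset, while fsync() is left commented out because of its cost.

    import os

    fd = os.open("access.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def log(line):
        # one write() per entry; whether lines can still interleave under
        # heavy concurrency depends on the OS and filesystem
        os.write(fd, (line.rstrip("\n") + "\n").encode("utf-8"))
        # os.fsync(fd)  # forces data to disk; survives a crash, but is slow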

Antoine P.
ok, what can you advise me?
Ilya
A: 

Let Apache do it; do the analysis work on the back-end.

Paul Nathan