I'm interested in sending all Rails application logging to a database (MySQL or MongoDB), either in addition to or instead of a log file. There are a few reasons, most of which revolve around doing things similar to Google Analytics - basically log file analysis. We already use Google Analytics, but there are a variety of things we want to do that aren't as workable in Analytics.

Furthermore, I'd like to be able to do "real time" investigation of issues by looking at logs. Sifting through a log file is a tedious way to do that, and I'd like to be able to do better searching and filtering than a log file allows for (or I should say, easily allows for).

Finally, the other aspect is that I often want to examine something closer to site visitor behavior - tracing a visitor's path through the site, for example, so that I can see the last page a user was looking at before an error occurred. Given that we have multiple app servers, the separate log files make this a real pain. If all the data were in a database, I could easily see the proper sequence of pages for a given visitor. I know that syslog would be one way to solve this particular piece (a single log file/repository), but again, I want to combine that with the better searching abilities I associate with database searches.

I'm wondering what folks recommend to solve this? Do you log directly to a database, or do you dump log files into a DB (and if so, what's your approach for keeping it essentially real-time - as up to date as the log file itself)?

I'm still deciding at what level I'd like this logging to happen, because another thing I've looked at is writing a small Rack filter that would log all requests. This would miss all the extra output that the normal Rails logging dumps out (all the SQL, the output on cache hits and misses, etc.), but it would achieve a big part of my goal, and it seems to have the advantage of not disturbing anything else in the system.
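For concreteness, here's a rough sketch of the kind of Rack middleware I have in mind (the class name and the logged fields are just placeholders, not a finished design):

    # Rough sketch only -- the class name and the logged fields are
    # placeholders, not a finished design.
    class RequestLogger
      def initialize(app, logger)
        @app = app
        @logger = logger
      end

      def call(env)
        started = Time.now
        status, headers, body = @app.call(env)
        @logger.info("#{env['REQUEST_METHOD']} #{env['PATH_INFO']} " \
                     "-> #{status} (#{Time.now - started}s)")
        [status, headers, body]
      end
    end

    # Wired up via: config.middleware.use RequestLogger, Rails.logger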

Anyway, I am not looking for one right answer, more of a discussion and information on what anyone else might be doing in this same light. Thanks.

+2  A: 

If you want to change the default logging behavior, simply create a custom logger object that responds to all the Rails logger methods:

  • add
  • debug, warn, error, info, fatal, unknown

http://github.com/rails/rails/blob/9d7aae710384fb5f04129c35b86c5ea5fb9d83a9/activesupport/lib/active_support/buffered_logger.rb

Because it's your logger, you can implement whatever logic you like. You can write to the database, to standard output, or wherever you want.

Then, replace the default logger for every base class you want to customize.

ActiveRecord::Base.logger = YourLogger.new

You can easily create an initializer file called logger.rb and put all your custom configuration there. That way, the logger is replaced as soon as Rails starts.
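As a hedged illustration of what such a logger might look like (the table name, columns, and connection options below are all made up; this is a sketch, not production code):

    require 'mysql2'

    class DatabaseLogger
      SEVERITIES = %w[DEBUG INFO WARN ERROR FATAL UNKNOWN].freeze

      def initialize
        # Use a dedicated connection: logging through ActiveRecord itself
        # would recurse, since every INSERT would emit another log line.
        @db = Mysql2::Client.new(host: 'localhost', username: 'logger',
                                 database: 'app_logs')
      end

      def add(severity, message = nil, progname = nil)
        message ||= progname
        label = SEVERITIES[severity.to_i] || 'UNKNOWN'
        # Values are escaped by hand here; a prepared statement would be
        # the safer choice in real code.
        @db.query("INSERT INTO log_entries (severity, message, created_at) " \
                  "VALUES ('#{@db.escape(label)}', " \
                  "'#{@db.escape(message.to_s)}', NOW())")
      end

      # One method per severity, as the Rails logger interface expects,
      # plus the debug?/info?/... predicates Rails sometimes checks.
      SEVERITIES.each_with_index do |name, level|
        define_method(name.downcase) { |msg = nil| add(level, msg) }
        define_method("#{name.downcase}?") { true }
      end
    end

    # config/initializers/logger.rb
    ActiveRecord::Base.logger = DatabaseLogger.new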

Simone Carletti
Thanks. I should have mentioned I was aware of that option, but good notes for others as well. Mostly I'm curious how anyone else is doing this, what choices they've made and so on. For example, if you do it this way, what are the issues with speed/performance - how are you holding a DB connection and so on (if you even are), or what not.
chrisrbailey
+2  A: 

I use the Rails "exception logger" plugin to log all problems to my database while my site is in production mode. It gives you a nice interface where you can check for problems. If you want to see what your visitors are doing in real time, take a look at Woopra.

atmorell
We already use Hoptoad for exception logging, which I find to be vastly superior to exception logger or exception notifier plugins. It also doesn't get anywhere close to the problem I'm trying to address. As mentioned in my question, I'm looking for far more in the logs than just errors, I want to do some analytics things, investigate a user's flow through pages, etc. I did look at Woopra, but as I recall, we are already over their limit on the amount of traffic to the site.
chrisrbailey
Woopra is the best I've found. I believe they'll be out of beta soon, so I'd imagine their traffic limits will be raised - though it may no longer be free, either. Still, an amazing service.
Ian
A: 

Chris,

I think Dima's comment is important here. Are you satisfied with (1) having an access log in a DB (in real time), or (2) are you more interested in Rails/app-specific logging?

For (1), with Apache (at least), you can log to a database using piped logging.

http://httpd.apache.org/docs/1.3/logs.html#piped

I wrote a program that runs in the background waiting for input, which it parses and logs to a Postgres DB. My httpd.conf file pipes to this program with a CustomLog directive.

This is relatively simple to set up, and it gives you all the obvious advantages of being able to analyze your logs in a DB. It works very well for me, especially for tracing what a user was doing just before an error. However, you have to protect against SQL injection, buffer overflows, and other security issues in the logging program.
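To make that concrete, here's a rough Ruby equivalent of such a logging program (my real one differs; the table, columns, and regexp here are assumptions). The key point is the parameterized insert, which keeps log content out of the SQL text:

    require 'pg'

    # httpd.conf:  CustomLog "|/usr/local/bin/pipe_logger.rb" common
    db = PG.connect(dbname: 'weblogs')
    db.prepare('ins', 'INSERT INTO access_log (ip, path, status, logged_at) ' \
                      'VALUES ($1, $2, $3, now())')

    # Common Log Format: ip - user [time] "METHOD path HTTP/x" status bytes
    LINE = /\A(\S+) \S+ \S+ \[[^\]]+\] "\S+ (\S+) [^"]*" (\d{3})/

    $stdin.each_line do |line|
      next unless (m = LINE.match(line))
      # Parameterized insert: the raw log line never touches the SQL text.
      db.exec_prepared('ins', [m[1], m[2], m[3].to_i])
    end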

For (2), I am not a Rails developer so I can only talk about general approaches. If you want to log environment vars, or application data, or very selective bits of information, you could consider writing a web server module. Depending on your exact needs, you could also get by with some combination of conditional logging directives and filtering in the logging program.

It really comes down to whether you need a Rails-specific solution or a more general web-server-wide solution.

Nishad
We don't use Apache (we use Nginx), but this is a good point. I'm after something closer to the Rails logs, in that I want application-level logging, not web server logs. I don't care about all the requests for images and CSS, etc., and I'd rather have app-specific logging instead of URLs. That really implies I need to do the logging at the Rails level (since even at the Rack level it's still just URLs, although static assets served by Nginx will have been sifted out), but for speed and such, I may need to do it at the Rack level.
chrisrbailey
+8  A: 

My company has been logging some structured traffic info straight into a MySQL log database. This database is replicated downstream to another database, and all analytics run off that final replica. Our site sustains quite a bit of traffic, and so far there don't seem to be any major problems. However, our IT department has some growing concerns about the scalability of the current setup and is suggesting that we offload the log info into "proper" log-files, which would then be reinserted into the same downstream database tables. Which brings me to this question. :)

Here are some of the pros and cons I see regarding log-files vs. a log-db (relational):

  • log-files are fast, reliable, and scalable (at least I've heard Yahoo! makes heavy use of log files for its click-tracking analytics).
  • log-files are easy for sys-admins to maintain.
  • log-files can be very flexible, since you can write almost anything to them.
  • log-files require heavy parsing and potentially a map-reduce type of setup for data extraction.
  • log-db structures are a lot closer to your application, making some features' turnaround time a lot shorter. This can be a blessing or a curse - probably a curse in the long run, since you'll most likely end up with a highly coupled application and analytics code base.
  • log-db can reduce logging noise and redundancy, since log-files are insert-only whereas a log-db gives you the ability to do updates and associated inserts (normalization, if you dare).
  • log-db can be fast and scalable too, if you go with database partitioning and/or multiple log databases (rejoining the data via downstream replication).

I think some stress tests on the log database are needed in my situation. This way at least I know how much headroom I have.

Recently, I've been looking into some key-value / document-based databases like Redis, Tokyo Cabinet, and MongoDB. These fast-inserting databases can potentially be the sweet spot, since they provide persistence, high write throughput, and querying capabilities to varying degrees. They can make the data-extraction process much simpler than parsing and map-reducing through gigs of log files.
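For example, a minimal MongoDB sketch with the Ruby mongo gem (host, database, and field names are assumptions) - a capped collection keeps the log bounded and preserves insert order, which suits tailing:

    require 'mongo'

    client = Mongo::Client.new(['127.0.0.1:27017'], database: 'app_logs')
    # Capped collection: fixed size, insert-order reads -- handy for tailing.
    events = client[:events, capped: true, size: 100 * 1024 * 1024]
    events.create rescue nil  # no-op if the collection already exists

    events.insert_one(
      at:       Time.now.utc,
      severity: 'INFO',
      visitor:  'abc123',      # e.g. a session id, for tracing page flows
      path:     '/products/42',
      message:  'cache miss'
    )

    # One visitor's page sequence is then a simple query:
    # events.find(visitor: 'abc123').sort(at: 1).each { |e| puts e['path'] }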

In the long run, I believe it is crucial to have a robust analytics data warehouse. Freeing application data from analytic data and vice versa can be a big WIN.


Lastly, I would just like to point out there are many similar / closely related questions here on StackOverflow in case you want to broaden your discussion.


Edit:

rsyslog looks very interesting. It gives you the ability to write directly to MySQL. If you are using Ruby, you should have a look at the logging gem. It provides multi-target logging capabilities. It's really nice.
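For instance, multi-target output with the logging gem looks roughly like this (the file path is a placeholder):

    require 'logging'

    log = Logging.logger['app']
    # One logger, several appenders -- each entry goes to every target.
    log.add_appenders(
      Logging.appenders.stdout,
      Logging.appenders.file('log/app_events.log')
    )
    log.level = :info

    log.info 'written to stdout and to the file'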

newtonapple
Thanks for the above. I've been looking at MongoDB myself, and that's what I'm leaning towards right now. The biggest thing I need to work out is how to get the data into it. That is, do I periodically parse log files, leaving my app untouched (which is nice) but making things rather difficult (parsing Rails logging output could be painful)? Or do I write my own Rails logger that writes to the current log (so I still get regular file logging in case something is wrong with MongoDB) as well as to Mongo? Or some other solution?
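For what it's worth, the dual-write option could be as simple as a tee-style logger that forwards each call to both targets; DatabaseLogger below is a stand-in for whatever Mongo/MySQL writer I'd end up with:

    require 'logger'

    # Sketch: forward every logger call to both a file logger and a
    # DB-backed one. DatabaseLogger is hypothetical here.
    class TeeLogger
      def initialize(*targets)
        @targets = targets
      end

      %w[add debug info warn error fatal unknown].each do |meth|
        define_method(meth) do |*args, &block|
          @targets.each { |t| t.send(meth, *args, &block) }
        end
      end
    end

    # ActiveRecord::Base.logger =
    #   TeeLogger.new(Logger.new('log/production.log'), DatabaseLogger.new)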
chrisrbailey
A: 

I think this might be helpful

http://rohitsharma9889.wordpress.com/2010/08/19/logging-in-ruby-on-rails/

Rohit