I'm writing a piece of honeypot software that will log its interactions extensively, and I plan to log to plaintext .log files.

I have two questions, from someone who isn't too familiar with how servers log. First, how should I break up my log files? I'm assuming that after running this for a month I don't want one big .log file. Do I split by day, month, or year? Is there some standard for it?

Second, the format of each line: do I use one standard delimiter, whatever it is (*, -, +, anything)? Is there a standard anywhere? (My googling hasn't turned up much.)

Thanks!

+3  A: 

I like this format for log files:


$ python simple_logging_module.py
2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message
2005-03-19 15:10:26,620 - simple_example - INFO - info message
2005-03-19 15:10:26,695 - simple_example - WARNING - warn message
2005-03-19 15:10:26,697 - simple_example - ERROR - error message
2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message

This is from Python's logging module (http://docs.python.org/library/logging.html). I usually have one file per day, one folder for each month, and one folder for each year; otherwise you'll get huge log files that you can't edit properly.

logs/
  2009/
    January/
      01012009.log
      02012009.log
      ...
    February/
      ...
  2008/
    ...
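A minimal Python sketch of this setup, assuming one log file is opened when the process starts. It reuses the format string from the Python logging tutorial that produces the output shown above and builds the year/month/day path from the tree. The helper names are hypothetical, and month names come from `strftime("%B")`, so they assume the default C/English locale.

```python
import datetime as dt
import logging
from pathlib import Path

# Produces lines like
# "2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message"
LINE_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def daily_log_path(root: Path, day: dt.date) -> Path:
    """Build root/YYYY/MonthName/DDMMYYYY.log, matching the tree above."""
    # %B is the locale's full month name ("January" in the default C locale).
    return root / day.strftime("%Y") / day.strftime("%B") / day.strftime("%d%m%Y.log")


def make_daily_logger(root: Path, name: str, day: dt.date) -> logging.Logger:
    """Hypothetical helper: attach a file handler for the given day's log file."""
    path = daily_log_path(root, day)
    path.parent.mkdir(parents=True, exist_ok=True)  # creates logs/2009/January/ etc.
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(LINE_FORMAT))
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger
```

This only picks the file at startup. A long-running honeypot also has to switch files at midnight; the standard library's `logging.handlers.TimedRotatingFileHandler` with `when="midnight"` does that, but it names rotated files with a date suffix instead of reproducing this folder layout.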
Pierre-Antoine LaFayette
I like this. I absolutely HATE log files that don't have date/time.
amischiefr
With regard to doing this in Java, I'm looking at java.util.logging.Logger, but it logs in an XML format. Would it be preferable to stick to lines, or leave the XML?
Andrew
You should use log4j (http://logging.apache.org/) if you're using Java. It's a more flexible logging utility, and there's a good tutorial here (http://www.laliluna.de/log4j-tutorial.html). If you need to parse or search the data, XML may be preferable, or you may even want to try mjv's suggestion of logging to a database.
Pierre-Antoine LaFayette
+1  A: 

There is no standard for this kind of logging. Rolling and file layout depend entirely on what you need. In general I have faced 3 main scenarios:

  • All in one file. That doesn't seem to be an option for you.
  • Fixed-size rolling. You define a maximum size, and a new log file is created once the current file grows beyond it. Most log4anything packages support this out of the box.
  • Fully custom rolling. I've seen layouts like this:
    • Every day gets its own directory named in YYYYMMDD format. If you don't stage your logs, consider a directory layout like YYYY\MM\YYYYMMDD, as shown in other answers.
    • Inside this directory, fixed-size rolling should be used.
    • Every file is named logfile_yyyymmdd_ccc.log, where ccc is an increasing number. Adding the time to the file name is also a good idea (e.g. to easily judge how many logs per minute you are generating).
    • To save space, every log file is automatically compressed with zip.
    • The last 3 days are always kept uncompressed so you have quick access with UNIX text tools.

The custom layout looked like this:

logs/
  20090101/
    logfile_20090101_001.zip
    logfile_20090101_002.zip
    ...
  20090102/
    logfile_20090102_001.zip
    logfile_20090102_002.zip
  logfile_20090101_001.log
  logfile_20090101_002.log
  logfile_20090102_001.log
  logfile_20090102_002.log
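The naming and compression parts of this custom scheme can be sketched in Python roughly as follows. This is an illustration under assumptions: the function names and the `KEEP_UNCOMPRESSED_DAYS` constant are hypothetical, and for simplicity the recent uncompressed files stay in their day directories rather than alongside the day directories as in the tree above.

```python
import datetime as dt
import zipfile
from pathlib import Path

KEEP_UNCOMPRESSED_DAYS = 3  # "last 3 days are always kept uncompressed"


def log_path(root: Path, day: dt.date, counter: int) -> Path:
    """Return root/YYYYMMDD/logfile_YYYYMMDD_CCC.log (CCC = increasing counter)."""
    stamp = day.strftime("%Y%m%d")
    return root / stamp / f"logfile_{stamp}_{counter:03d}.log"


def compress_old_logs(root: Path, today: dt.date) -> list[Path]:
    """Zip and delete every .log file older than the last KEEP_UNCOMPRESSED_DAYS days."""
    # Keeps today plus the previous (KEEP_UNCOMPRESSED_DAYS - 1) days uncompressed.
    cutoff = today - dt.timedelta(days=KEEP_UNCOMPRESSED_DAYS - 1)
    created = []
    for log_file in sorted(root.glob("*/logfile_*.log")):
        # The parent directory name is the day; assumes only YYYYMMDD dirs exist.
        day = dt.datetime.strptime(log_file.parent.name, "%Y%m%d").date()
        if day >= cutoff:
            continue
        zip_path = log_file.with_suffix(".zip")
        with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            zf.write(log_file, arcname=log_file.name)
        log_file.unlink()  # remove the original only after the archive is written
        created.append(zip_path)
    return created
```

A function like `compress_old_logs` would typically run from a daily scheduled job rather than inside the logging process itself.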

There is also a set of good practices for logging:

  • Always keep the date in your log file name.
  • Always add some name to your log file name. It will help you later to distinguish log files from different instances of your system.
  • Always log the time and date (preferably with millisecond resolution) for every log event.
  • Always store your dates as YYYYMMDD. Everywhere: in file names and inside log files. It greatly helps with sorting. Some separators are fine (e.g. 2009-11-29).
  • In general, avoid storing logs in a database. It is another point of failure in your logging scheme.
  • If you have a multithreaded system, always log the thread ID.
  • If you have a multiprocess system, always log the process ID.
  • If you have many computers, always log the computer ID.
  • Make sure you can process the logs later. Just try importing one log file into a database or Excel. If it takes longer than 30 seconds, your logging is wrong. This includes:
    • Choosing a good internal logging format. I prefer space-delimited, since it works nicely with Unix text tools and with Excel.
    • Choosing a good date/time format so you can easily import the logs into an SQL database or Excel for further processing.
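As a rough illustration of several of these points together, here is a Python sketch that writes space-delimited lines with a sortable date, millisecond time, host name, process ID, and thread ID, placing the free-text message last so it can contain spaces. The logger name "honeypot" and the helper names are hypothetical. Host name is not a built-in `LogRecord` attribute, so a small filter adds it.

```python
import logging
import socket


class HostnameFilter(logging.Filter):
    """Attach this machine's hostname to every record (no built-in host field)."""

    def __init__(self) -> None:
        super().__init__()
        self.hostname = socket.gethostname()

    def filter(self, record: logging.LogRecord) -> bool:
        record.hostname = self.hostname
        return True  # never drop records, only annotate them


# Space-delimited: date, time.milliseconds, host, process id, thread id, level, message.
# The message goes last because it is the only field that may contain spaces.
LINE_FORMAT = "%(asctime)s.%(msecs)03d %(hostname)s %(process)d %(thread)d %(levelname)s %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_logger(path: str) -> logging.Logger:
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter(LINE_FORMAT, datefmt=DATE_FORMAT))
    handler.addFilter(HostnameFilter())
    logger = logging.getLogger("honeypot")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


def parse_line(line: str) -> dict:
    """Split one line back into named fields; maxsplit keeps spaces inside the message."""
    date, time, host, pid, tid, level, message = line.rstrip("\n").split(" ", 6)
    return {"date": date, "time": time, "host": host, "pid": int(pid),
            "thread": int(tid), "level": level, "message": message}
```

`parse_line` is the "can I process this in 30 seconds" check in code form: if a line cannot be split back into its columns this simply, the format probably needs rethinking.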
Michal Sznajder
A: 

To break up your log files, you could use an external application like logrotate and let it take care of the dirty work.

As for the format of each line, there's no standard, so you should use what works best for you. If you're going to automatically parse the log file later, then you might want to keep that in mind as you format the log output.
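If you take the logrotate route from a long-running Python process, one thing to watch is that after logrotate renames the file, the process keeps writing to the renamed file. The standard library's `logging.handlers.WatchedFileHandler` is intended for this case: before each write it checks whether the file at the original path has been moved or deleted, and reopens it if so. A minimal sketch; the logger name and format are placeholders:

```python
import logging
import logging.handlers


def make_watched_logger(path: str) -> logging.Logger:
    """Log to `path`, reopening it if an external tool such as logrotate moves it."""
    handler = logging.handlers.WatchedFileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("honeypot.watched")  # placeholder name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep these records out of any ancestor handlers
    logger.addHandler(handler)
    return logger
```

The handler is aimed at Unix-like systems; on Windows an open log file generally cannot be renamed in the first place.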

Jeff
+1  A: 

A suggestion:

Since it's for a honeypot system (and unless the baddies are really whacking the application/site), you may want to consider taking the extra time to log to a database instead.

This will make analyzing and using the logs easier, and real-time (i.e. you do not need to go through an ETL process before analyzing/browsing the logs).

That said, whether the logs live in DB table(s) or in file(s), you still need to define a format. Tentatively, you could use a "polymorphic" format: a few common attributes (ID, IP address, timestamp, cookie/ID, "level" [of importance/urgency]), followed by a short mnemonic code defining a particular event type (say "LIA" = login attempt, "GURL" = guessed URL, "SQLI" = SQL injection attempt, etc.), followed by a few numeric fields and a few string fields whose semantics vary with the mnemonic. To summarize:

 - Id
 - TimeStamp  (maybe split in date and time)
 - IP_Address
 - UserID_of_sorts
 - // other generic/common fields that you may think of
 - EventCode   (LIA, GURL, SQLI...)
 - Message   Text message (varies with particular event instance)
 - Int1      // Numbers...
 - Int2
 - Str1      // ...and text whose meaning varies with the EventCode
 - Str2
 - //... ?
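For the flat-file variant, the record above might be sketched in Python as a dataclass serialized as one tab-delimited CSV record. This is an assumption-laden illustration: the Python field names and helper names are hypothetical, and the `csv` module is used so that attacker-supplied tabs, quotes, or newlines get quoted instead of shifting the columns.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from typing import Optional


@dataclass
class HoneypotEvent:
    """One 'polymorphic' event: common columns, an event code, then generic slots."""
    id: int
    timestamp: str              # e.g. "2009-11-29 15:10:26.618"
    ip_address: str
    user_id: str                # cookie/session/user identifier of some sort
    event_code: str             # mnemonic such as "LIA", "GURL", "SQLI"
    message: str
    int1: Optional[int] = None  # meaning depends on event_code
    int2: Optional[int] = None
    str1: str = ""              # meaning depends on event_code
    str2: str = ""


def to_record(event: HoneypotEvent) -> str:
    """Serialize as one tab-delimited record; the csv module quotes risky fields."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(astuple(event))
    return buf.getvalue()


def from_record(text: str) -> HoneypotEvent:
    """Parse one record produced by to_record back into a HoneypotEvent."""
    row = next(csv.reader(io.StringIO(text), delimiter="\t"))
    values = dict(zip((f.name for f in fields(HoneypotEvent)), row))
    values["id"] = int(values["id"])
    for name in ("int1", "int2"):
        values[name] = int(values[name]) if values[name] else None  # "" means None
    return HoneypotEvent(**values)
```

A field containing a raw newline is still quoted correctly, but the record then spans two physical lines, which line-oriented Unix tools will not handle well.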

Now, regardless of whether this goes to a flat file or to an SQL database (and maybe particularly if it goes to a DB), you could/should use a standard logging library. Log4j, as suggested in other replies, is one option (although I'm not sure it readily has Python bindings, and anyway Python's standard logging module is more or less the same); Python's standard library logging module can probably be tailored to your needs.
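To illustrate tailoring Python's standard logging module for the database case, here is a sketch of a custom `logging.Handler` that writes rows to SQLite through the built-in `sqlite3` module. The choice of SQLite, the table, and the column names are assumptions (a trimmed version of the field list above); event-specific values are passed through the standard `extra=` argument of the logging calls.

```python
import logging
import sqlite3

# Trimmed-down, assumed version of the table described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS events (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    created    REAL NOT NULL,   -- Unix timestamp from LogRecord.created
    level      TEXT NOT NULL,
    ip_address TEXT,
    event_code TEXT,            -- e.g. LIA, GURL, SQLI
    message    TEXT
)
"""


class SQLiteHandler(logging.Handler):
    """Write each log record as one row; ip_address/event_code come from extra=."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        super().__init__()
        self.conn = conn
        self.conn.execute(SCHEMA)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.conn.execute(
                "INSERT INTO events (created, level, ip_address, event_code, message) "
                "VALUES (?, ?, ?, ?, ?)",
                (
                    record.created,
                    record.levelname,
                    getattr(record, "ip_address", None),  # None if no extra= given
                    getattr(record, "event_code", None),
                    record.getMessage(),
                ),
            )
            self.conn.commit()
        except Exception:
            self.handleError(record)  # standard logging behaviour on handler errors
```

A call such as `logger.warning("login attempt for %s", "root", extra={"ip_address": "10.0.0.5", "event_code": "LIA"})` then produces one row. By default a `sqlite3` connection may only be used from the thread that created it, so a multithreaded honeypot would need a different arrangement (for example, a queue feeding a single writer thread).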

mjv
A: 

I recommend you use a well-known logging library. Most logging libraries handle rollover for you. Log4Net (.NET) and Log4J (Java) are particularly good logging libraries, with a lot of options that you may find useful. Use whatever rollover interval works best for you; for a honeypot application, I think you will find hourly or daily rollover works best. You could also use a fixed size limit, like 256 MB, to ensure that your logging doesn't overrun the available free disk space. Log4Net/Log4J support this as well.
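For Python rather than .NET or Java, the standard library's `logging.handlers` module offers comparable rollover. A sketch under assumptions: the helper names are hypothetical, the 256 MB figure comes from this answer, and the backup counts are arbitrary. Note that `RotatingFileHandler` never rolls over if either `maxBytes` or `backupCount` is zero, and the backup count is what actually bounds total disk use.

```python
import logging
import logging.handlers

MIB = 1024 * 1024
LINE_FORMAT = "%(asctime)s %(levelname)s %(message)s"


def _attach(name: str, handler: logging.Handler) -> logging.Logger:
    handler.setFormatter(logging.Formatter(LINE_FORMAT))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    return logger


def make_capped_logger(name: str, path: str, max_bytes: int = 256 * MIB,
                       backups: int = 4) -> logging.Logger:
    """Size-based rollover: path, path.1 ... path.<backups>; the oldest is deleted.
    Worst-case disk use is about (backups + 1) * max_bytes."""
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backups, encoding="utf-8")
    return _attach(name, handler)


def make_hourly_logger(name: str, path: str, keep_hours: int = 24 * 7) -> logging.Logger:
    """Time-based rollover every hour, keeping one week of hourly files by default."""
    handler = logging.handlers.TimedRotatingFileHandler(
        path, when="H", interval=1, backupCount=keep_hours, encoding="utf-8")
    return _attach(name, handler)
```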

Log4J @ Apache.Org
Log4Net @ Apache.Org

Set up the format of your log files according to your needs. It is highly desirable to use a delimiter that is unlikely to show up in your log input, though for your application this may not be possible. Under typical circumstances, some people use spaces (NCSA logs), some use commas (to make CSV files), and some use tabs (to make tab-delimited files). Each of these has its own benefits and drawbacks.
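In a honeypot, attackers control much of the input, so you cannot rely on any delimiter being absent. One common workaround, sketched here in Python for tab-delimited lines, is to backslash-escape the delimiter, line breaks, and the backslash itself inside each field, so every event stays on exactly one line and splits cleanly. The helper names are hypothetical.

```python
# Backslash-escaping so each field is free of raw tabs and line breaks.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(value: str) -> str:
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape_field(value: str) -> str:
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # character after the backslash ("" at end of field)
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))  # unknown escapes kept verbatim
        else:
            out.append(ch)
    return "".join(out)


def format_line(fields: list[str]) -> str:
    """One event per physical line: escaped fields joined by tabs."""
    return "\t".join(escape_field(f) for f in fields) + "\n"


def parse_line(line: str) -> list[str]:
    """Inverse of format_line; safe because fields contain no raw tabs or newlines."""
    return [unescape_field(f) for f in line.rstrip("\n").split("\t")]
```

Unlike CSV-style quoting, this keeps `cut`, `grep`, and `awk -F'\t'` working on raw lines, at the cost of needing `unescape_field` to recover the exact original text.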

meklarian