Hello All,

I am having a problem in one of the teams that I am working in. One of the guys is a bit SQL-happy, in my opinion, and wants to store the log information generated by a small Python FTP downloader in a database, instead of just a nicely formatted text file. Now it's always been my opinion that a database should only be used if it speeds things up, or provides a more reliable interface to the data. What are your opinions?

Thanks!

Edit: In this particular instance, the data will grow by about 100 lines per day and be processed once and then thrown away. Although this case is of immediate concern, I am more interested in a general answer.

Edit 2: Thanks for all of your responses! I have accepted the answer with the most upvotes because I feel that it succinctly states most of the points you all have made, but I will watch and see if something else comes up.

+11  A: 

If you want to run reports on the data, or ask it questions later, a database is a logical choice, especially if you are storing multiple runs in the same database file to look for trends.

If you are only writing the logs from individual runs, and don't care about the data after you review it, then a database probably doesn't make sense.

Robert Harvey
http://www.google.co.uk/search?q=log+file+trend+analysis+software - it seems that many people would disagree that looking for trends in log files is a database-only job.
Pete Kirkham
@Pete, fair enough. I suspect that, if you are doing trend analysis (which the OP is not), the job can be made easier by having the data in a database in the first place.
Robert Harvey
+4  A: 

Databases offer scalability, whereas flat files do not. What happens if the app you developed is required to do more in two years' time?
Databases also offer numerous other benefits, including permission levels and built-in backups, which you would otherwise have to configure manually, increasing the work that needs to be done. I will always choose a database over a flat file if it is an option. Always.

Kolten
God, I can't believe I didn't mention either "relational" or "normalized" in my answer. +1.
AJ
I'm just wondering how much a log file would need to scale, or how many permission levels would be needed for logging. These points are good reasons to use just a text file, IMHO. And what if my app needs to do more in two years? Then I will change the app in two years. Thinking YAGNI.
0xA3
+1  A: 
  • performance
  • scalability
  • redundancy
  • normalization
  • data integrity
  • multiuser (concurrent) access
  • data storage efficiency (depending on indexing of course)
RedFilter
+4  A: 

Suggest using log4j / log4cxx (you didn't specify a language). There are appenders available that can put the data into a database, or a flat file, or a syslogd. You can set that up to be whatever the group decides upon at any point. You can even do both at the same time. It's the best of both worlds.
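
A minimal sketch of that idea in Python (the OP's language), using the standard logging module, whose handlers play roughly the same role as log4j appenders; the logger name and file name here are only placeholders:

    import logging
    import logging.handlers

    log = logging.getLogger("ftp_downloader")  # placeholder name
    log.setLevel(logging.INFO)

    # Flat file with rotation; swap or add handlers here without touching
    # the code that emits log records.
    log.addHandler(logging.handlers.RotatingFileHandler(
        "ftp_downloader.log", maxBytes=1000000, backupCount=3))

    # Uncomment to also send records to the local syslog daemon (Unix only).
    # log.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))

    log.info("downloaded %s (%d bytes)", "example.txt", 1024)

Sending records to a database later would just mean attaching another handler; the calling code stays the same.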

Kieveli
+5  A: 

What happens when the log file causes you to run out of disk space?

Advantages of storing logging information in a database table:

  1. Easily queryable, if you format the table correctly. Wanna find out why your FTP download broke at 11:53 AM last Tuesday? Have fun surfing your flat file. I will write a query and have the information in a fraction of the time (see the sketch below).
  2. Easily scalable. If you have an enterprise-level database, you will never (unless your DBAs are silly) have to worry about logs running out of disk space.
  3. Transactional: You don't have to worry about file locks and appends.

I feel like I could go on for hours on this topic. Seriously, get a standard logging approach and use a database table, and you will not regret it.
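
As a rough illustration of point 1 (a sketch only; the table, column names, and timestamps are made up, not taken from the question), the "11:53 last Tuesday" lookup becomes a small query once the events live in a table:

    import sqlite3

    conn = sqlite3.connect("logs.db")  # placeholder database file
    rows = conn.execute(
        "SELECT logged_at, message FROM ftp_log "
        "WHERE level = 'ERROR' AND logged_at BETWEEN ? AND ? "
        "ORDER BY logged_at",
        ("2009-10-13 11:50:00", "2009-10-13 11:56:00"))
    for logged_at, message in rows:
        print(logged_at, message)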

AJ
What happens when the database causes you to run out of disk space?
Robert Harvey
I feel that "easily scalable" covers that.
AJ
@Robert: Wouldn't a flat file have a risk of running out of disk space too?
Bill Karwin
@Bill, of course. That was my point.
Robert Harvey
And how about the DBA that is needed in that scenario? That usually doesn't come for free.
0xA3
And regarding point 3: You may not have to worry about file locks, but locking also occurs at the database level.
0xA3
+4  A: 

There are a whole host of questions that come to my mind which would guide answers, and ultimately your own.

  • Do you need to search through the data at a later point? If not, why is it being logged? If you do, are the quantity and type of searches suitable for a flat file?
  • Are the data quantities small, making the database a premature optimization, or are you going to be storing a lot of log data?
  • What backup / DR / restore SLA are you going to be working under? If you have none and never intend backing the file up or protecting it (i.e. it's informational at best), then a file may be fine; but if you have to ensure the data is safe and point-in-time recovery is achievable, then you need to look at an alternative to a flat file.
  • Is the data small now, but going to scale / get larger over time? Choosing a file as a short-term solution can really hurt you in the longer term.

There is no one solution; a DB might be premature optimization, but it could equally be very valid.

A.

Andrew
+1  A: 

Storing to a database could also allow someone to query the logs for various purposes at a later date (assuming the individual elements of the log event, such as date/time, event type, numeric code, clear-text message, etc., are kept separately).

Typically, storing to a DB will incur a small performance hit compared with flat text output. This will be more noticeable if the underlying database table has many indexes. Sometimes a valid approach is to store to a database heap (a table without any index, or maybe just one simple index), and to keep this heap small by moving its contents to a fully indexed table every evening (or whenever the SQL load is expected to be low).
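
A rough sketch of that heap-then-archive pattern (the answer has a full RDBMS in mind; sqlite3 and the table and column names here are only stand-ins for illustration):

    import sqlite3

    conn = sqlite3.connect("logs.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS log_staging (logged_at TEXT, message TEXT);
        CREATE TABLE IF NOT EXISTS log_archive (logged_at TEXT, message TEXT);
        CREATE INDEX IF NOT EXISTS log_archive_time ON log_archive (logged_at);
    """)

    # During the day, inserts go to the unindexed staging table (cheap writes).
    conn.execute("INSERT INTO log_staging VALUES (datetime('now'), ?)",
                 ("transfer ok",))

    # A nightly job moves the rows into the fully indexed archive table.
    with conn:
        conn.execute("INSERT INTO log_archive SELECT * FROM log_staging")
        conn.execute("DELETE FROM log_staging")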

On related matters, you could look into many useful logging libraries such as log4j (which btw can be configured to go to flat files, with rolling management, or to database back-end)...

The only logs I would recommend leaving in flat-text-file-only format are those associated with rare/occasional error messages and other exception cases. The text file format then provides ready access to the information (using a local text editor) for diagnostic purposes, even for log events older than a few weeks.

mjv
Mind too that it's easier to get a flat file out of a database than it is to get the content of a flat file into a database.
OMG Ponies
I think one advantage of a flat file you mentioned is very important: quick access with tools you almost always have. Using a simple grep on your flat-text log is easy enough if you're used to it, and how about tail -f?
Erik van Brakel
+1  A: 

If you are just going to 'throw away' your data and don't intend to manipulate/query it later, a text file is preferable, since it's faster than using a database.

Anax
...except he said "performance wasn't an issue".
David
I believe he means faster as in coding time/setup time.
Seamus
+5  A: 

A flat file is a form of database.

The reason to choose a pre-existing DBMS instead of rolling your own is chiefly that your time is better spent on the problem domain rather than re-inventing the wheel.

You could always go with a low-end or OSS database if your needs are simple and you don't want to spend a lot of money on it.

JohnFx
+2  A: 

Most of the answers seem to be giving mere lip service to the biggest advantage: sophisticated ad-hoc querying. Scalability in this case doesn't have anything to do with it.

Hank Gay
A: 

I like to plan a little bit for the future. If a flat file gives you just what you need today, what happens if your specs change or the client wants more later? You don't want to have to explain that it is going to take a lot of time to re-engineer a solution. If there is any chance that this solution needs to persist over time and could be influenced by clients, a database solution will have the flexibility you will likely need.

Edward Leno
I'm pretty sure a nicely formatted flat file can be inserted without trouble into a database anytime, should the requirements change.
Anax
"any chance", "could be"... very flimsy reasons for throwing a database at it. Simply log from one place, and change that routine later if you suddenly need more out of it. Put more than 2 seconds of thought into your design and you won't have a problem.
darron
A: 

There are a lot of good (accepted answer quality) answers already, I'm just adding one point that should be considered:

If you're running low on disk space, or you just don't want to waste 16GB on a flat file after 5 years of recording logs, would you prefer to just issue a "DELETE FROM Logs WHERE Date < x" that can run concurrently with no downtime, or would you prefer to take your application offline while you trim 16GB worth of lines from the top of your text file (you bet that's going to lock the file)?

There is a big difference between "it's not too fast" and "it's not running at all".

Edit: In response to your edit, if you plan on throwing away the data once processed, wouldn't it be easier to clip data from a database (DELETE) than from a flat file (unless you start using fixed line sizes and implement your own block allocation scheme, at which point you've just started implementing a poor man's database)?

David
A: 

Relational technology offers the possibility to query the underlying data in any way conceivable, without the user having to know about the storage and physical layout stuff.

That holds even for SQL systems.

If you don't need queryability, then any option is likely to suit your purpose, and the "simplest" (e.g. plain bytes flatfile) are likely to give you the best performance.

One more thing: if you have multiple concurrent sources of log entries, then serialization issues become important. When logging to a flat file, locks on the file last only for the time needed to do the write; when logging to a database, the logging itself becomes part of a transaction, and locking (on the log table) is likely to last for the duration of that transaction, perhaps causing "queueing overflow", or "convoy syndrome", or whatever you want to name it.

Erwin Smout
+4  A: 

Given the wealth of log file analysis programs out there and the number of server logs which are plain text, it's well established that plain text log files do scale and are fairly easily queryable.

In general, most SQL databases are optimised for updating data robustly, rather than simply appending to the end of a time series. The implementation assumes that data should not be duplicated and there are integrity constraints relating to references to other relations/tables which need to be enforced. Since a log never updates an existing entry, and so has no constraints which can be violated or cascading deletions, there's a lot there which you'll never use.

You might prefer a database for transaction scalability - say, if you want to centralise many logs into one database, so you are actually getting some concurrency (though it's not intrinsic to the problem: having separate logs on one server would also allow this, but you then have to merge them to get totals across all your systems).

Using an SQL database is a bit more complicated than just appending to a file or two and calling fflush. OTOH if you are very used to working with SQL and are already using a database in the project, then there's little overhead in also using a database for logging.

Pete Kirkham
A: 

I think you may have answered your own question:

Now it's always been my opinion that a database should only be used if it speeds things up, or provides a more reliable interface to the data.

A database by definition provides a more reliable interface to structured data — providing named columns and guaranteed data typing, to start.

If your needs are truly simple (a small number of absolutely consistent fields with no normalization issues) you probably won't suffer too much from using a text file. But how do you plan to analyze the file? Presumably the first step will be to read it into a database or some in-memory data structure. Using a database to begin with means that step is already done for you.
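
A small sketch of what that first step can look like, assuming a simple tab-separated layout (timestamp, level, message) that the downloader might write; the layout and names are assumptions for illustration, not something stated in the question:

    import csv
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (logged_at TEXT, level TEXT, message TEXT)")
    with open("ftp_downloader.log", newline="") as fh:
        conn.executemany("INSERT INTO log VALUES (?, ?, ?)",
                         csv.reader(fh, delimiter="\t"))
    # Only after this load can the flat file be queried like a table;
    # logging to a database in the first place skips this step.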

Larry Lustig
A: 

Write to syslog (if running on a Unix system), and redirect syslog to both a rotating logfile and a database.

The logfile is always useful for realtime monitoring using standard Unix tools such as tail, which can be combined with grep, etc.

syslog can redirect log messages to different servers, multiple targets etc.

It is not always wise to build database dependencies into an application; if the DB fails, what happens to the logging?

How do you log DB failures if your only logging goes to the DB?

Ernelli
+1  A: 

It depends on context. If it's very limited, as you suggest (simply logging some basic file transfer data, processing the log once, and throwing it away), I would tend to be attracted to the flat-file option as well. An RDBMS would be a bit of overkill; however, foreseeable future considerations may add an overriding factor.

As a compromise you may want to think about an embedded solution like SQLite, or about using a database abstraction API (such as a flat-file ODBC driver) that operates on flat files and can later be easily changed to operate against an RDBMS without any significant code changes, as conditions warrant.

You might also want to think in terms of a log server, such as reliable syslog with database-backed storage. With this method there is less complexity in the simple application, and all systems can benefit from the arrangement.

Einstein
+1  A: 

What about SQLite? It's a C library that implements a very simple database, recommended for simple projects.
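
For the OP's case this is especially cheap, since Python ships sqlite3 in the standard library; a tiny sketch (the file and table names are placeholders):

    import sqlite3

    conn = sqlite3.connect("ftp_log.db")
    conn.execute("CREATE TABLE IF NOT EXISTS log (logged_at TEXT, message TEXT)")
    with conn:  # commits on success; no server or DBA involved
        conn.execute("INSERT INTO log VALUES (datetime('now'), ?)",
                     ("downloaded example.txt",))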

marcos
+3  A: 

Look, a lot of the "think of future needs" arguments are blatant over-engineering. KISS.

The only thing you need to do to address future needs in this respect is to write your logging routines in such a way that it is easy to totally redirect them later to something else: DIY text, syslog-type services, or a DB. Keep that concept in mind, but DON'T write anything but what you need right now.

From what you described, it absolutely sounds like you should just use a simple text file.

darron
A: 

Flat files made $50,000,000 for Paul Graham.

David Plumpton
+1  A: 

Two things would lead me to using a database:

(a) Your log file has distinct fields, like date logged, id of the logged-in user at the time of the event, module triggering the event, etc.; and

(b) You have a need to query against these fields, especially complex queries. Like, "list all the memory overflows triggered by module xyz on weekends".

If, on the other hand, your log file is a series of unrelated messages put out by a variety of modules with no consistent format, so that the only possible create statement for your log file is "create table log (logmessage varchar(500))", then I don't see any clear gain in using a database.
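
As a hedged sketch of the kind of query (b) makes possible (the logged_at, module, and message columns here are illustrative, not from the original post):

    import sqlite3

    conn = sqlite3.connect("logs.db")
    rows = conn.execute(
        "SELECT logged_at, message FROM log "
        "WHERE module = 'xyz' "
        "  AND message LIKE '%memory overflow%' "
        "  AND strftime('%w', logged_at) IN ('0', '6')")  # Sunday or Saturday
    for logged_at, message in rows:
        print(logged_at, message)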

A database will surely be slower: it's always going to take more time to update indexes and do dynamic inserts than to just append to the end of a text file. Writing to a database involves the possibility of data being lost or corrupted due to database problems. This is rare, of course, but presumably the point of a log file is to help you track down problems like data corruption. If your error identification and recovery procedure is based on the assumption that you will never have any errors, then why are you doing it at all? It brings to mind all the lame jokes about the help desk sending out emails alerting people that the email system isn't working.

Personally, I almost always write logs to a simple text file. I can only think of a few occasions when I logged to a database. And the last time I did that was because I didn't have access to the file system on the production server, but could access the database.

Jay
+1  A: 

As a developer of client/server and now n-tier applications, I have great love for the power, reliability and speed of database systems. Having said this, I am very hesitant to do process logging in a db. Storing a current status or critical state transitions of a complex workflow in a db is great, but logging/tracing all of the steps in the DB can be an issue. If the reason for logging is to be able to trace failures and possibly debug the system, I need to be able to process my "log" in the direst of circumstances. What if my db/network/? are not functional in some way? If I can get to the server at all, a text file lets me debug with vi/emacs/notepad/*. Not the most powerful of toolsets, but always available. A well-formatted log file can also have reports generated by use of grep/awk/sed, etc. Again, not the most powerful, but readily available. In the end, if I expect my logging to be used in failure scenarios, I need to have the highest availability possible, and presuming I am in a failure state, I can't assume that my DB will still be running.

cdkMoose
A: 

Flat files are databases if you treat them as databases. Advantages of using flat files:

  • highly portable
  • human readable/directly editable
  • zero configuration/administration (sqlite has this advantage too). Security amounts to setting file permissions correctly

Disadvantages:

  • time/space efficiency (this doesn't seem to matter for your use case though)
  • no data integrity checks
  • no explicit data types
  • tools for working with flat files as databases are (for the most part) much less mature than DBs with native storage formats

It is wrong to say that you need to write to a DB in order to query your data. There are several tools that let you do that with flat files:
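
As one small illustration (a Python sketch, not one of the tools the answer refers to), plain standard-library code can already answer simple ad-hoc questions against a flat log, assuming a tab-separated timestamp/level/message layout:

    import csv

    with open("ftp_downloader.log", newline="") as fh:
        errors = [row for row in csv.reader(fh, delimiter="\t")
                  if len(row) >= 2 and row[1] == "ERROR"]

    print(len(errors), "error lines")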

Keith