LogParser isn't open source and I need this functionality for an open source project I'm working on.

I'd like to write a library that lets me query huge (mostly IIS) log files, preferably with LINQ.

Do you have any links that could help me? How does a program like LogParser work so fast? How does it handle memory limitations?

+2  A: 

It probably processes the information in the log as it reads it. This means the library doesn't have to allocate a huge amount of memory to hold the whole file. It can read a chunk, process it and throw it away. This is a common and very effective way to process large amounts of data.
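To make the read-process-discard idea concrete, here is a minimal Python sketch (the same pattern maps onto lazy iterators in C#). I don't know how LogParser is actually implemented, so this only illustrates the technique. The names `read_records` and `count_status` and the sample log are made up for the example; the only format assumptions are that fields are space-separated and that lines starting with `#` are directives.

```python
import io
from typing import Iterable, Iterator, List


def read_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one split record per data line; nothing is buffered."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and '#' directive lines (assumed convention).
        if not line or line.startswith("#"):
            continue
        yield line.split(" ")


def count_status(lines: Iterable[str], status: str, status_index: int) -> int:
    """Count records whose field at status_index equals status."""
    return sum(1 for rec in read_records(lines) if rec[status_index] == status)


# Hypothetical sample data; a real file object can be passed the same way,
# because iterating over a file also yields one line at a time.
sample = io.StringIO(
    "#Fields: date time cs-uri-stem sc-status\n"
    "2009-01-01 00:00:01 /index.html 200\n"
    "2009-01-01 00:00:02 /missing 404\n"
    "2009-01-01 00:00:03 /about.html 200\n"
)
hits = count_status(sample, "200", 3)
```

Only one line is alive at any moment, so memory use stays flat no matter how large the file is.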

You could, for example, work line by line and parse each line. For the actual parsing you can write a state machine or, if the requirements allow it, use a regex.
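As a rough sketch of the regex option in Python: the pattern below assumes a fixed four-field layout (date, time, URI, status) that I made up for illustration. A real IIS log declares its fields in a `#Fields:` line, so the pattern would have to be built from that. `LINE_RE` and `parse_line` are hypothetical names.

```python
import re

# Assumed layout: "YYYY-MM-DD HH:MM:SS <uri> <3-digit status>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<uri>\S+) "
    r"(?P<status>\d{3})$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


entry = parse_line("2009-01-01 00:00:01 /index.html 200")
skipped = parse_line("#Fields: date time cs-uri-stem sc-status")
```

Returning `None` for non-matching lines lets the caller skip directives and malformed rows without raising.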

Another approach would be a state machine that both reads and parses the data. This might be needed if a log entry can span more than one line.
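A minimal two-state sketch of that idea in Python, assuming a made-up format where an entry starts at column 0 and any line beginning with whitespace continues the previous entry. IIS logs are normally one entry per line, so this is only for formats that really do wrap. `State` and `assemble_entries` are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # no entry is being assembled
    IN_ENTRY = auto()  # an entry has started and may still continue


def assemble_entries(lines):
    """Join continuation lines onto their entry and yield complete entries."""
    state = State.IDLE
    current = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        is_continuation = line.startswith((" ", "\t"))
        if state is State.IDLE:
            # Blank lines and stray continuations are ignored while idle.
            if line and not is_continuation:
                current = [line]
                state = State.IN_ENTRY
        elif is_continuation:
            current.append(line.strip())
        else:
            # A new entry or a blank line closes the current entry.
            yield " ".join(current)
            if line:
                current = [line]
            else:
                current = []
                state = State.IDLE
    if state is State.IN_ENTRY:
        yield " ".join(current)


entries = list(assemble_entries(
    ["a start\n", "  more\n", "b start\n", "\n", "  stray\n", "c\n"]
))
```

Because it is a generator, it still holds only the entry currently being assembled.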

Some state machine related links:

A very simple state machine written in C: http://snippets.dzone.com/posts/show/3793

A lot of Python-related code, but some sections are universally applicable: http://www.ibm.com/developerworks/library/l-python-state.html

Skurmedel