I have an interesting problem and would appreciate your thoughts for the best solution. I need to parse a set of logs. The logs are produced by a multi-threaded program and a single process cycle produces several lines of logs.

When parsing these logs I need to pull out specific pieces of information from each process. Naturally this information is spread across multiple lines (I want to compress these pieces of data into a single line). Because the application is multi-threaded, the block of lines belonging to a process can be fragmented, as other processes are written to the same log file at the same time.

Fortunately, each line gives a process ID so I'm able to distinguish what logs belong to what process.

Now, there are already several parsers that all extend the same class, but they were designed to read logs from a single-threaded application (the original system, so no fragmentation) and they use a readLine() method in the super class. These parsers keep reading lines until all regular expressions have been matched for a block of lines (i.e. the lines written in a single process cycle).

So, what can I do with the super class so that it can manage the fragmented logs, and ensure change to the existing implemented parsers is minimal?

A: 

You need to store lines temporarily in a queue where a single thread consumes them and passes them on once each set has been completed. If you have no way of knowing whether a set is complete, either from the number of lines or from the content of the lines, you could consider using a sliding window technique where you don't collect the individual sets until after a certain time has passed.
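A minimal sketch of the buffering idea, assuming a block can be recognised as complete from the content of its last line. The class name BlockAssembler and the idOf and endsBlock functions are hypothetical; you would supply your own logic for extracting the process ID and detecting the end of a block.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical helper: buffers lines per process ID and hands over each
// block once a line marks it as complete.
class BlockAssembler {
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private final Function<String, String> idOf;      // assumption: extracts the process ID from a line
    private final Predicate<String> endsBlock;        // assumption: true for the last line of a block
    private final Consumer<List<String>> sink;        // receives each completed block

    BlockAssembler(Function<String, String> idOf,
                   Predicate<String> endsBlock,
                   Consumer<List<String>> sink) {
        this.idOf = idOf;
        this.endsBlock = endsBlock;
        this.sink = sink;
    }

    // Called by the single consuming thread for every line taken from the queue.
    void accept(String line) {
        String id = idOf.apply(line);
        List<String> block = pending.computeIfAbsent(id, k -> new ArrayList<>());
        block.add(line);
        if (endsBlock.test(line)) {
            pending.remove(id);
            sink.accept(block);
        }
    }
}
```

The sink could feed the de-fragmented block to one of the existing parsers, which then sees the lines of one process cycle in order.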

kasperjj
A: 

Would something like this do it? It runs a new Thread for each Process ID in the log file.

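class Parser {
   // Shared by all parser threads; a per-instance field would give each thread its own copy
   static String currentLine;
   Parser() {
      //Construct parser
   }
   String readLine(String processId) {
      // Lock on the class so that every ProcessParser waits on the same monitor
      synchronized (Parser.class) {
         if (currentLine == null)
            currentLine = readLinefromLog();

         // Wait until the pending line belongs to this process (or the log is exhausted)
         while (currentLine != null && !getProcessIdFromLine(currentLine).equals(processId)) {
            try {
               Parser.class.wait();
            } catch (InterruptedException e) {
               Thread.currentThread().interrupt();
               return null;
            }
         }

         String line = currentLine;
         currentLine = readLinefromLog();
         Parser.class.notifyAll(); // wake every waiting parser, not just one
         return line;
      }
   }
}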

class ProcessParser extends Parser implements Runnable{
   String processId;
   ProcessParser(String processId) {
      super();
      this.processId = processId;
   }

   void startParser() {
       new Thread(this).start();
   }

   public void run() {
      String line = null;
      while ((line = readLine()) != null) {
          // process log line here
      }
   }

   String readLine() {
      return super.readLine(processId);
   }
}
Skip Head
+1  A: 

It sounds like there are some existing parser classes already in use that you wish to leverage. In this scenario, I would write a decorator for the parser which strips out lines not associated with the process you are monitoring.

It sounds like your classes might look like this:

abstract class Parser {
    public abstract void parse( ... );
    protected String readLine() { ... }
}

class SpecialPurposeParser extends Parser {
    public void parse( ... ) { 
        // ... special stuff
        readLine();
        // ... more stuff
    }
}

And I would write something like:

class SingleProcessReadingDecorator extends Parser {
    private Parser parser;
    private String processId;
    public SingleProcessReadingDecorator( Parser parser, String processId ) {
        this.parser = parser;
        this.processId = processId;
    }

    public void parse( ... ) { parser.parse( ... ); }

    public String readLine() {
        String text = super.readLine();
        if( text == null ) {
            return null; //end of log (assuming readLine() returns null there)
        }
        if( /*text is for processId */ ) { 
            return text; 
        }
        else {
            //keep readLine'ing until you find the next line and then return it
            return this.readLine();
        }
    }
}

Then any occurrence you want to modify would be used like this:

//old way
Parser parser = new SpecialPurposeParser();
//changes to
Parser parser = new SingleProcessReadingDecorator( new SpecialPurposeParser(), "process1234" );

This code snippet is simple and incomplete, but gives you the idea of how the decorator pattern could work here.
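One caveat: readLine() calls made inside the wrapped parser go to that parser's own implementation, not to an override in the wrapper. A variation on the same idea, not part of the snippet above, is to filter at the line source instead. The sketch below assumes the super class reads its lines from a java.io.BufferedReader that can be supplied from outside; ProcessFilteringReader and the idOf function are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Function;

// Hypothetical reader that only returns lines belonging to one process ID.
class ProcessFilteringReader extends BufferedReader {
    private final String processId;
    private final Function<String, String> idOf; // assumption: extracts the process ID from a line

    ProcessFilteringReader(Reader in, String processId, Function<String, String> idOf) {
        super(in);
        this.processId = processId;
        this.idOf = idOf;
    }

    @Override
    public String readLine() throws IOException {
        String line;
        // Skip lines written by other processes; null signals end of input.
        while ((line = super.readLine()) != null) {
            if (processId.equals(idOf.apply(line))) {
                return line;
            }
        }
        return null;
    }
}
```

With this approach the existing parsers stay untouched and see what looks like an unfragmented log, at the cost of one pass over the file per process ID.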

Alex B
A: 

One simple solution could be to read the file line by line and write several files, one for each process ID. The process IDs seen so far can be kept in a hash map in memory, mapping each ID to its file, to determine whether a new file is needed or which existing file a line should go to. Once all the (temporary) files are written, the existing parsers can do the job on each one.
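A rough sketch of that splitting step, assuming the process ID can be extracted from each line by a function you provide. LogSplitter and idOf are hypothetical names, and the temporary files are created with generated names so that the process ID does not have to be a valid file name.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: writes one temporary file per process ID.
class LogSplitter {
    // Returns a map from process ID to the file holding that process's lines.
    static Map<String, Path> split(BufferedReader log, Path outDir,
                                   Function<String, String> idOf) throws IOException {
        Map<String, Path> files = new HashMap<>();
        Map<String, BufferedWriter> writers = new HashMap<>();
        try {
            String line;
            while ((line = log.readLine()) != null) {
                String id = idOf.apply(line);
                BufferedWriter writer = writers.get(id);
                if (writer == null) {
                    // First line for this process ID: create its file.
                    Path file = Files.createTempFile(outDir, "process-", ".log");
                    files.put(id, file);
                    writer = Files.newBufferedWriter(file);
                    writers.put(id, writer);
                }
                writer.write(line);
                writer.newLine();
            }
        } finally {
            for (BufferedWriter writer : writers.values()) {
                writer.close();
            }
        }
        return files;
    }
}
```

Each returned file can then be handed to an existing parser unchanged. Note that this keeps one open writer per process ID, which may matter if the log contains a very large number of IDs.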

Aleris
A: 

I would write a simple distributor that reads the log file line by line and stores them in different VirtualLog objects in memory -- a VirtualLog being a kind of virtual file, actually just a String or something that the existing parsers can be applied to. The VirtualLogs are stored in a Map with the process ID (PID) as the key. When you read a line from the log, check if the PID is already there. If so, add the line to the PID's respective VirtualLog. If not, create a new VirtualLog object and add it to the Map. Parsers run as separate Threads, one on every VirtualLog. Every VirtualLog object is destroyed as soon as it has been completely parsed.
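As a sketch of the distribution step only (the per-log parser threads are left out), here a StringBuilder stands in for the VirtualLog. VirtualLogDistributor and pidOf are hypothetical names, and the way the PID is extracted from a line is an assumption you would replace with your own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical distributor: groups log lines by PID in memory.
class VirtualLogDistributor {
    // Reads the whole log and returns one in-memory "virtual log" per PID.
    static Map<String, StringBuilder> distribute(BufferedReader log,
                                                 Function<String, String> pidOf) throws IOException {
        Map<String, StringBuilder> logs = new LinkedHashMap<>();
        String line;
        while ((line = log.readLine()) != null) {
            // Create the virtual log on first sight of a PID, then append the line.
            logs.computeIfAbsent(pidOf.apply(line), pid -> new StringBuilder())
                .append(line)
                .append('\n');
        }
        return logs;
    }

    // Exposes a virtual log as a reader, so a line-based parser can consume it.
    static BufferedReader open(StringBuilder virtualLog) {
        return new BufferedReader(new StringReader(virtualLog.toString()));
    }
}
```

This version reads the entire file before any parsing starts, so it only suits logs that fit in memory; removing a map entry once its parser has finished would free the space as described above.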

micro