I am attempting to "grep" out the bind operations for a specific user from an LDAP log file. The information I need is spread across multiple lines in the log. Here is example input:

[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.415] Failed to authenticate local on connection 0x6cc8ee80, err = log account expired (-220)
[2009/04/28 17:04:42.416] Sending operation result 53:"":"NDS error: log account expired (-220)" to connection 0x6cc8ee80
[2009/04/28 17:04:42.416] Operation 0x3:0x60 on connection 0x6cc8ee80 completed in 3 seconds
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds
[2009/04/28 17:04:48.772] DoSearch on connection 0x7c8affc0
[2009/04/28 17:04:48.772] Search request:
base: "o=intranet"
scope:2  dereference:0  sizelimit:0  timelimit:600  attrsonly:0
filter: "(guid='03ADmin)"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "cn"
attribute: "objectClass"
attribute: "guid"
attribute: "mail"
[2009/04/28 17:04:48.773] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:48.773] Operation 0xe851:0x63 on connection 0x7c8affc0 completed in 0 seconds

For this example the following should be the result:

[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0
[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple
[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0
[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds

Basically, this is a log of server operations across multiple connections. I need to analyze the time spent in 'bind' operations by the admin user, but this server is very busy so I need to eliminate a lot of noise.

In pseudocode:

for each line in file
    if line contains "DoBind" and next line contains "cn=admin"
        print both lines
        find the connection number X in lines
        skip lines until "Sending operation result.*to connection X" is found
        print two lines

I would like to get the "DoBind" lines that are immediately followed by a "Bind name" line for the user "cn=admin", and then the result lines, which are identified by the connection number ("0x7c8affc0" in this example). Other operations that I do not need may take place between the beginning and end of the bind, such as the "Failed to authenticate" message, which takes place on a different connection.

Furthermore, other operations will take place on the connection after the bind is done which I'm not interested in. In the above, the results of the DoSearch operation happening after the 'bind' must not be captured.

I'm trying to do this with 'sed', which seemed like the right tool for the job. Alas, though, I'm a beginner and this is a learning experience. Here's what I have so far:

/.*DoBind on connection \(0x[0-9a-f]*\)\n.*Bind name:cn=OblixAppId.*/ p
/.*Sending operation result.*to connection \1\nOperation.*on connection \1 completed.*/ p

sed complains about the second line where I use '\1'. I'm trying to capture the connection address and use it in a subsequent search to capture the result strings, but I'm obviously not using it correctly. The '\1'-style back-references seem to be local to each search operation.

Is there a way to pass "variables" from one search to another, or should I be learning Perl instead?

+1  A: 
fgrep -B1 cn=admin logfile | 
sed -n 's/.*DoBind on connection \(.*\)/\1/p' | 
fgrep -wf - logfile

The first fgrep extracts each line containing cn=admin together with the line before it (-B1), the sed pulls the connection number out of the DoBind lines, and the final fgrep finds all lines that contain one of those connection numbers.

This is a two-pass solution. A one-pass solution is possible, but more complicated to implement.

Edit: Here's a solution that does what you want in Python. Note, however, that it is not fully correct: it won't handle log lines interleaved between different connections. I'll leave it up to you whether you care enough to fix that. It's also a bit inefficient, and does more regex compiles and matches than necessary.

import re

todo = set()            # connections with an admin bind still awaiting a result
display_next = False    # True when the line after a result line should be printed
previous_dobind = None  # (connection id, line) if the previous line was a DoBind

with open('logfile') as log:
    for line in log:
        line = line.strip()
        if display_next:
            print(line)
            display_next = False
            continue
        dobind = re.search(r'DoBind on connection (.*)', line)
        bind = re.search(r'Bind name:cn=admin', line)
        oper = re.search(r'Sending operation result.*to connection (.*)', line)
        if dobind:
            previous_dobind = (dobind.group(1), line)
        elif previous_dobind:
            if bind:
                todo.add(previous_dobind[0])
                print(previous_dobind[1])
                print(line)
            previous_dobind = None
        elif oper:
            conn = oper.group(1)
            if conn in todo:
                print(line)
                display_next = True
                todo.remove(conn)
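
The interleaving limitation mentioned above could be addressed by tracking, per connection, which line is still awaited. The following is only a sketch under assumptions: the name `admin_binds` is invented for illustration, and, like the pseudocode in the question, it assumes the "Bind name" line directly follows its "DoBind" line.

```python
import re

DOBIND = re.compile(r"DoBind on connection (0x[0-9a-f]+)")
RESULT = re.compile(r"Sending operation result.*to connection (0x[0-9a-f]+)")
DONE = re.compile(r"Operation .* on connection (0x[0-9a-f]+) completed")

def admin_binds(lines, user="cn=admin"):
    """Yield the DoBind, Bind name, result and completion lines for binds by user."""
    last_dobind = None  # (connection id, line) if the previous line was a DoBind
    awaiting = {}       # connection id -> "result" or "done"
    for raw in lines:
        line = raw.rstrip("\n")
        prev, last_dobind = last_dobind, None
        m = DOBIND.search(line)
        if m:
            last_dobind = (m.group(1), line)
            continue
        if prev and ("Bind name:" + user) in line:
            awaiting[prev[0]] = "result"
            yield prev[1]
            yield line
            continue
        m = RESULT.search(line)
        if m and awaiting.get(m.group(1)) == "result":
            awaiting[m.group(1)] = "done"
            yield line
            continue
        m = DONE.search(line)
        if m and awaiting.get(m.group(1)) == "done":
            del awaiting[m.group(1)]  # later operations on this connection are ignored
            yield line

sample = [
    "[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0",
    "[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple",
    '[2009/04/28 17:04:42.416] Sending operation result 53:"":"NDS error: log account expired (-220)" to connection 0x6cc8ee80',
    '[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0',
    "[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds",
    '[2009/04/28 17:04:48.773] Sending operation result 0:"":"" to connection 0x7c8affc0',
]
for line in admin_binds(sample):
    print(line)
```

Because the state is keyed by connection id, result lines for other connections that arrive in between are skipped, and the later search result on the same connection is not printed.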
Thanks, but there will be other operations taking place on these connections that I am not interested in. The result lines don't identify the fact that the result was for a 'bind' operation. The second part of this script would end up giving me all operation results that happened on any connection that the admin bound to. I'll add this to the question.
veefu
It isn't fully clear to me what you want. Which lines do you want as output? Can you provide sample output? Based on your description, appending "fgrep -v DoSearch" or "egrep '(completed|Sending)'" to the end of my solution above would do.
What worries me about this solution is that "the final fgrep finds all lines that contain one of the connection numbers". I don't want all lines that contain the connection numbers; I only want the first few lines found for a connection AFTER a "DoBind" has been performed on the connection. I've added sample results.
veefu
Thanks for the Python code. I'd +1 you again if I could. I ended up dusting off my Perl book and doing it that way. I've been meaning to learn Python... and seeing how ugly the Perl solution is does motivate me.
veefu
+1  A: 

You're going to want to look closely at a sed reference if you want it in one pass - you could certainly do it. Look into the sed commands that swap the hold and pattern buffers, and compare the two. You could write a multi-step rule that matches "cn=admin", and swaps it to the hold buffer, and then match the "DoBind" pattern when the hold buffer is not empty.

I can't remember the commands offhand, but it's not terribly complicated; you'll just need to look it up in the reference documentation.
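
To illustrate the idea without sed syntax, here is a small Python sketch in which an ordinary variable stands in for the hold buffer. The names `held` and `result_pattern` are made up for this example; the point is only that a value captured by one match can be carried into a later pattern.

```python
import re

DOBIND = re.compile(r"DoBind on connection (0x[0-9a-f]+)")

def result_pattern(conn_id):
    """Build a pattern matching the result line of one specific connection."""
    # re.escape keeps the captured text literal inside the new pattern.
    return re.compile(
        r"Sending operation result.*to connection " + re.escape(conn_id) + r"\b"
    )

sample = [
    "[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0",
    '[2009/04/28 17:04:42.416] Sending operation result 53:"":"NDS error" to connection 0x6cc8ee80',
    '[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0',
]

held = None   # plays the role of sed's hold buffer
matched = []
for line in sample:
    m = DOBIND.search(line)
    if m:
        held = m.group(1)            # "copy to hold buffer"
    elif held and result_pattern(held).search(line):
        matched.append(line)         # second rule fires only while something is held
        held = None                  # "clear the hold buffer"

for line in matched:
    print(line)
```

In sed the equivalent steps would use the hold-space commands instead of a variable, but the flow is the same: capture, hold, compare, clear.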

Ben Collins
Thanks for the lead. I'm reading up on it now, but I'm not sure how it's applicable. Matching the "DoBind" and "cn=admin" lines isn't the trouble spot. The problem is passing the connection address, which changes, to the next search pattern. Are you saying I'll be extracting the address field from a matching line and appending it to the later pattern buffer?
veefu
Ah, sorry. I misread a little. Yes, you could extract some field from a matching line and stick that into the hold buffer. Then your other rule will try to match on the DoBind pattern, but only if the hold buffer isn't empty.
Ben Collins
I consider myself pretty well versed in sed and shell scripting, and this is the kind of stuff I always have to go to the reference manual for. Multi-line processing with a line-based tool is just inherently tricky, even though it's probably the best tool for the job short of a full-blown program written in some general-purpose language. If you can't get this to work in a sed script, you might fall back to bash, since it has built-in regex operators (or Perl - shudder - if you're more comfortable with that).
Ben Collins
A: 

Well, I couldn't find a solution with sed alone. Here's my ugly Perl solution:

open INFILE, $ARGV[0] or die "Couldn't open file $ARGV[0]";
while (<INFILE>) {
  if (/(.*DoBind on connection (0x[0-9a-f]*))/) {
    $potentialmatch = $1; $connid = $2;
    $currentline = <INFILE>;
    if ($currentline =~ /(.*Bind name:cn=OblixAppId.*)/) {
      print $potentialmatch . "\n" . $1 . "\n";
      $offset = tell INFILE;
      while($currentline = <INFILE>) {
        if ($currentline =~ /(.*Sending operation result.*to connection $connid.*)/) {
          print "$1\n";
          next;
        }
        if ($currentline =~ /(.*Operation.*on connection $connid completed.*)/) {
          print  "$1\n";
          seek INFILE, $offset, 0;
          last;
        }
      }
    }
  }
}
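
Since the goal stated in the question is to analyze the time spent in bind operations, and "completed in 0 seconds" is coarse, the extracted lines could be post-processed using their timestamps. This Python sketch is an illustration only: `bind_durations` is an invented name, and it assumes the bracketed timestamps on the DoBind and completion lines are a fair proxy for when the bind started and finished.

```python
import re
from datetime import datetime

STAMP = re.compile(r"^\[(\d{4}/\d\d/\d\d \d\d:\d\d:\d\d\.\d+)\]")
DOBIND = re.compile(r"DoBind on connection (0x[0-9a-f]+)")
DONE = re.compile(r"on connection (0x[0-9a-f]+) completed")

def parse_stamp(line):
    """Parse the leading [YYYY/MM/DD HH:MM:SS.mmm] timestamp of a log line."""
    m = STAMP.match(line)
    return datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")

def bind_durations(extracted):
    """Return (connection id, seconds) pairs from already-filtered bind lines."""
    started = {}    # connection id -> timestamp of its DoBind line
    durations = []
    for line in extracted:
        m = DOBIND.search(line)
        if m:
            started[m.group(1)] = parse_stamp(line)
            continue
        m = DONE.search(line)
        if m and m.group(1) in started:
            delta = parse_stamp(line) - started.pop(m.group(1))
            durations.append((m.group(1), delta.total_seconds()))
    return durations

extracted = [
    "[2009/04/28 17:04:42.414] DoBind on connection 0x7c8affc0",
    "[2009/04/28 17:04:42.414] Bind name:cn=admin,ou=appids,o=admineq, version:3, authentication:simple",
    '[2009/04/28 17:04:42.416] Sending operation result 0:"":"" to connection 0x7c8affc0',
    "[2009/04/28 17:04:42.416] Operation 0x1:0x60 on connection 0x7c8affc0 completed in 0 seconds",
]
for conn, seconds in bind_durations(extracted):
    print(conn, seconds)
```

The input is expected to be the output of one of the filters in this thread, so every DoBind line it sees is already known to belong to the user of interest.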
veefu
+2  A: 

As an intellectual challenge, I have come up with a solution using sed (as requested), but I would say that a solution in some other technology (Perl is my favorite) would be easier to understand, and hence easier to support.

You have a couple of options when it comes to multi-line processing in sed:

  • you can use the hold space - which can be used to store all or part of the pattern space for subsequent processing, or
  • you can append further lines to the pattern space using commands like N.

Note: the example below uses GNU sed. It can also be made to work with Solaris sed by changing the multi-command syntax (each ';' replaced with a newline). I have used the GNU sed variant to make the script more compact.

The script below is commented, for the reader's benefit and mine.

sed -n '
# if we see the line "DoBind" then store the pattern in the hold space
/DoBind/ h

# if we see the line "cn=admin", append the pattern to the holdspace
# and branch to dobind
/cn=admin/{H;b dobind}

# if we see the pattern "Sending...." append the hold space to the
# pattern and  branch to doop
/Sending operation result/{G;b doop}

# branch to the end of the script
b

# we have just seen a cn=admin, and the hold space contains the last
# two lines
:dobind

# swap hold space with pattern space
x

# print out the pattern space
p

# strip off everything that is not the connection identifier
s/^.*connection //
s/\n.*$//

# put it in the hold space
x

# branch to end of script.
b

# have just seen "Sending operation" and the current stored connection
# identifier has been appended to the pattern space
:doop

# does the connection id on both lines match? If so, branch to gotop.
/connection \(0x[0-9a-f]*\).*\n\1$/ b gotop

# branch to end of script
b

# pattern contains two lines "Sending....", and the connection id.
:gotop

# delete the second line
s/\n.*$//

# read the next line and append it to the pattern space.
N

# print it out
p

# clear the pattern space, and put it into the hold space - hence
# clearing the hold space
s/^.*$//
x

'

Beano