ansaurus

Question

Need Help Parsing File for This Pattern "Feb 06 2010 15:49:00.017 MCO"

Answer 1

A:

From your sample data, it seems you don't have to check for the three-letter identifier that follows the date, since it's always there. If that's not a valid assumption, add a final three letters to the regex. Also add capturing groups as needed to make the match useful to you. Anyway:

import re
dtre = re.compile(r'^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [0-9]{2} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}')

[line for line in file if dtre.match(line)]

Wrap it in a with statement or whatever to open your file, then do any processing you need on the list this builds up.
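As a minimal sketch of what that wrapping might look like (the helper name `matching_lines` and its `path` argument are made up for illustration, and the dot before the milliseconds is escaped so that it only matches a literal dot):

```python
import re

# The pattern from the answer, with a literal dot before the milliseconds.
DT_RE = re.compile(
    r'^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) '
    r'[0-9]{2} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}'
)

def matching_lines(path):
    """Return every line of the file at `path` that starts with a timestamp."""
    with open(path) as log:
        # The list is fully built before the with block closes the file.
        return [line for line in log if DT_RE.match(line)]
```

Each element of the returned list is the complete line, trailing newline included, not just the part the regex matched.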

Another possibility is to use a generator expression instead of a list comprehension (replace the outer [ and ] with ( and ) to do so). This is useful if the file is large, you're writing results out as you go, and you don't need all the matching lines in memory at once. Just be sure not to close the file before you've consumed the entire generator if you go with this approach!
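A rough sketch of the generator variant, under the assumption that the matching lines are simply copied to another stream. The function name `copy_matches` and the sample lines are invented for the example, and `io.StringIO` stands in for a real open file:

```python
import io
import re

DT_RE = re.compile(
    r'^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) '
    r'[0-9]{2} [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}\.[0-9]{3}'
)

def copy_matches(log, out):
    """Write each timestamped line from `log` to `out`; return how many."""
    matches = (line for line in log if DT_RE.match(line))  # nothing is read yet
    count = 0
    for line in matches:  # lines are pulled from `log` one at a time here
        out.write(line)
        count += 1
    return count

# Made-up sample input; io.StringIO stands in for an open file.
sample = (
    "Feb 06 2010 15:49:00.017 MCO first\n"
    "no timestamp here\n"
    "Feb 06 2010 15:49:01.250 MCO second\n"
)
out = io.StringIO()
with io.StringIO(sample) as log:
    written = copy_matches(log, out)  # consumed while `log` is still open
```

The generator is created and exhausted inside the with block, so the file is never closed underneath it.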

Also, you could use datetime's built-in parsing facility:

import datetime

for line in file:
    try:
        # the line[:24] slice assumes the fractional-seconds part always has
        # exactly three digits (milliseconds); %f itself accepts 1 to 6 digits
        dt = datetime.datetime.strptime(line[:24], '%b %d %Y %H:%M:%S.%f')
    except ValueError:
        # a ValueError means the beginning of the line isn't parseable as datetime
        continue
    # do something with the line; the datetime is already parsed and stored in dt

That's probably better if you're going to create the datetime.datetime object anyway.
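If the fractional part could vary in length, one possible workaround (an assumption about the data, not something the question requires) is to rebuild the timestamp from the first four whitespace-separated fields instead of slicing a fixed 24 characters. The helper name `parse_stamp` is made up, and `%b` relies on English month abbreviations being valid in the current locale:

```python
import datetime

FMT = '%b %d %Y %H:%M:%S.%f'

def parse_stamp(line):
    """Return the leading timestamp of `line` as a datetime, or None."""
    fields = line.split(None, 4)  # month, day, year, time, rest of the line
    try:
        return datetime.datetime.strptime(' '.join(fields[:4]), FMT)
    except ValueError:
        # The start of the line isn't parseable as a timestamp.
        return None
```

For the sample line from the question, this yields a datetime whose microsecond field is 17000, since `%f` pads ".017" on the right to six digits.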

Michał Marczyk 2010-03-02 18:11:04

The date will change all the time. The format will remain the same.

2010-03-02 18:14:55

Oh, I see. If you want to include lines with different `date` parts in your result set, I guess you do need a regex; will edit one in in a sec.

Michał Marczyk 2010-03-02 18:16:13

Well, there it is. I've also added a `datetime`-based approach which may be cleaner, though you'd have to spoil it a little if you needed to allow for variable-length µs parts (which is probably not a problem for you here, since you're dealing with a rigid logfile format).

Michał Marczyk 2010-03-02 18:37:10

BTW, look here: http://docs.python.org/library/datetime.html#strftime-behavior for docs on `datetime.datetime.strptime`.

Michał Marczyk 2010-03-02 18:38:31

I have this that traps the line, but I do not know how to get it to return the rest of the line: ([a-zA-Z]{3}\s\d\d\s\d\d\d\d\s\d\d:\d\d\)

2010-03-02 18:55:29

Have you tried my code from the answer? The list comprehension in the top code snippet should get you a list of all the lines matching your specification. Entire lines, not just initial fragments matching the regex. In general, if you're matching a regex against a string, it doesn't alter the string in any way, so you can still use it later. (If you successfully match a regex against the string bound to the variable `line`, this doesn't break `line` in any way, so you can still just return it / append it to some list / print it out / whatever.)

Michał Marczyk 2010-03-02 19:54:59
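To illustrate the point made in the last comment, here is a small sketch with named groups that keeps the whole line and also pulls out the remainder after the three-letter code. The function name `split_line`, the group names and the trailing message text are invented for the example:

```python
import re

LINE_RE = re.compile(
    r'^(?P<stamp>[A-Z][a-z]{2} \d{2} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) '
    r'(?P<code>[A-Z]{3})\s*(?P<rest>.*)$'
)

def split_line(line):
    """Return (stamp, code, rest) for a timestamped line, else None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group('stamp'), m.group('code'), m.group('rest')

line = "Feb 06 2010 15:49:00.017 MCO some message text"
parts = split_line(line)
# Matching does not modify `line`; it can still be printed or stored as-is.
```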

Answer 2

+1 A:

It seems that your date plus the three-character code always make up the first 5 fields (with space as the delimiter). Just go through the file, split each line on spaces, and take the first 5 fields:

s=Split(strLineOfFile," ")
wscript.echo s(0),s(1),s(2),s(3),s(4)

No regex needed.

ghostdog74 2010-03-03 00:08:11
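The snippet in this answer is VBScript. Since the rest of the thread uses Python, a rough Python equivalent might look like the sketch below; the function name `first_five_fields` is made up for illustration:

```python
def first_five_fields(line):
    """Return the first five whitespace-separated fields of `line`."""
    return line.split()[:5]

fields = first_five_fields("Feb 06 2010 15:49:00.017 MCO rest of the line")
# fields[0:4] are the date and time parts, fields[4] is the three-letter code
```

Note that splitting alone does not check that a line actually starts with a timestamp, so lines in another format would still need to be filtered out some other way.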
