ansaurus

Question

How to filter data from a file using Python?

Answer 1

+2 A:

You can extract the strings of interest (and some surrounding text) using, for example, the popular BeautifulSoup package. Then you'll need some string manipulation (or maybe regular expressions) to isolate the exact part of interest, but that depends on exactly which rules you want to apply: is it always the .log suffix you want to drop from the filename, is it always a space that separates the date from the time, and so forth. If you specify the rules precisely, it will not be hard to implement them (without a precise specification, however, it would all be a big mess of guesses ;-).
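A minimal sketch of that post-processing step, under two assumed rules that the question does not actually confirm: the filename always ends in .log, and the date and time are always separated by whitespace. The function names are hypothetical and the code targets Python 3.

```python
import re

def strip_log_suffix(filename):
    """Drop a trailing '.log' if present (assumed rule)."""
    return re.sub(r"\.log$", "", filename)

def split_date_time(stamp):
    """Split 'DD-Mon-YYYY HH:MM' on whitespace (assumed rule)."""
    date, time = stamp.split()
    return date, time

print(strip_log_suffix("software_0.1-0.log"))   # software_0.1-0
print(split_date_time("17-Nov-2009 13:46  "))   # ('17-Nov-2009', '13:46')
```

If the real rules differ (another suffix, a different separator), only these two small functions would need to change.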

Alex Martelli 2009-12-12 21:16:11

Answer 2

A:

Try Beautiful Soup, an HTML parser. You'll get a structured document out of it and can select the contents of the first and second td elements.

It may be overkill in this instance, but especially if your HTML comes from an outside source and can change, the maintenance guy will thank you for choosing a readable solution.
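To illustrate the "select the first and second td contents" idea without installing anything, here is a rough sketch that swaps in Python 3's standard-library html.parser in place of Beautiful Soup. The CellCollector class is a hypothetical helper, and the sample row is the one quoted in the other answers.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of each <td> cell (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_td = True
            self.cells.append("")  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # append text to the current cell

row = ('<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]">'
       'software_0.1-0.log</td><td align="right">17-Nov-2009 13:46  </td>'
       '<td align="right">186K</td></tr>')

parser = CellCollector()
parser.feed(row)
parser.close()

# First cell is the filename, second is the timestamp.
name, stamp = (cell.strip() for cell in parser.cells[:2])
print(name)   # software_0.1-0.log
print(stamp)  # 17-Nov-2009 13:46
```

A real parser keeps working when attributes or whitespace in the markup change, which is the maintenance argument made above.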

extraneon 2009-12-12 21:23:47

Answer 3

+5 A:

It's quite easy with BeautifulSoup:

html = '''<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]">software_0.1-0.log</td><td align="right">17-Nov-2009 13:46  </td><td align="right">186K</td></tr>'''

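import BeautifulSoup  # BeautifulSoup 3 module layout (Python 2)
soup = BeautifulSoup.BeautifulSoup(html)
# First <td>: step past the <img> tag to the text that follows it.
print(soup.td.next.next)
# Second <td>: its first child is the date/time text.
print(soup.td.nextSibling.next)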

Output:

software_0.1-0.log
17-Nov-2009 13:46

Mark Byers 2009-12-12 21:29:41

Answer 4

A:

Your requirement seems simple, so here's the non-BeautifulSoup way, using just pure string manipulation:

s="""<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]">software_0.1-0.log</td><td align="right">17-Nov-2009 13:46  </td><td align="right">186K</td></tr>"""

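pieces = s.split(">")
for i in pieces:
    try:
        e = i.index("<")
    except ValueError:
        pass  # no "<" in this piece, so there is no text to extract
    else:
        print(i[:e])  # the text that precedes the next tag (may be empty)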

Each non-empty i[:e] is the text of one table cell, so you can use those values to pick out the "software" filename and the date part.
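As a sketch of that last step, the same split-on-">" approach can collect the non-empty pieces into a list instead of printing them. The names pieces, filename, stamp, and size are illustrative, and the code targets Python 3.

```python
s = """<tr><td valign="top"><img src="/icons/unknown.gif" alt="[   ]">software_0.1-0.log</td><td align="right">17-Nov-2009 13:46  </td><td align="right">186K</td></tr>"""

# Keep the text that precedes each "<", skipping empty pieces.
pieces = []
for chunk in s.split(">"):
    if "<" not in chunk:
        continue  # nothing before a tag in this chunk
    text = chunk[:chunk.index("<")].strip()
    if text:
        pieces.append(text)

# Assumes the row always has exactly three text cells.
filename, stamp, size = pieces
print(filename)  # software_0.1-0.log
print(stamp)     # 17-Nov-2009 13:46
```

This relies on no attribute value containing ">" or "<"; if one does, the split lands in the wrong place.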

2009-12-13 06:06:45

While this is technically true, it is still better to use Beautiful Soup because that will pay you dividends in the future when you have to do more complex HTML manipulations.

Michael Dillon 2009-12-13 13:35:04

Until things actually get more complex, there's no need to use BeautifulSoup just for this case.

2009-12-13 23:48:13
