ansaurus

Question

Extracting data from a text file to use in a python script?

Answer 1

A:

Well, if you don't know about regex, simply change your file like this:

Host = www.example.com
Login = user
Password = password

Then read it with Python's ConfigParser module (note that it expects a section header, e.g. [section], at the top of the file): http://docs.python.org/library/configparser.html
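A minimal sketch of this approach, assuming Python 3 (where the module is named `configparser`). The `[server]` section name and the sample values are hypothetical; the text is parsed from a string here only to keep the example self-contained.

```python
import configparser  # Python 3 name; the module was ConfigParser in Python 2

# Hypothetical file contents: the "key = value" layout from the answer,
# plus the section header that configparser requires.
SAMPLE = """\
[server]
Host = www.example.com
Login = user
Password = password
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)  # for a real file, use parser.read(path)

# Option names are lowercased by default.
settings = dict(parser.items("server"))
print(settings["host"], settings["login"])
```

Note that the keys come back lowercased (`host`, not `Host`), which is the parser's default behaviour.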

mkotechno 2010-05-16 18:57:59

Modifying the file isn't really an option, but thank you

Rob 2010-05-16 19:00:59

ConfigParser supports the `:` delimiter, see http://stackoverflow.com/questions/2845018/extracting-data-from-a-text-file-to-use-in-a-python-script/2845923#2845923

J.F. Sebastian 2010-05-16 23:29:23

Answer 2

A:

EDIT: Better Solution

import re

for line in infile:  # infile: an already-open file object
    key, val = re.search(r'(.*?):\s*(.*)', line).groups()

mikerobi 2010-05-16 19:03:01

Answer 3

+1 A:

You should put the entries in a dictionary, not in so many separate variables -- clearly, the keys you're using need NOT be acceptable as variable names (that slash in 'Url/Host' would be a killer!-), but they'll be just fine as string keys into a dictionary.

import re

there = re.compile(r'''(?x)      # verbose flag: allows comments & whitespace
                       ^         # anchor to the start
                       ([^:]+)   # group with 1+ non-colons, the key
                       :\s*      # colon, then arbitrary whitespace
                       (.*)      # group everything that follows
                       $         # anchor to the end
                    ''')

and then

configdict = {}
for aline in open('thefile.txt'):
    mo = there.match(aline)
    if not mo:
        print("Skipping invalid line %r" % aline)
        continue
    k, v = mo.groups()
    configdict[k] = v

The ability to make RE patterns "verbose" (by starting them with (?x) or by passing re.VERBOSE as the second argument to re.compile) is very useful: it lets you clarify your REs with comments and nicely aligned whitespace. I think it's sadly underused ;-).
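To illustrate the two spellings mentioned here, this small sketch compiles the same pattern once with the inline `(?x)` prefix and once with `re.VERBOSE`. The names `inline` and `flagged` and the sample line are illustrative assumptions.

```python
import re

# The pattern written with the inline (?x) flag ...
inline = re.compile(r'''(?x)
    ^ ([^:]+)   # the key: one or more non-colons
    : \s*       # colon, then optional whitespace
    (.*) $      # the value: the rest of the line
''')

# ... and the same pattern with re.VERBOSE as the second argument.
flagged = re.compile(r'''
    ^ ([^:]+)   # the key: one or more non-colons
    : \s*       # colon, then optional whitespace
    (.*) $      # the value: the rest of the line
''', re.VERBOSE)

line = "Login:     user"
print(inline.match(line).groups())   # ('Login', 'user')
print(flagged.match(line).groups())  # ('Login', 'user')
```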

Alex Martelli 2010-05-16 19:06:32

Nice answer and great explanation. I think I'd like potential whitespace on the value removed. I believe that could be done by adding \s* between the value group and the end-of-line anchor '$'?

extraneon 2010-05-16 19:09:23

AttributeError: 'NoneType' object has no attribute 'group'

Rob 2010-05-16 20:58:42

@Rob, you mean `groups`, not `group`. Yes, I forgot to add the `continue` obviously needed to **do** the skip, let me add it. BTW, your question doesn't mention that there can be lines that don't match this pattern, and what to do when such lines are found -- please edit your Q to add this crucial information!

Alex Martelli 2010-05-17 00:04:52

@extraneon, if you want to remove trailing whitespace on the value, change the end of the RE's pattern to `(.*?)\s*$`. The `?` here is crucial as it tells the RE to do the star-match non-greedily: without it, it would still match the trailing whitespace as part of this group!
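A short sketch of the difference, using a non-verbose form of the pattern and a sample line with trailing spaces (both are assumptions made for illustration):

```python
import re

line = "Login:     user   \n"

# Greedy value group: the trailing spaces end up inside the group.
greedy = re.compile(r'^([^:]+):\s*(.*)\s*$')
# Non-greedy value group: the trailing whitespace is left to \s*.
lazy = re.compile(r'^([^:]+):\s*(.*?)\s*$')

print(repr(greedy.match(line).group(2)))  # 'user   '
print(repr(lazy.match(line).group(2)))    # 'user'
```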

Alex Martelli 2010-05-17 00:06:42

Sorry, didn't realize it mattered. Edited it.

Rob 2010-05-17 02:51:23

Answer 4

+1 A:

For a file as simple as this you don't really need regular expressions. String functions are probably easier to understand. This code:

def parse(data):
    parsed = {}
    for line in data.split('\n'):
        if not line.strip():
            continue  # blank line
        # split on the first colon only, so a value may contain ':'
        key, value = line.split(':', 1)
        parsed[key.strip()] = value.strip()
    return parsed

if __name__ == '__main__':
    test = """Url/Host:   www.example.com
    Login:     user
    Password:   password
"""
    print(parse(test))

will do the job, and results in (key order may differ between Python versions):

{'Login': 'user', 'Password': 'password', 'Url/Host': 'www.example.com'}
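A value may itself contain a colon, for example a URL with a scheme or a port. One possible variant, assuming only the first colon separates key from value and that lines without a colon should be skipped, uses `str.partition`. The function name `parse_first_colon` and the sample data are hypothetical.

```python
def parse_first_colon(data):
    """Parse 'key: value' lines, splitting each line at its first colon only."""
    parsed = {}
    for line in data.splitlines():
        key, sep, value = line.partition(':')
        if not sep:
            continue  # blank line or line without a colon: skip it
        parsed[key.strip()] = value.strip()
    return parsed

sample = """Url/Host:   http://www.example.com:8080
Login:     user

Password:   password
"""
result = parse_first_colon(sample)
print(result['Url/Host'])  # http://www.example.com:8080
```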

snim2 2010-05-16 19:56:01

Answer 5

A:

The ConfigParser module supports the ':' delimiter.

import configparser  # named ConfigParser in Python 2
from io import StringIO

class Parser(configparser.RawConfigParser):
    def _read(self, fp, fpname):
        # prepend a dummy section header, which the parser requires
        data = StringIO("[data]\n" + fp.read())
        return configparser.RawConfigParser._read(self, data, fpname)

p = Parser()
p.read("file.txt")
print(dict(p.items("data")))

Output (key order may differ between Python versions):

{'login': 'user', 'password': 'password', 'url/host': 'www.example.com'}

Though a regex or manual parsing might be more appropriate in your case.
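On Python 3 the module is named `configparser`, and a similar effect seems achievable without overriding the private `_read` method: prepend a section header to the text and call `read_string`. This is only a sketch under those assumptions. The section name `data` follows the answer, while `parse_colon_text` and the sample text are hypothetical.

```python
import configparser

def parse_colon_text(text, section="data"):
    """Parse 'key: value' text by wrapping it in a dummy section."""
    parser = configparser.RawConfigParser()
    # The parser requires a section header, so prepend one.
    parser.read_string("[%s]\n%s" % (section, text))
    return dict(parser.items(section))

sample = "Url/Host:   www.example.com\nLogin:     user\nPassword:   password\n"
print(parse_colon_text(sample))
```

For a real file, the text could be read with `open(path).read()` and passed to the same function. Lines that begin with whitespace would be treated as continuation lines by the parser, so this assumes keys start in the first column.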

J.F. Sebastian 2010-05-16 23:28:52
