Basically, I have a file like this:

Url/Host:   www.example.com
Login:     user
Password:   password
Data_I_Dont_Need:    something_else

How can I use RegEx to separate the details to place them into variables?

Sorry if this is a terrible question; I can just never grasp RegEx. So, as a follow-up: could you provide the RegEx and also explain what each part of it is for?

A: 

Well, if you don't know about regex, simply change your file like this:

Host = www.example.com
Login = user
Password = password

And use the ConfigParser Python module: http://docs.python.org/library/configparser.html

mkotechno
Modifying the file isn't really an option, but thank you
Rob
ConfigParser supports `:` delimiter http://stackoverflow.com/questions/2845018/extracting-data-from-a-text-file-to-use-in-a-python-script/2845923#2845923
J.F. Sebastian
A: 

EDIT: Better Solution

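for line in infile:  # infile: an open file or any iterable of lines
    m = re.search(r'(.*?):\s*(.*)', line)
    if m:  # lines without a colon are skipped instead of raising AttributeError
        key, val = m.groups()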
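
Since the question asked what each part of the RegEx is for, here is a rough breakdown of that pattern as a small runnable sketch. The sample line is taken from the question; the variable names `pattern` and `sample` are only for illustration.

```python
import re

# (.*?)  group 1: as few characters as possible, so it stops at the first colon (the key)
# :      the literal colon separating key and value
# \s*    any run of whitespace after the colon (not captured)
# (.*)   group 2: everything left on the line (the value)
pattern = re.compile(r'(.*?):\s*(.*)')

sample = "Url/Host:   www.example.com"  # line copied from the question
key, val = pattern.search(sample).groups()
print(key)
print(val)
```
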
mikerobi
+1  A: 

You should put the entries in a dictionary, not in so many separate variables -- clearly, the keys you're using need NOT be acceptable as variable names (that slash in 'Url/Host' would be a killer!-), but they'll be just fine as string keys into a dictionary.

import re

there = re.compile(r'''(?x)      # verbose flag: allows comments & whitespace
                       ^         # anchor to the start
                       ([^:]+)   # group with 1+ non-colons, the key
                       :\s*      # colon, then arbitrary whitespace
                       (.*)      # group everything that follows
                       $         # anchor to the end
                    ''')

and then

configdict = {}
for aline in open('thefile.txt'):
    mo = there.match(aline)
    if not mo:
        print("Skipping invalid line %r" % aline)
        continue
    k, v = mo.groups()
    configdict[k] = v

The possibility of making RE patterns "verbose" (by starting them with (?x) or using re.VERBOSE as the second argument to re.compile) is very useful: it lets you clarify your REs with comments and nicely aligned whitespace. I think it's sadly underused;-).
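
For comparison, the same pattern can be written with the flag passed to re.compile rather than the inline (?x). This is only a sketch of that second spelling; the sample lines fed to it are made up for illustration.

```python
import re

# Same pattern as above, but verbose mode is requested via the flags argument
there = re.compile(r'''
    ^         # anchor to the start
    ([^:]+)   # group with 1+ non-colons, the key
    :\s*      # colon, then arbitrary whitespace
    (.*)      # group everything that follows
    $         # anchor to the end
''', re.VERBOSE)

mo = there.match("Login:     user\n")  # a line as read from a file, newline included
print(mo.groups())

no_match = there.match("a line without any separator\n")
print(no_match)
```
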

Alex Martelli
Nice answer and great explanation. I think I'd like potential whitespace on the value removed. I believe that could be done by adding \s* between the value group and the end-of-line anchor '$'?
extraneon
AttributeError: 'NoneType' object has no attribute 'group'
Rob
@Rob, you mean `groups`, not `group`. Yes, I forgot to add the `continue` obviously needed to **do** the skip, let me add it. BTW, your question doesn't mention that there can be lines that don't match this pattern, and what to do when such lines are found -- please edit your Q to add this crucial information!
Alex Martelli
@extraneon, if you want to remove trailing whitespace on the value, change the end of the RE's pattern to `(.*?)\s*$`. The `?` here is crucial as it tells the RE to do the star-match non-greedily: without it, it would still match the trailing whitespace as part of this group!
Alex Martelli
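
A quick way to see the difference that `?` makes, as a sketch. The trailing spaces in the sample line are added here for illustration, and the names `greedy` and `lazy` are hypothetical.

```python
import re

greedy = re.compile(r'^([^:]+):\s*(.*)\s*$')   # value group grabs trailing spaces too
lazy = re.compile(r'^([^:]+):\s*(.*?)\s*$')    # value group stops before trailing whitespace

line = "Login:     user   \n"  # note the three spaces after "user"

greedy_value = greedy.match(line).group(2)
lazy_value = lazy.match(line).group(2)
print(repr(greedy_value))
print(repr(lazy_value))
```
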
Sorry, didn't realize it mattered. Edited it.
Rob
+1  A: 

For a file as simple as this you don't really need regular expressions. String functions are probably easier to understand. This code:

def parse(data):
    parsed = {}    
    for line in data.split('\n'):
        if not line: continue # Blank line
        pair = line.split(':')
        parsed[pair[0].strip()] = pair[1].strip()
    return parsed

if __name__ == '__main__':
    test = """Url/Host:   www.example.com
    Login:     user
    Password:   password
"""
    print(parse(test))

Will do the job, and results in:

{'Login': 'user', 'Password': 'password', 'Url/Host': 'www.example.com'}
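
One caveat with splitting on every colon: a value that itself contains a colon (a URL with a scheme or port, say) would lose everything after its first colon, and a non-blank line with no colon at all would raise an IndexError. A possible variant using str.partition is sketched below; the function name `parse_first_colon` and the URL in the sample are assumptions made for illustration, not part of the original file.

```python
def parse_first_colon(data):
    """Split each line at its first colon only; skip lines that have none."""
    parsed = {}
    for line in data.splitlines():
        key, sep, value = line.partition(':')
        if not sep:  # blank line or line without a colon
            continue
        parsed[key.strip()] = value.strip()
    return parsed

# Hypothetical sample: the URL with scheme and port is invented to show the edge case
sample = (
    "Url/Host:   http://www.example.com:8080\n"
    "Login:     user\n"
    "\n"
    "Password:   password\n"
)
result = parse_first_colon(sample)
print(result)
```
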
snim2
A: 

The ConfigParser module supports the ':' delimiter.

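# Python 3 spelling: the module is named configparser and StringIO lives in io
import configparser
from io import StringIO

class Parser(configparser.RawConfigParser):
    # _read is a private hook, so this override may break between Python versions
    def _read(self, fp, fpname):
        # Prepend a dummy section header, since the file itself has none
        data = StringIO("[data]\n" + fp.read())
        return configparser.RawConfigParser._read(self, data, fpname)

p = Parser()
p.read("file.txt")
print(dict(p.items("data")))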

Output:

{'login': 'user', 'password': 'password', 'url/host': 'www.example.com'}

Though a regex or manual parsing might be more appropriate in your case.
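
On Python 3, the same dummy-section trick can probably be done without overriding a private method, by reading the text yourself and handing it to read_string. A minimal sketch, assuming the file content is already in a string (`text` below is copied from the question; the section name "data" is arbitrary):

```python
import configparser

# File content from the question, inlined so the sketch is self-contained
text = """Url/Host:   www.example.com
Login:     user
Password:   password
Data_I_Dont_Need:    something_else
"""

parser = configparser.RawConfigParser()
parser.read_string("[data]\n" + text)  # prepend a section header the file lacks
settings = dict(parser.items("data"))

# Option names are lowercased by default, as in the output shown above
print(settings["url/host"])
print(settings["login"])
```
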

J.F. Sebastian