ansaurus

Question

Extracting tokens where some are optional

Answer 1

+4 A:

This program returns three integers (days, hours, minutes) for each input. match.groups(0) substitutes 0 for any unit that is absent:

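import re

samples = ['tt-5d10h', 'tt-5d10h30m', 'tt-5d30m', 'tt-10h30m', 'tt-5d', 'tt-10h', 'tt-30m']

def parse(text):
    # Each unit is an optional non-capturing group around a capturing group for its digits.
    match = re.match(r'tt-(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?', text)
    # groups(0) returns 0 (instead of None) for groups that did not participate.
    values = [int(x) for x in match.groups(0)]
    return values

for sample in samples:
    print(parse(sample))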

Output:

[5, 10, 0]
[5, 10, 30]
[5, 0, 30]
[0, 10, 30]
[5, 0, 0]
[0, 10, 0]
[0, 0, 30]
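One caveat: re.match only anchors at the start of the string, and every group is optional, so inputs such as 'tt-' or 'tt-5dXYZ' also match and quietly produce zeros. A minimal sketch of a stricter variant, assuming Python 3.4+ for re.fullmatch; the name parse_strict is hypothetical:

```python
import re

# Same pattern as above, compiled once.
PATTERN = re.compile(r'tt-(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?')

def parse_strict(text):
    """Return [days, hours, minutes]; raise ValueError on malformed input (hypothetical helper)."""
    # fullmatch requires the whole string to match, so trailing junk is rejected.
    match = PATTERN.fullmatch(text)
    # A bare 'tt-' still fully matches because every group is optional,
    # so also require at least one unit to be present.
    if match is None or not any(match.groups()):
        raise ValueError('not a duration token: %r' % text)
    # groups(0) substitutes 0 for the units that are absent.
    return [int(x) for x in match.groups(0)]

print(parse_strict('tt-5d30m'))  # [5, 0, 30]
```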

compie 2010-08-02 20:02:38

Faster :) ... +1

chryss 2010-08-02 20:04:46

Answer 2

+2 A:

>>> import re
>>> pattern = re.compile(r"tt-(\d+d)?(\d+h)?(\d+m)?")
>>> results = pattern.match("tt-5d10h")
>>> days, hours, minutes = results.groups()
>>> days, hours, minutes
('5d', '10h', None)
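Because these groups capture the unit letter along with the digits, the results still need post-processing to become numbers. A minimal sketch, assuming a missing unit should count as 0; the helper name to_int is hypothetical:

```python
import re

pattern = re.compile(r"tt-(\d+d)?(\d+h)?(\d+m)?")

def to_int(token):
    # token is e.g. '5d' or None; drop the one-character unit suffix.
    return int(token[:-1]) if token else 0

days, hours, minutes = (to_int(g) for g in pattern.match("tt-5d10h").groups())
print(days, hours, minutes)  # 5 10 0
```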

Dave Webb 2010-08-02 20:05:28

Answer 3

+1 A:

Similar to compie's answer, but making the end result nicer to deal with:

re.match(r'tt-(?:(?P<days>\d+)d)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?', text).groupdict()

Example:

>>> import re
>>> s = ['tt-5d10h', 'tt-5d10h30m', 'tt-5d30m', 'tt-10h30m', 'tt-5d', 'tt-10h', 'tt-30m']
>>> for text in s:
...     print(re.match(r'tt-(?:(?P<days>\d+)d)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?', text).groupdict())
...

{'days': '5', 'hours': '10', 'minutes': None}
{'days': '5', 'hours': '10', 'minutes': '30'}
{'days': '5', 'hours': None, 'minutes': '30'}
{'days': None, 'hours': '10', 'minutes': '30'}
{'days': '5', 'hours': None, 'minutes': None}
{'days': None, 'hours': '10', 'minutes': None}
{'days': None, 'hours': None, 'minutes': '30'}

If you want 0 instead of None for the missing tokens, use groupdict(0) instead of groupdict().
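groupdict(0) only fills in the missing entries; the captured values are still strings. A minimal sketch that also converts them to integers (the function name parse_dict is hypothetical; the printed key order assumes Python 3.7+, where dicts keep insertion order):

```python
import re

PATTERN = re.compile(r'tt-(?:(?P<days>\d+)d)?(?:(?P<hours>\d+)h)?(?:(?P<minutes>\d+)m)?')

def parse_dict(text):
    # groupdict(0) maps absent groups to 0; int() converts the captured
    # digit strings and leaves the 0 defaults unchanged.
    return {name: int(value) for name, value in PATTERN.match(text).groupdict(0).items()}

print(parse_dict('tt-5d30m'))  # {'days': 5, 'hours': 0, 'minutes': 30}
```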

JAB 2010-08-02 20:21:59

Answer 4

+1 A:

Using str.partition instead of a regular expression:

inputstring="""tt-5d10h
tt-5d10h30m
tt-5d30m
tt-10h30m
tt-5d
tt-10h
tt-30m
"""
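separators = ('d', 'h', 'm')
result = []
# lstrip('t-') removes the leading 't' and '-' characters, leaving e.g. '5d10h'
for text in (item.lstrip('t-') for item in inputstring.splitlines()):
    data = []
    for sep in separators:
        # partition splits on the first occurrence of sep, e.g. ('5', 'd', '10h')
        d, found, rest = text.partition(sep)
        if found:
            data.append(int(d))
            text = rest
        else:
            # separator absent: this unit is 0 and the text is left unchanged
            data.append(0)
    result.append(data)
# show input and result
for respairs in zip(inputstring.splitlines(), result):
    print(respairs)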
""" Output:
('tt-5d10h', [5, 10, 0])
('tt-5d10h30m', [5, 10, 30])
('tt-5d30m', [5, 0, 30])
('tt-10h30m', [0, 10, 30])
('tt-5d', [5, 0, 0])
('tt-10h', [0, 10, 0])
('tt-30m', [0, 0, 30])
"""
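The same partition idea can be wrapped in a function that handles one token at a time. This is a sketch; parse_partition is a hypothetical name. It checks the literal 'tt-' prefix instead of using lstrip('t-'), because lstrip removes any run of leading 't' and '-' characters rather than a specific prefix:

```python
def parse_partition(token, separators=('d', 'h', 'm')):
    """Split e.g. 'tt-5d30m' into [5, 0, 30] using str.partition (hypothetical helper)."""
    prefix = 'tt-'
    if not token.startswith(prefix):
        raise ValueError('expected a %r prefix: %r' % (prefix, token))
    rest = token[len(prefix):]
    values = []
    for sep in separators:
        number, found, remainder = rest.partition(sep)
        if found:
            values.append(int(number))
            rest = remainder
        else:
            # separator absent: this unit is 0 and the text is left unchanged
            values.append(0)
    return values

print(parse_partition('tt-10h30m'))  # [0, 10, 30]
```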

Tony Veijalainen 2010-08-02 21:09:28

Answer 5

+1 A:

Here's a pyparsing approach to your problem:

tests = """tt-5d10h 
tt-5d10h30m 
tt-5d30m 
tt-10h30m 
tt-5d 
tt-10h 
tt-30m""".splitlines()

from pyparsing import Word,nums,Optional

integer = Word(nums).setParseAction(lambda t:int(t[0]))

timeFormat = "tt-" + (
                Optional(integer("days") + "d") +
                Optional(integer("hrs")  + "h") +
                Optional(integer("mins") + "m")
                )

def normalizeTime(tokens):
    return tuple(tokens[field] if field in tokens else 0 
                for field in "days hrs mins".split())

timeFormat.setParseAction(normalizeTime)

for test in tests:
    print("%-12s ->" % test, "%d %02d:%02d" % timeFormat.parseString(test)[0])

Prints:

tt-5d10h     -> 5 10:00
tt-5d10h30m  -> 5 10:30
tt-5d30m     -> 5 00:30
tt-10h30m    -> 0 10:30
tt-5d        -> 5 00:00
tt-10h       -> 0 10:00
tt-30m       -> 0 00:30

Or, to preserve the named results, fill in the missing fields inside the parse action:

def normalizeTime(tokens):
    for field in "days hrs mins".split():
        if field not in tokens:
            tokens[field] = 0

timeFormat.setParseAction(normalizeTime)

for test in tests:
    print("%-12s ->" % test, "%(days)d %(hrs)02d:%(mins)02d" % timeFormat.parseString(test))

Paul McGuire 2010-08-03 01:23:36
