I'm parsing a string that doesn't have a delimiter but does have specific indexes where fields start and stop. Here's my list comprehension to generate a list from the string:

field_breaks = [(0,2), (2,10), (10,13), (13, 21), (21, 32), (32, 43), (43, 51), (51, 54), (54, 55), (55, 57), (57, 61), (61, 63), (63, 113), (113, 163), (163, 213), (213, 238), (238, 240), (240, 250), (250, 300)]
s = '4100100297LICACTIVE  09-JUN-198131-DEC-2010P0         Y12490227WYVERN RESTAURANTS INC                            1351 HEALDSBURG AVE                                                                                 HEALDSBURG               CA95448     ROUND TABLE PIZZA                                 575 W COLLEGE AVE                                 STE 201                                           SANTA ROSA               CA95401               '
data = [s[x[0]:x[1]].strip() for x in field_breaks]

Any recommendation on how to improve this?

+6  A: 

You can cut your field_breaks list in half by doing:

field_breaks = [0, 2, 10, 13, 21, 32, 43, ..., 250, 300]
s = ...
data = [s[x[0]:x[1]].strip() for x in zip(field_breaks[:-1], field_breaks[1:])]
dan04
+1: Great idea to cut down on redundancy and risk of clerical errors. Combine this with Tomasz Wysocki's solution and it's perfect. Easy to read, too.
Tim Pietzcker
Thanks! I'm going with this one because I like the idea of reducing the size of field_breaks.
Swingley
+7  A: 

You can use tuple unpacking for cleaner code:

data = [s[a:b].strip() for a,b in field_breaks]
Tomasz Wysocki
+1, and this could be combined with dan04's idea as well (possibly using `pairwise` from the [`itertools` documentation](http://docs.python.org/library/itertools.html))
David Zaslavsky
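The combination suggested in these comments could look roughly like the sketch below: a flat list of boundaries, paired up with zip and unpacked into a and b. The shortened field_breaks and s are illustrative stand-ins for the full values in the question. On Python 3.10 and later, itertools.pairwise(field_breaks) should produce the same pairs as the zip call.

```python
# Boundaries only: each field runs from one boundary to the next.
# Shortened, illustrative sample; the real list continues up to 300.
field_breaks = [0, 2, 10, 13, 21, 32, 43]
s = '4100100297LICACTIVE  09-JUN-198131-DEC-2010'

# zip pairs every boundary with its successor (it stops at the shorter
# sequence), and "a, b" unpacks each (start, stop) pair.
data = [s[a:b].strip() for a, b in zip(field_breaks, field_breaks[1:])]
print(data)
```

Because each boundary is written only once, a typo can no longer make one field's stop disagree with the next field's start.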
A: 

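Here is a way using map. It works in Python 2 only, since __getslice__ was removed in Python 3, and it does not strip the fields:

data = map(s.__getslice__, *zip(*field_breaks))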
gnibbler
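For Python 3, a rough equivalent of the map idea might be the sketch below. It swaps the bound method for a lambda that slices and strips, which is my substitution rather than gnibbler's code, and it uses a shortened sample of the question's data.

```python
# Shortened, illustrative sample of the question's data.
field_breaks = [(0, 2), (2, 10), (10, 13), (13, 21)]
s = '4100100297LICACTIVE  '

# zip(*field_breaks) transposes the pairs into one tuple of starts
# and one tuple of stops.
starts, stops = zip(*field_breaks)

# map walks both tuples in parallel; list() is needed because map
# returns a lazy iterator in Python 3.
data = list(map(lambda a, b: s[a:b].strip(), starts, stops))
print(data)
```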
+3  A: 

To be honest, I don't find the parse-by-column-number approach very readable, and I question its maintainability (off-by-one errors and the like), though I'm sure the list comprehensions are very virtuous and efficient in this case, and the suggested zip-based solution has a nice functional touch to it.

Instead, I'm going to throw softballs from out here in left field, since list comprehensions are supposed to be in part about making your code more declarative. For something completely different, consider the following approach based on the pyparsing module:

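from pyparsing import Word, Combine, Literal, nums, alphas

def Fixed(chars, width):
    # Match exactly `width` characters drawn from `chars`.
    return Word(chars, exact=width)

myDate = Combine(Fixed(nums,2) + Literal('-') + Fixed(alphas,3) + Literal('-')
                 + Fixed(nums,4))

# Parentheses let the expression continue across lines. The status field is
# space-padded ('ACTIVE  '), so its character set has to include ' '.
fullRow = (Fixed(nums,2) + Fixed(nums,8) + Fixed(alphas,3) + Fixed(alphas + ' ',8)
           + myDate + myDate + ...)

data = fullRow.parseString(s)
# should be ['41', '00100297', 'LIC', 'ACTIVE  ',
#            '09-JUN-1981', '31-DEC-2010', ...]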

To make this even more declarative, you could name each of the fields as you come across them. I have no idea what the fields actually are, but something like:

someId = Fixed(nums,2)
someOtherId = Fixed(nums,8)
recordType = Fixed(alphas,3)
recordStatus = Fixed(alphas + ' ',8)  # space-padded field, so allow spaces
birthDate = myDate
issueDate = myDate
fullRow = (someId + someOtherId + recordType + recordStatus
           + birthDate + issueDate + ...)

Now an approach like this probably isn't going to break any land speed records. But, holy cow, wouldn't you find this easier to read and maintain?

Owen S.
Very nice - all I would add would be a parse action to convert myDate to a Python datetime during parsing, and some results names, so that the values would be easily accessible post-parsing and the dates would already be usable as datetimes. (Fixed is a nice little helper, too.)
Paul McGuire
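For readers who want named fields and ready-made dates without adding a dependency, here is a minimal standard-library sketch of the same idea. It is not pyparsing and not Owen S.'s code: the field names are hypothetical placeholders (as in the answer, the real meanings are unknown), only the first six fields are covered, and parsing 'JUN' with %b assumes an English or C locale.

```python
from datetime import datetime
from itertools import accumulate

# (name, width) pairs. The names are hypothetical placeholders; only
# the first six fields of the record are described here.
FIELDS = [
    ('some_id', 2), ('some_other_id', 8), ('record_type', 3),
    ('record_status', 8), ('birth_date', 11), ('issue_date', 11),
]
DATE_FIELDS = {'birth_date', 'issue_date'}

def parse_row(row):
    """Slice a fixed-width row into a dict keyed by field name."""
    ends = list(accumulate(width for _, width in FIELDS))
    starts = [0] + ends[:-1]
    record = {}
    for (name, _), a, b in zip(FIELDS, starts, ends):
        value = row[a:b].strip()
        if name in DATE_FIELDS:
            # '09-JUN-1981' -> date(1981, 6, 9); %b is locale-dependent.
            value = datetime.strptime(value, '%d-%b-%Y').date()
        record[name] = value
    return record

row = '4100100297LICACTIVE  09-JUN-198131-DEC-2010'
record = parse_row(row)
print(record['record_status'], record['birth_date'].isoformat())
```

Declaring widths instead of offsets means that inserting or resizing a field changes one entry, and the start and stop positions are recomputed from it.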