Here's a fragment of code I'm prototyping that should, by all accounts, never see the light of day. I'll refactor it and clean it up before I merge it into my project.

However, it seems to be working and I happened to be listening to Arlo Guthrie when I was working on it.

#!/usr/bin/env python
import re

expr = re.compile(r'\[[0-9][-0-9,[]*\]')
def range2list(s):
    '''Given [x-y,a,b-c] return: range(x,y) + [a] + range(b,c)
       Handle decrements and zero-filling if necessary.

    '''
    assert s.startswith('[') and s.endswith(']') and len(s) > 2
    results = []
    r = s[1:-1]  # extract from enclosing brackets
    for i in r.split(','):  # each comma-separated part
        if '-' not in i:
            results.append(i)
            continue
        # Else: it's a range
        t = i.split('-')
        if len(t) != 2:   # punt on degenerate expressions
            results.append(i)
            continue
        # Else:
        if len(t[0]) > 1 and t[0].startswith('0'):
            fmt = "%%0%sd" % len(t[0])  ## Handle zero fill
        else:
            fmt = "%s"
        try:
            l, u = int(t[0]), int(t[1])
        except ValueError:  # punt on stuff that can't be converted
            results.append(i) # remember i? There's a song about i.
            continue
        if l > u:
            step=-1
        else:
            step=1
        results.extend([fmt % x for x in range(l,u,step)])
    return results

... and a test suite for it:

if __name__ == '__main__':
    testcases = [ '[0-5]', '[1]', '[1,2,3]', '[1-3,01-3,9,9-7]',
                  '[01-20]', '[020-1]', '[a,b,c,9-]' ]
    for i in testcases:
        print
        print 'range2list(%s)' % i
        print "\t" + ':'.join(range2list(i))  # ':' matches the output shown below

... which produces:

range2list([0-5])
        0:1:2:3:4

range2list([1])
        1

range2list([1,2,3])
        1:2:3

range2list([1-3,01-3,9,9-7])
        1:2:01:02:9:9:8

range2list([01-20])
        01:02:03:04:05:06:07:08:09:10:11:12:13:14:15:16:17:18:19

range2list([020-1])
        020:019:018:017:016:015:014:013:012:011:010:009:008:007:006:005:004:003:002

range2list([a,b,c,9-])
        a:b:c:9-

I really don't like the convoluted mess in there (especially at the point where I'm writing the comment "remember i? There's a song about i.").

When I get this cleaned up I'll merge it into a function which expands hostname range patterns (ww[020-040,091,099].sfarm.mycorp.com ... and so on). (Actually, the compiled regexp shown here is part of that other function; it extracts the [...] expressions from a string for expansion.)
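For illustration only, here is a rough sketch of what such a hostname expander might look like. The names BRACKETS, expand_part and expand_hosts are hypothetical, the pattern is a slightly simplified variant of the expr regexp above, and the upper bound of each range is excluded, as in range2list. Anything that can't be parsed is passed through unchanged rather than raising.

```python
import re

# Simplified variant of the 'expr' pattern: one bracketed expression,
# e.g. [020-040,091,099]. (Assumption: no nested brackets.)
BRACKETS = re.compile(r'\[[0-9][-0-9,]*\]')

def expand_part(part):
    """Expand 'x-y' into a list of strings; return [part] if unparseable."""
    bounds = part.split('-')
    if len(bounds) != 2:          # plain value or degenerate expression
        return [part]
    try:
        lo, hi = int(bounds[0]), int(bounds[1])
    except ValueError:            # e.g. '9-' : pass it through untouched
        return [part]
    if len(bounds[0]) > 1 and bounds[0].startswith('0'):
        fmt = '%%0%dd' % len(bounds[0])   # zero-fill to the width of x
    else:
        fmt = '%d'
    if lo > hi:
        step = -1
    else:
        step = 1
    # Upper bound is excluded, matching range2list above.
    return [fmt % n for n in range(lo, hi, step)]

def expand_hosts(pattern):
    """Expand every bracketed expression in pattern, left to right."""
    match = BRACKETS.search(pattern)
    if match is None:             # nothing left to expand
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    names = []
    for part in match.group()[1:-1].split(','):
        for value in expand_part(part):
            # Recurse so that later bracket groups in 'tail' get expanded too.
            names.extend(expand_hosts(head + value + tail))
    return names
```

With this sketch, expand_hosts('ww[020-023,091].sfarm.mycorp.com') yields ww020, ww021, ww022 and ww091 under that domain; results are neither sorted nor de-duplicated.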

So, my questions:

  • How can I clean up this mess?
  • What's the most interesting, obscure, amusing, etc. musical reference you've seen in a source code comment?
  • Has anyone written a parser/expander out there that already does something like this? In Python? Would anyone else ever use such a thing? Is it worth making available separately?
  • What alternative syntaxes would make sense? '{0:9,12,23,090:099}'? .. instead of -?
+4  A: 

If you could switch your current a-b syntax (which seems likely to get hopelessly confused by negative numbers!) to a:b, then Python's slice syntax would do the parsing for you -- you'd end up (e.g. through a fake class with an indexing method) with a tuple including slices and scalars:

>>> class x(object):
...   def __getitem__(self, x): return x
... 
>>> x()[2, 3:6, 4]
(2, slice(3, 6, None), 4)

and you could just process that tuple sequentially to produce the results you want (by successively appending to, or appropriately extending, a list that starts as []).
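A minimal sketch of that sequential processing, assuming every slice carries an explicit stop and that descending ranges spell out a negative step (for example 9:7:-1). The names RangeSpec and expand are made up for illustration:

```python
class RangeSpec(object):
    """Hypothetical helper: indexing hands back the index expression as-is."""
    def __getitem__(self, key):
        return key

def expand(spec):
    """Flatten a tuple of scalars and slices into one list, in order."""
    if not isinstance(spec, tuple):   # a single item arrives un-tupled
        spec = (spec,)
    results = []
    for item in spec:
        if isinstance(item, slice):
            start = item.start or 0   # a missing start is treated as 0
            step = item.step or 1     # a missing step is treated as 1
            # Assumption: item.stop is given; range() excludes it.
            results.extend(range(start, item.stop, step))
        else:
            results.append(item)
    return results
```

Here expand(RangeSpec()[2, 3:6, 4]) gives [2, 3, 4, 5, 4]. Note that zero-filled values such as 020 cannot be written as plain integer literals in this scheme (Python 2 reads them as octal, Python 3 rejects them), so zero-filling would still need separate handling.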

Alex Martelli
I like : better. The users already have a similar tool which already uses - (and the square brackets, for that matter). I don't anticipate ever needing negative numbers. Never seen anything like that in a hostname.
Jim Dennis
+1  A: 

I posted a parser for this format here.

Paul McGuire
Using pyparsing is nice ... I'll have to learn it some time. However, I'd prefer to shy away from 3rd party module dependencies and keep this capable of running on Python 2.4.x (current defaults packaged with RHEL4 and RHEL5 and FreeBSD). Also, this one doesn't handle the optional zero-fill (which is likely to be a requirement for my needs). I'm not sure if I want to remove the dupes at this point and I KNOW I don't want to sort the results. The biggest problem though is that I must never raise an exception here. I should pass anything I can't parse back through unscathed.
Jim Dennis