How would you parse the ['i386', 'x86_64'] out of a string like '-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'?

>>> my_arch_parse_function('-foo 23 -bar -arch i386 -arch x86_64 -isysroot /  -fno-strict-aliasing -fPIC')
['i386', 'x86_64']

Can this be done using regex, or only using modules like PyParsing, or manually splitting and iterating over the splits?

Assumption: -arch VAL are grouped together.

+4  A: 

Why not use the argument parsing modules? optparse in Python 2.6 (and 3.1) and argparse in Python 2.7 (and 3.2).

EDIT: On second thought, that's not as simple as it sounds, because you may have to define all the arguments you are likely to see (I'm not sure whether these modules have a catchall mechanism). I'll leave the answer here because it might work, but take it with a grain of salt.
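On the catchall question: `argparse` does provide `parse_known_args()`, which hands back the arguments it did not recognise instead of rejecting them. A minimal sketch, assuming the flags are separated by plain whitespace; the function name is borrowed from the question, and `add_help=False` is only there so the parser does not claim `-h` for itself:

```python
import argparse

def my_arch_parse_function(arg_string):
    # Only -arch is declared; every other flag is left to parse_known_args().
    parser = argparse.ArgumentParser(add_help=False)
    # action='append' collects each occurrence of -arch into one list.
    parser.add_argument('-arch', action='append')
    known, _unknown = parser.parse_known_args(arg_string.split())
    # The attribute is None when -arch never appears.
    return known.arch or []
```

Note that `argparse` also accepts unambiguous abbreviations by default, so a stray `-ar ppc` would be treated as `-arch ppc`.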

Marcelo Cantos
Yea, the idea is smart (never thought of it before), but not simple. :-)
Sridhar Ratnakumar
You can't do a catchall in `optparse`, unfortunately: http://stackoverflow.com/questions/1885161/how-can-i-get-optparses-optionparser-to-ignore-invalid-arguments
katrielalex
+3  A: 

Regex: (?<=-arch )[^ ]+

>>> re.findall( r"(?<=-arch )([^ ]+)", r"'-foo 23 -bar -arch ppc -arch i386 -isysroot -fno-strict-aliasing -fPIC'" )
['ppc', 'i386']

Arbitrary whitespace

>>> foo = re.compile( r"(?<=-arch)\s+[^\s]+" )
>>> [ s.strip() for s in re.findall( foo, r"'-foo 23 -bar -arch ppc -arch i386 -isysroot -fno-strict-aliasing -fPIC'" ) ]
['ppc', 'i386']
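The same result can be had without the lookbehind or the `strip()` step: when a pattern has exactly one capture group, `re.findall` returns just that group. A sketch, in which the `(?<!\S)` guard and the names `ARCH_RE` and `find_archs` are illustrative additions; the guard keeps tokens such as `--arch` or `-no-arch` from matching:

```python
import re

# One capture group, so findall() returns the captured values directly.
# (?<!\S) requires -arch to start a token; \s+ allows spaces or tabs.
ARCH_RE = re.compile(r"(?<!\S)-arch\s+(\S+)")

def find_archs(arg_string):
    return ARCH_RE.findall(arg_string)
```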

P.S. There's no x86_64 in that string, and are you trying to differentiate between -arch ppc and -arch i386?

katrielalex
That can't get any simpler.
Sridhar Ratnakumar
I noticed that this regex does not handle extra whitespace after `-arch`. `re.findall( r"(-arch\s+)([^ ]+)", [...]` works, but it returns a list of tuples.
Sridhar Ratnakumar
If you don't know how much whitespace might be after `-arch`, you'll have to strip it from the string after matching -- lookbehinds must be fixed-width. See above.
katrielalex
Final verdict: `r"(?<=-arch)\s+([^\s]+)"` which will not even require manual stripping.
Sridhar Ratnakumar
That doesn't work. Specifically, try it on `"-arch  i386"` (two spaces) -- it'll capture the whitespace minus one space into the first group, but the regex still matches the whole expression. Also, it requires that the first character after `-arch` is a space; what if it's a tab?
katrielalex
@katrielalex: what doesn't work? the 'arbitrary whitespace' example in your answer?
Sridhar Ratnakumar
Hmm, I'm not sure what I was thinking. The point I was trying to make was that the regex you posted above doesn't work, but it does! I think you might have edited it before I posted the message =p?
katrielalex
Apologies, hehe.
katrielalex
A: 

Answering my own question, I found a regex via this tool:

>>> regex = re.compile("(?P<key>\-arch\s?)(?P<value>[^\s]+?)\s|$")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x8aa59232ae397b10>
>>> regex.match(string)
None

# List the groups found
>>> r.groups()
(u'-arch ', u'ppc')

# List the named dictionary objects found
>>> r.groupdict()
{u'key': u'-arch ', u'value': u'ppc'}

# Run findall
>>> regex.findall(string)
[(u'-arch ', u'ppc'), (u'-arch ', u'i386'), (u'', u'')]
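
The trailing `('', '')` in the `findall` output comes from the `|$` at the end of the pattern: alternation binds loosest, so `$` is a separate alternative that matches the empty string at the end of the input. A possible tightened version, assuming at least one whitespace character follows `-arch` (the names `ARCH_RE` and `arch_values` are illustrative):

```python
import re

# Same named groups as above, but without the stray "|$" alternative.
ARCH_RE = re.compile(r"(?P<key>-arch\s+)(?P<value>\S+)")

def arch_values(string):
    # finditer() yields match objects, so the named group can be read directly.
    return [m.group("value") for m in ARCH_RE.finditer(string)]
```

This also picks up a value that sits at the very end of the string, which the original pattern needed the `$` for.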
Sridhar Ratnakumar
A: 

Try this if you want regex:

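import re
# arch_list is assumed to hold the known architecture names, e.g. ['ppc', 'i386', 'x86_64']
arch_regex = re.compile(r'\s+(' + '|'.join(arch_list) + r')(?=\s|$)', re.I)
results = arch_regex.findall(arg_string)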

A little too much regex for my taste, but it works. For future reference, it is better to use optparse for command line option parsing.

krs1
Ugh. And if you know all the arguments that might arrive, you should use argparse in the first place!
katrielalex
A: 

Hand-made with Python 2.6. I am sure that you or a library can do a better job.

inp = '-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'.split()
dct = {}
noneSet = set([None])

flagName = None  # most recently seen flag
for param in inp:
    if param.startswith('-'):
        flagName = param
        if flagName not in dct:
            dct[flagName] = set()
        dct[flagName].add(None)
        continue
    # Else found a value
    dct[flagName].add(param)

print(dct)

result = sorted(dct['-arch'] - noneSet)
print(result)

>>> ================================ RESTART ================================
>>> 
{'-arch': set(['ppc', 'i386', None]), '-isysroot': set([None, '/']), '-fno-strict-aliasing': set([None]), '-fPIC': set([None]), '-foo': set([None, '23']), '-bar': set([None])}
['i386', 'ppc']
>>> 
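
Because sets are unordered, the result above comes back sorted rather than in command-line order. A variant that keeps the order by collecting values in lists, sketched here with an illustrative helper name `group_flag_values`:

```python
from collections import defaultdict

def group_flag_values(arg_string):
    """Map each '-flag' to the list of values that follow it, in order."""
    groups = defaultdict(list)
    current = None  # no flag seen yet
    for token in arg_string.split():
        if token.startswith('-'):
            current = token
            groups[current]  # register the flag even if it takes no value
        elif current is not None:
            groups[current].append(token)
    return dict(groups)
```

With the input string above, `group_flag_values(...)['-arch']` keeps `ppc` before `i386`, and value-less flags such as `-bar` map to an empty list instead of a set containing `None`.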
Hamish Grubijan
+2  A: 

Would you consider a non-regex solution? Simpler:

>>> def my_arch_parse_function(s):
...     args = s.split()
...     idxs = (i+1 for i,v in enumerate(args) if v == '-arch')
...     return [args[i] for i in idxs]
...     
... 
>>> s='-foo 23 -bar -arch ppc -arch i386 -isysroot / -fno-strict-aliasing -fPIC'
>>> my_arch_parse_function(s)
['ppc', 'i386']
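
One edge case: if `-arch` happens to be the last token, `args[i]` raises `IndexError`. A sketch that pairs each token with its successor avoids that; `shlex.split` is used on the assumption that the string follows shell quoting rules:

```python
import shlex

def my_arch_parse_function(s):
    # shlex.split keeps quoted values such as "/my sdk" together as one token.
    args = shlex.split(s)
    # zip stops at the shorter sequence, so a trailing -arch is simply skipped.
    return [value for flag, value in zip(args, args[1:]) if flag == '-arch']
```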
Muhammad Alkarouri