ansaurus

Question

Grep multi-layered iterable for strings that match (Python)

Answer 1

A:

To get the position, use enumerate():

>>> data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
>>>
>>> for l1, v1 in enumerate(data):
...     for l2, v2 in enumerate(v1):
...         if 'foo' in v2:  # plain substring test
...             print(l1, l2, v2)
...
0 0 foo
1 0 foo/bar
1 1 baz/foo

In this example I am using a simple substring match ('foo' in v2), but you will probably want a regex for the real job.

enumerate() can be nested the same way to handle more than two levels, as in your edited post.
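For instance, here is a rough sketch that nests enumerate() one level deeper and swaps the substring test for a regex. The three-level sample data and the ^foo pattern are made up for illustration; substitute your own. Note that each extra level needs another hand-written loop, so this only works when the depth is fixed.

```python
import re

# Made-up sample with exactly three levels: list -> tuple -> tuple of strings
data = [(('foo', 'bar'), ('baz/foo',)), (('qux',), ('food', 'xfoo'))]

# Example pattern (an assumption): strings that *start* with 'foo'
pattern = re.compile(r'^foo')

matches = []
for l1, v1 in enumerate(data):
    for l2, v2 in enumerate(v1):
        for l3, v3 in enumerate(v2):  # one more enumerate() per extra level
            if pattern.search(v3):
                matches.append((l1, l2, l3, v3))

for m in matches:
    print(*m)
```

This prints 0 0 0 foo and 1 1 0 food; baz/foo and xfoo are skipped because the pattern is anchored at the start of the string.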

Tzury Bar Yochay 2009-10-18 13:26:50

Answer 2

+1 A:

Here is a grep that uses recursion to search the data structure.

Note that good data structures lead the way to elegant solutions. Bad data structures make you bend over backwards to accommodate them. This feels to me like one of those cases where a bad data structure is obstructing rather than helping you.

Switching to a simpler, more uniform data structure (instead of using this grep) might be worth investigating.

#!/usr/bin/env python

try:
    basestring
except NameError:
    basestring = str  # Python 3 has no basestring; str covers all text

data = ['something',
        ('Diff',
         ('diff', 'udiff'),
         ('*.diff', '*.patch'),
         ('text/x-diff', 'text/x-patch', ['find', 'java deep', 'down'])),

        ('Delphi',
         ('delphi', 'pas', 'pascal', 'objectpascal'),
         ('*.pas',),
         ('text/x-pascal', ['lets', 'put one here'])),

        ('JavaScript+Mako',
         ('js+mako', 'javascript+mako'),
         ('application/x-javascript+mako',
          'text/x-javascript+mako',
          'text/javascript+mako')),
        ]

def grep(astr, data, prefix=()):
    """Return the index path of every string in data that contains astr."""
    result = []
    for idx, elt in enumerate(data):
        if isinstance(elt, basestring):
            if astr in elt:
                result.append(prefix + (idx,))
        else:
            # Not a string: assume it is a nested sequence and recurse into it
            result.extend(grep(astr, elt, prefix + (idx,)))
    return result

def pick(data, idx):
    """Follow an index path such as (1, 3, 2, 1) down into data."""
    if idx:
        return pick(data[idx[0]], idx[1:])
    else:
        return data

idxs = grep('java', data)
print(idxs)
for idx in idxs:
    print('data[%s] = %s' % (idx, pick(data, idx)))
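To make the "more uniform data structure" suggestion concrete, here is one possible layout, sketched in Python 3. It assumes each entry is a (name, aliases, filenames, mimetypes) record, that the JavaScript+Mako entry simply has no filenames, and it drops the ad-hoc nested lists such as ['find', 'java deep', 'down']. The Lexer record type, its field names and grep_records are hypothetical, not taken from the question.

```python
from collections import namedtuple

# Hypothetical uniform record: every field after name is a tuple of strings
Lexer = namedtuple('Lexer', ['name', 'aliases', 'filenames', 'mimetypes'])

lexers = [
    Lexer('Diff', ('diff', 'udiff'), ('*.diff', '*.patch'),
          ('text/x-diff', 'text/x-patch')),
    Lexer('Delphi', ('delphi', 'pas', 'pascal', 'objectpascal'), ('*.pas',),
          ('text/x-pascal',)),
    Lexer('JavaScript+Mako', ('js+mako', 'javascript+mako'), (),
          ('application/x-javascript+mako', 'text/x-javascript+mako',
           'text/javascript+mako')),
]

def grep_records(astr, records):
    # The shape is fixed, so loops of known depth replace the recursion.
    # Yields (record index, field name, position in field) for each hit.
    for i, rec in enumerate(records):
        for field in rec._fields[1:]:  # skip 'name', search the tuple fields
            for j, value in enumerate(getattr(rec, field)):
                if astr in value:
                    yield i, field, j

hits = list(grep_records('java', lexers))
print(hits)
```

The hits now say which field matched (for example (2, 'aliases', 1)) instead of a bare index path, which is one of the benefits of a uniform layout.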

unutbu 2009-10-18 13:53:39

Why not isinstance(elt, basestring)?

liori 2009-10-18 14:13:31

Thanks, I did not know about basestring!

unutbu 2009-10-18 14:21:39

Answer 3

+3 A:

I'd split recursive enumeration from grepping:

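try:
    basestring
except NameError:
    basestring = str  # Python 3: str is the only text type

def enumerate_recursive(iterable, base=()):
    # Yield (index_path, text) for every string nested inside iterable
    for index, item in enumerate(iterable):
        if isinstance(item, basestring):
            yield base + (index,), item
        else:
            # Recurse into nested sequences, extending the index path
            for pair in enumerate_recursive(item, base + (index,)):
                yield pair

def grep_index(filt, pairs):
    # Keep the index of every (index, text) pair whose text contains filt
    return (index for index, text in pairs if filt in text)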

This way you can do both non-recursive and recursive grepping:

import sys

l = list(grep_index('opt1', enumerate(sys.argv)))   # non-recursive
r = list(grep_index('diff', enumerate_recursive(your_data)))  # recursive

Also note that we're using generators here, so no intermediate lists are built, which saves RAM on long sequences.
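To see the laziness in action, the sketch below (Python 3, so it uses str and yield from in place of basestring and the explicit loop) wraps the pair stream in a hypothetical tracing() helper and asks only for the first match with next(). Only the pairs up to that match are ever generated; the sample data is made up.

```python
def enumerate_recursive(iterable, base=()):
    # Yield (index_path, text) for every string nested in iterable
    for index, item in enumerate(iterable):
        if isinstance(item, str):
            yield base + (index,), item
        else:
            yield from enumerate_recursive(item, base + (index,))

def grep_index(filt, pairs):
    return (index for index, text in pairs if filt in text)

visited = []

def tracing(pairs):
    # Hypothetical helper: record every pair actually produced
    for pair in pairs:
        visited.append(pair)
        yield pair

data = [('a', 'diff'), ('b', ('c', 'udiff')), 'ddd']

# next() stops as soon as the first match is found
first = next(grep_index('diff', tracing(enumerate_recursive(data))), None)
print(first, len(visited))
```

This reports the first match at (0, 1) after producing only 2 of the 6 pairs in the structure.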

An even more generic solution would be to pass a callable instead of a string to grep_index, but that might not be necessary for you.
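If you do need it, a minimal sketch of the callable variant could look like this (Python 3). It assumes the callable takes the text and returns something truthy on a match; the parameter name pred, the sample data and both predicates are illustrative choices, not part of the answer above.

```python
import re

def enumerate_recursive(iterable, base=()):
    # Yield (index_path, text) for every string nested in iterable
    for index, item in enumerate(iterable):
        if isinstance(item, str):
            yield base + (index,), item
        else:
            yield from enumerate_recursive(item, base + (index,))

def grep_index(pred, pairs):
    # Like grep_index above, but pred is any callable applied to the text
    return (index for index, text in pairs if pred(text))

data = [('Diff', ('diff', 'udiff'), ('*.diff', '*.patch')),
        ('Delphi', ('delphi', 'pas'), ('*.pas',))]

# Regex predicate: a bound Pattern.search returns a match object or None
glob_like = re.compile(r'^\*\.').search
globs = list(grep_index(glob_like, enumerate_recursive(data)))

# Lambda predicate: case-insensitive substring test
ci = list(grep_index(lambda s: 'delphi' in s.lower(),
                     enumerate_recursive(data)))
print(globs, ci)
```

Here globs holds the paths of '*.diff', '*.patch' and '*.pas', and ci holds the paths of both 'Delphi' and 'delphi'.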

liori 2009-10-18 14:06:27

I like your solution better than mine :)

unutbu 2009-10-18 15:31:19

yes, it's good, thanks.

skyl 2009-10-18 16:11:59
