views:

199

answers:

3

Say that we have a multilayered iterable with some strings at the "final" level, yes strings are iterable, but I think that you get my meaning:

['something', 
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch')),

('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),

('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
...
]

Is there any convenient way that I could implement a search that would give me the indices of the matching strings? I would like something that would act something like this (where the above list is data):

>>> grep('javascript', data)

and it would return [ (2,1,1), (2,2,0), (2,2,1), (2,2,2) ] perhaps. Maybe I'm missing a comparable solution that returns nothing of the sort but can help me find some strings within a multi-layered list of iterables of iterables of .... strings.

I wrote a little bit but it was seeming juvenile and inelegant so I thought I would ask here. I guess that I could just keep nesting the exception the way I started here to the number of levels that the function would then support, but I was hoping to get something neat, abstract, pythonic.

import re

def rgrep(s, data):
    ''' given a iterable of strings or an iterable of iterables of strings,

    returns the index/indices of strings that contain the search string.

    Args::

        s - the string that you are searching for
        data - the iterable of strings or iterable of iterables of strings
    '''


    results = []
    expr = re.compile(s)
    for item in data:
        try:
            match = expr.search(item)
            if match != None:
                results.append( data.index(item) )

        except TypeError:
            for t in item:
                try:
                    m = expr.search(t)
                    if m != None:
                        results.append( (list.index(item), item.index(t)) )

                except TypeError:
                    ''' you can only go 2 deep! '''
                    pass

    return results
A: 

To get the position use enumerate()

>>> data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
>>> 
>>> for l1, v1 in enumerate(data):
...     for l2, v2 in enumerate(v1):
...             if 'f' in v2:
...                     print l1, l2, v2
... 
0 0 foo
1 0 foo/bar
1 1 baz/foo

In this example I am using a simple match 'foo' in bar yet you probably use regex for the job.

Obviously, enumerate() can provide support in more than 2 levels as in your edited post.

Tzury Bar Yochay
+1  A: 

Here is a grep that uses recursion to search the data structure.

Note that good data structures lead the way to elegant solutions. Bad data structures make you bend over backwards to accomodate. This feels to me like one of those cases where a bad data structure is obstructing rather than helping you.

Having a simple data structure with a more uniform structure (instead of using this grep) might be worth investigating.

#!/usr/bin/env python

data=['something', 
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch',['find','java deep','down'])),

('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),

('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
]

def grep(astr,data,prefix=[]):
    result=[]
    for idx,elt in enumerate(data):
        if isinstance(elt,basestring):
            if astr in elt:
                result.append(tuple(prefix+[idx]))
        else:
            result.extend(grep(astr,elt,prefix+[idx]))
    return result

def pick(data,idx):
    if idx:
        return pick(data[idx[0]],idx[1:])
    else:
        return data
idxs=grep('java',data)
print(idxs)
for idx in idxs:
    print('data[%s] = %s'%(idx,pick(data,idx)))
unutbu
Why not isinstance(elt, basestring)?
liori
Thanks, I did not know about basestring!
unutbu
+3  A: 

I'd split recursive enumeration from grepping:

def enumerate_recursive(iter, base=()):
    for index, item in enumerate(iter):
        if isinstance(item, basestring):
            yield (base + (index,)), item
        else:
            for pair in enumerate_recursive(item, (base + (index,))):
                yield pair

def grep_index(filt, iter):
    return (index for index, text in iter if filt in text)

This way you can do both non-recursive and recursive grepping:

l = list(grep_index('opt1', enumerate(sys.argv)))   # non-recursive
r = list(grep_index('diff', enumerate_recursive(your_data)))  # recursive

Also note that we're using iterators here, saving RAM for longer sequences if necessary.

Even more generic solution would be to give a callable instead of string to grep_index. But that might not be necessary for you.

liori
I like your solution better than mine :)
unutbu
yes, it's good, thanks.
skyl