Hi there.

How do I find multiple occurrences of a string within a string in Python? Consider this:

>>> text = "Allowed Hello Hollow"
>>> text.find("ll")
1
>>> 

So the first occurrence of ll is at 1 as expected. How do I find the next occurrence of it?

The same question applies to a list. Consider:

>>> x = ['ll', 'ok', 'll']

How do I find all the ll with their indexes?

+6  A: 
>>> import re
>>> text = "Allowed Hello Hollow"
>>> for m in re.finditer( 'll', text ):
...     print( 'll found', m.start(), m.end() )

ll found 1 3
ll found 10 12
ll found 16 18
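If the substring might contain characters that are special in a pattern (such as `.` or `*`), it is probably safer to escape it first. A minimal sketch for Python 3, where the helper name `find_all_regex` is made up for illustration:

```python
import re

def find_all_regex(sub, text):
    """Return start indexes of non-overlapping literal matches of sub in text."""
    # re.escape makes every character in sub match literally.
    pattern = re.escape(sub)
    return [m.start() for m in re.finditer(pattern, text)]

print(find_all_regex('ll', "Allowed Hello Hollow"))
```

Without `re.escape`, a substring like `'.'` would match every character instead of only the literal dots.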

Alternatively, if you don't want the overhead of regular expressions:

>>> text = "Allowed Hello Hollow"
>>> index = 0
>>> while index < len( text ):
...     index = text.find( 'll', index )
...     if index == -1:
...         break
...     print( 'll found at', index )
...     index += 2 # +2 because len('ll') == 2

ll found at 1
ll found at 10
ll found at 16

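The same idea also works for lists, with small changes: lists have no `find`, so use `index`, catch `ValueError` instead of testing for -1, and advance by 1.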
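For a list, the same loop could look roughly like this. It is only a sketch for Python 3, and the function name `list_indexes` is an assumption, not something from the standard library:

```python
def list_indexes(value, items):
    """Return every index at which value occurs in the list items."""
    found = []
    start = 0
    while True:
        try:
            # list.index raises ValueError when value is not found from start on.
            start = items.index(value, start)
        except ValueError:
            break
        found.append(start)
        start += 1  # continue the search just past the last hit
    return found

print(list_indexes('ll', ['ll', 'ok', 'll']))
```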

poke
Is there no way to do it without using regular expressions?
A A
Not that I have any problem, but just curious.
A A
@poke: This is what I was looking for (wrt edit)
A A
lists don't have `find`. But it works with `index`, you just need to `except ValueError` instead of testing for -1
aaronasterling
@Aaron: I was referring to the basic idea, of course you have to amend it a bit for lists (for example `index += 1` instead).
poke
now that you mention the whole `index += 2` thing, if you apply this to the string 'lllll', it will miss two out of four occurrences of 'll'. Best to stick with `index += 1` for strings too.
aaronasterling
Usually I wouldn't want to look for overlapping strings. That doesn't happen with regex either.
poke
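To make the choice discussed in these comments explicit, the step can be turned into a parameter. This is a sketch for Python 3, not the answer's original code; the name `find_all` and the `overlapping` flag are assumptions made for illustration:

```python
def find_all(sub, text, overlapping=False):
    """Yield start indexes of sub in text."""
    if not sub:
        raise ValueError("sub must be non-empty")
    # Step by 1 to allow overlapping matches, by len(sub) to skip past each match.
    step = 1 if overlapping else len(sub)
    index = text.find(sub)
    while index != -1:
        yield index
        index = text.find(sub, index + step)

print(list(find_all('ll', 'lllll')))
print(list(find_all('ll', 'lllll', overlapping=True)))
```

With the default, `'lllll'` gives the two non-overlapping matches; with `overlapping=True` it gives all four.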
A: 

I think what you are looking for is str.count:

>>> "Allowed Hello Hollow".count('ll')
3

Hope this helps

inspectorG4dget
I need the index.
A A
+1  A: 
>>> for n, c in enumerate(text):
...     try:
...         if c + text[n+1] == "ll":
...             print(n)
...     except IndexError:  # the last character has no successor
...         pass
...
1
10
16
ghostdog74
+2  A: 

For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print(idx, value)
0 ll
2 ll

If you wanted all the items in a list that contained 'll', you could also do that.

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print(idx, value)
   ...:
0 Allowed
1 Hello
3 Hollow
chauncey
Nice. Thank you!
A A
+1  A: 

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]

This also reports overlapping matches inside longer runs of 'l', which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]
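One caveat: `text.find('ll', n) == n` keeps scanning forward from `n` until it finds some match, so on long strings with few matches it does a lot of wasted work. A possible variant for Python 3 uses `str.startswith` with a start offset, which only looks at position `n`. The function name `find_overlapping` is made up for this sketch:

```python
def find_overlapping(sub, text):
    """Return every index at which sub starts in text, overlaps included."""
    # str.startswith(prefix, start) tests the substring at one position only.
    return [n for n in range(len(text)) if text.startswith(sub, n)]

print(find_overlapping('ll', 'Alllowed Hello Holllow'))
```

It should return the same indexes as the comprehension above, overlapping matches included.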
bstpierre
Wow I like this. Thank you. This is perfect.
A A
+1  A: 

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second uses str.find and checks for the sentinel of -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    # Python 3 spelling: generators expose __next__ (it was .next in Python 2).
    return iter(next_index(len(sub)).__next__, -1)

To apply either of these functions to a list, tuple or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments) like this one:

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)
intuited