ansaurus

Question

Finding multiple occurrences of a string within a string in Python

Answer 1

+6 A:

>>> import re
>>> text = "Allowed Hello Hollow"
>>> for m in re.finditer( 'll', text ):
...     print( 'll found', m.start(), m.end() )

ll found 1 3
ll found 10 12
ll found 16 18

Alternatively, if you don't want the overhead of regular expressions:

>>> text = "Allowed Hello Hollow"
>>> index = 0
>>> while index < len( text ):
...     index = text.find( 'll', index )
...     if index == -1:
...         break
...     print( 'll found at', index )
...     index += 2 # +2 because len('ll') == 2

ll found at 1
ll found at 10
ll found at 16

The same basic idea also works for lists, with minor changes.
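As a rough sketch of how the loop above might be adapted to lists: this assumes the intended change is to swap `str.find` for `list.index` and to catch `ValueError` instead of testing for -1. The name `find_all_in_list` is hypothetical, chosen only for illustration.

```python
def find_all_in_list(items, target):
    """Return the positions of every element equal to target."""
    positions = []
    index = 0
    while True:
        try:
            # list.index raises ValueError when nothing is found
            index = items.index(target, index)
        except ValueError:
            break
        positions.append(index)
        index += 1  # list elements cannot overlap, so step by one
    return positions

print(find_all_in_list(['ll', 'ok', 'll'], 'll'))  # [0, 2]
```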

poke 2010-10-06 14:15:46

Is there no way to do it without using regular expressions?

A A 2010-10-06 14:16:25

Not that I have any problem, but just curious.

A A 2010-10-06 14:18:29

@poke: This is what I was looking for (wrt edit)

A A 2010-10-06 14:23:44

lists don't have `find`. But it works with `index`, you just need to `except ValueError` instead of testing for -1

aaronasterling 2010-10-06 14:33:58

@Aaron: I was referring to the basic idea, of course you have to amend it a bit for lists (for example `index += 1` instead).

poke 2010-10-06 15:07:28

now that you mention the whole `index += 2` thing, if you apply this to the string 'lllll', it will miss two out of four occurrences of 'll'. Best to stick with `index += 1` for strings too.
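To make the difference concrete, here is a small sketch that runs the same `str.find` loop with both step sizes. The function name `find_all` and its `overlapping` flag are hypothetical, and the sketch assumes `sub` is non-empty.

```python
def find_all(text, sub, overlapping=False):
    """Collect start positions of sub in text (sub assumed non-empty)."""
    # Step past the whole match, or by one character to allow overlaps.
    step = 1 if overlapping else len(sub)
    found = []
    index = text.find(sub)
    while index != -1:
        found.append(index)
        index = text.find(sub, index + step)
    return found

print(find_all('lllll', 'll'))                    # [0, 2]
print(find_all('lllll', 'll', overlapping=True))  # [0, 1, 2, 3]
```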

aaronasterling 2010-10-06 15:11:09

Usually I wouldn't want to look for overlapping strings. That doesn't happen with regex either.
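For completeness, if overlapping matches are wanted from the regex version, one common approach (not part of the answer above) is to wrap the pattern in a zero-width lookahead so that a match consumes no characters. A minimal sketch:

```python
import re

text = 'lllll'

# A plain pattern consumes each match, so matches cannot overlap.
plain = [m.start() for m in re.finditer('ll', text)]

# A lookahead matches at a position without consuming any characters.
overlapping = [m.start() for m in re.finditer('(?=ll)', text)]

print(plain)        # [0, 2]
print(overlapping)  # [0, 1, 2, 3]
```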

poke 2010-10-06 15:28:56

Answer 2

A:

I think what you are looking for is str.count:

>>> "Allowed Hello Hollow".count('ll')
3

Hope this helps

inspectorG4dget 2010-10-06 14:20:32

I need the index.

A A 2010-10-06 14:22:18

Answer 3

+1 A:

>>> text = "Allowed Hello Hollow"
>>> for n, c in enumerate(text):
...     try:
...         if c + text[n+1] == "ll": print(n)
...     except IndexError: pass  # the last character has no successor
...
1
10
16

ghostdog74 2010-10-06 14:25:49

Answer 4

+2 A:

For your list example:

In [1]: x = ['ll','ok','ll']

In [2]: for idx, value in enumerate(x):
   ...:     if value == 'll':
   ...:         print(idx, value)
0 ll
2 ll

If you wanted all the items in a list that contained 'll', you could also do that.

In [3]: x = ['Allowed','Hello','World','Hollow']

In [4]: for idx, value in enumerate(x):
   ...:     if 'll' in value:
   ...:         print(idx, value)
   ...:
0 Allowed
1 Hello
3 Hollow

chauncey 2010-10-06 14:27:13

Nice. Thank you!

A A 2010-10-06 14:29:25

Answer 5

+1 A:

For the list example, use a comprehension:

>>> l = ['ll', 'xx', 'll']
>>> print([n for (n, e) in enumerate(l) if e == 'll'])
[0, 2]

Similarly for strings:

>>> text = "Allowed Hello Hollow"
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 10, 16]

Note that this also reports overlapping matches within runs of three or more 'l' characters, which may or may not be what you want:

>>> text = 'Alllowed Hello Holllow'
>>> print([n for n in range(len(text)) if text.find('ll', n) == n])
[1, 2, 11, 17, 18]
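One possible refinement, offered as a sketch rather than as the answer's own method: `text.find('ll', n) == n` may scan the rest of the string for every `n`, whereas `str.startswith` with a start offset only inspects position `n`. The helper name `starts_of` is hypothetical.

```python
def starts_of(text, sub):
    """List every index at which sub begins, overlapping matches included."""
    # startswith(sub, n) checks only the characters beginning at index n.
    return [n for n in range(len(text)) if text.startswith(sub, n)]

print(starts_of("Allowed Hello Hollow", 'll'))    # [1, 10, 16]
print(starts_of('Alllowed Hello Holllow', 'll'))  # [1, 2, 11, 17, 18]
```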

bstpierre 2010-10-06 15:27:19

Wow I like this. Thank you. This is perfect.

A A 2010-10-06 16:39:08

Answer 6

+1 A:

FWIW, here are a couple of non-RE alternatives that I think are neater than poke's solution.

The first uses str.index and checks for ValueError:

def findall(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall('ll', text))
    (1, 10, 16)
    """
    index = 0 - len(sub)
    try:
        while True:
            index = string.index(sub, index + len(sub))
            yield index
    except ValueError:
        pass

The second uses str.find and checks for the sentinel of -1 by using iter:

def findall_iter(sub, string):
    """
    >>> text = "Allowed Hello Hollow"
    >>> tuple(findall_iter('ll', text))
    (1, 10, 16)
    """
    def next_index(length):
        index = 0 - length
        while True:
            index = string.find(sub, index + length)
            yield index
    return iter(next_index(len(sub)).__next__, -1)  # `.next` on Python 2
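The two-argument form `iter(callable, sentinel)` calls the callable repeatedly and stops as soon as it returns the sentinel. As a sketch of the same idea without the inner generator (Python 3 only, since it relies on `nonlocal`; the name `findall_sentinel` is hypothetical and `sub` is assumed non-empty):

```python
def findall_sentinel(sub, string):
    """Yield non-overlapping start positions of sub (assumed non-empty)."""
    position = -len(sub)

    def next_position():
        nonlocal position
        # Resume the search just past the previous match.
        position = string.find(sub, position + len(sub))
        return position

    # iter() keeps calling next_position until it returns -1.
    return iter(next_position, -1)

print(list(findall_sentinel('ll', "Allowed Hello Hollow")))  # [1, 10, 16]
```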

To apply any of these functions to a list, tuple or other iterable of strings, you can use a higher-order function (one that takes a function as one of its arguments) like this one:

def findall_each(findall, sub, strings):
    """
    >>> texts = ("fail", "dolly the llama", "Hello", "Hollow", "not ok")
    >>> list(findall_each(findall, 'll', texts))
    [(), (2, 10), (2,), (2,), ()]
    >>> texts = ("parallellized", "illegally", "dillydallying", "hillbillies")
    >>> list(findall_each(findall_iter, 'll', texts))
    [(4, 7), (1, 6), (2, 7), (2, 6)]
    """
    return (tuple(findall(sub, string)) for string in strings)

intuited 2010-10-06 16:27:36
