ansaurus

Question

Find the nth occurrence of substring in a string

Answer 1

+1 A:

I'd probably do something like this, using the find function that takes an index parameter:

def find_nth(s, x, n):
    i = -1
    for _ in range(n):
        i = s.find(x, i + len(x))
        if i == -1:
            break
    return i

print find_nth('bananabanana', 'an', 3)

It's not particularly Pythonic I guess, but it's simple. You could do it using recursion instead:

def find_nth(s, x, n, i = 0):
    i = s.find(x, i)
    if n == 1 or i == -1:
        return i 
    else:
        return find_nth(s, x, n - 1, i + len(x))

print find_nth('bananabanana', 'an', 3)

It's a functional way to solve it, but I don't know if that makes it more Pythonic.

Mark Byers 2009-12-10 21:14:24

`for _ in xrange(n):` can be used instead of `while n: ... n-=1`

J.F. Sebastian 2009-12-10 21:18:39

@J.F. Sebastian: Yeah, I guess that's a little more Pythonic. I'll update.

Mark Byers 2009-12-10 21:22:30

BTW: xrange is no longer needed in Python 3: http://diveintopython3.org/porting-code-to-python-3-with-2to3.html#xrange

Mark Byers 2009-12-10 21:24:44

`return find_nth(s, x, n - 1, i + 1)` should be `return find_nth(s, x, n - 1, i + len(x))`. Not a big deal, but saves some computation time.

Dan Loewenherz 2009-12-10 21:30:03

@dlo: Actually that can give different results in some cases: find_nth('aaaa','aa',2). Mine gives 1, yours gives 2. I guess yours is actually what the poster wants. I'll update my code. Thanks for the comment.

Mark Byers 2009-12-10 21:40:43

Answer 2

+7 A:

Mark's iterative approach would be the usual way, I think.

Here's an alternative with string-splitting, which can often be useful for finding-related processes:

def findnth(haystack, needle, n):
    parts= haystack.split(needle, n+1)
    if len(parts)<=n+1:
        return -1
    return len(haystack)-len(parts[-1])-len(needle)

And here's a quick (and somewhat dirty, in that you have to choose some chaff that can't match the needle) one-liner:

'foo bar bar bar'.replace('bar', 'XXX', 1).find('bar')

bobince 2009-12-10 21:26:39

+1, excellent idea!

Alex Martelli 2009-12-10 21:43:02

The first suggestion is going to be very inefficient for large strings when the match you're interested is near the beginning. It always looks at the whole string. It's clever but I wouldn't recommend this to someone who is new to Python and just wants to learn a good way to do it.

Mark Byers 2009-12-10 22:04:06

Thanks, I like your one liner. I don't think it's the most instantly readable thing in the world, but it's not much worse then most others below

prestomation 2009-12-11 03:58:00

+1 for the one-liner, this should help me right now. I had been thinking of doing the equivalent of `.rfind('XXX')`, but that would fall apart if `'XXX'` appears later in the input anyway.

Nikhil Chelliah 2010-07-07 04:17:56

Answer 3

+5 A:

Understanding that regex is not always the best solution, I'd probably use one here:

>>> import re
>>> s = "ababdfegtduab"
>>> [m.start() for m in re.finditer(r"ab",s)]
[0, 2, 11]
>>> [m.start() for m in re.finditer(r"ab",s)][2] #index 2 is third occurrence 
11

Mark Peters 2009-12-10 21:36:42

The risk here of course is that the string to search for will contain special characters that will cause the regex to do something you didn't want. Using re.escape should solve this.

Mark Byers 2009-12-10 21:43:41

This is clever, but is it really Pythonic? Seems like overkill for just finding the nth occurrence of a substring, and it's not exactly easy to read. Also, like you say, you have to import all of re for this

tgamblin 2009-12-10 21:51:21

Answer 4

+1 A:

Here is another approach using re.finditer.
The difference is that this only looks into the haystack as far as necessary

from re import finditer
from itertools import dropwhile
needle='an'
haystack='bananabanana'
n=2
next(dropwhile(lambda x: x[0]<n, enumerate(re.finditer(needle,haystack))))[1].start()

gnibbler 2009-12-10 21:45:18

Answer 5

+5 A:

Here's a more Pythonic version of the straightforward iterative solution:

def find_nth(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+len(needle))
        n -= 1
    return start

Example:

>>> find_nth("foofoofoofoo", "foofoo", 2)
6

If you want to find the nth overlapping occurrence of needle, you can increment by 1 instead of len(needle), like this:

def find_nth_overlapping(haystack, needle, n):
    start = haystack.find(needle)
    while start >= 0 and n > 1:
        start = haystack.find(needle, start+1)
        n -= 1
    return start

Example:

>>> find_nth_overlapping("foofoofoofoo", "foofoo", 2)
3

This is easier to read than Mark's version, and it doesn't require the extra memory of the splitting version or importing regular expression module. It also adheres to a few of the rules in the Zen of python, unlike the various re approaches:

Simple is better than complex.
Flat is better than nested.
Readability counts.

tgamblin 2009-12-10 21:45:22

Answer 6

+1 A:

>>> s="abcdefabcdefababcdef"
>>> j=0
>>> for n,i in enumerate(s):
...   if s[n:n+2] =="ab":
...     print n,i
...     j=j+1
...     if j==2: print "2nd occurence at index position: ",n
...
0 a
6 a
2nd occurence at index position:  6
12 a
14 a

2009-12-11 00:22:29

Answer 7

A:

Here's another re + itertools version that should work when searching for either a str or a RegexpObject. I will freely admit that this is likely over-engineered, but for some reason it entertained me.

import itertools
import re

def find_nth(haystack, needle, n = 1):
    """
    Find the starting index of the nth occurrence of ``needle`` in \
    ``haystack``.

    If ``needle`` is a ``str``, this will perform an exact substring
    match; if it is a ``RegexpObject``, this will perform a regex
    search.

    If ``needle`` doesn't appear in ``haystack``, return ``-1``. If
    ``needle`` doesn't appear in ``haystack`` ``n`` times,
    return ``-1``.

    Arguments
    ---------
    * ``needle`` the substring (or a ``RegexpObject``) to find
    * ``haystack`` is a ``str``
    * an ``int`` indicating which occurrence to find; defaults to ``1``

    >>> find_nth("foo", "o", 1)
    1
    >>> find_nth("foo", "o", 2)
    2
    >>> find_nth("foo", "o", 3)
    -1
    >>> find_nth("foo", "b")
    -1
    >>> import re
    >>> either_o = re.compile("[oO]")
    >>> find_nth("foo", either_o, 1)
    1
    >>> find_nth("FOO", either_o, 1)
    1
    """
    if (hasattr(needle, 'finditer')):
        matches = needle.finditer(haystack)
    else:
        matches = re.finditer(re.escape(needle), haystack)
    start_here = itertools.dropwhile(lambda x: x[0] < n, enumerate(matches, 1))
    try:
        return next(start_here)[1].start()
    except StopIteration:
        return -1

Hank Gay 2009-12-11 15:06:23

ansaurus

tags:

views:

answers:

Find the nth occurrence of substring in a string

related questions