views:

112

answers:

7

How do I find a string between two substrings ('123STRINGabc' -> 'STRING')?

My current method is like this:

>>> start = 'asdf=5;'
>>> end = '123jasd'
>>> s = 'asdf=5;iwantthis123jasd'
>>> print((s.split(start))[1].split(end)[0])
iwantthis

However, this seems very inefficient and un-pythonic. What is a better way to do something like this?

Forgot to mention: The string might not start and end with start and end. They may have more characters before and after.

+2  A: 
s[len(start):-len(end)]
Tim McNamara
This is very nice, assuming start and end are always at the start and end of the string. Otherwise, I would probably use a regex.
jeremiahd
I went the most Pythonic answer to the original question I could think of. Testing using the `in` operator would probably be faster than regexp.
Tim McNamara
+1  A: 

My method will be to do something like,

find index of start string in s => i
find index of end string in s => j

substring = substring(i+len(start) to j-1)
Prabhu Jayaraman
A: 
s = "123123STRINGabcabc"

def find_between( s, first, last ):
    try:
        start = s.index( first ) + len( first )
        end = s.index( last, start )
        return s[start:end]
    except ValueError:
        return ""

def find_between_r( s, first, last ):
    try:
        start = s.rindex( first ) + len( first )
        end = s.rindex( last, start )
        return s[start:end]
    except ValueError:
        return ""


print find_between( s, "123", "abc" )
print find_between_r( s, "123", "abc" )

gives:

123STRING
STRINGabc

I thought it should be noted - depending on what behavior you need, you can mix index and rindex calls or go with one of the above versions (it's equivalent of regex (.*) and (.*?) groups).

cji
He said that he wanted a way that was more Pythonic, and this is decidedly less so. I'm not sure why this answer was picked, even OP's own solution is better.
Jesse Dhillon
Agreed. I'd use the solution by @Tim McNamara , or the suggestion by the same of something like `start+test+end in substring`
jeremiahd
Right, so it's less pythonic, ok. Is it less efficient than regexps too? And there's also @Prabhu answer you need to downvote, as it suggest the same solution.
cji
+1  A: 

Here is one way to do it

_,_,rest = s.partition(start)
result,_,_ = rest.partition(end)
print result

Another way using regexp

import re
print re.findall(re.escape(start)+"(.*)"+re.escape(end),s)[0]

or

print re.search(re.escape(start)+"(.*)"+re.escape(end),s).group(1)
gnibbler
+2  A: 
import re

s = 'asdf=5;iwantthis123jasd'
result = re.search('asdf=5;(.*)123jasd', s)
print result.group(1)
Nikolaus Gradwohl
OP added additional information that makes this one the best solution, IMO.
Jesse Dhillon
@Jesse Dhillon -- what about @Tim McNamara's suggestion of something like `''.join(start,test,end) in a_string`?
jeremiahd
A: 

This I posted before as code snippet in Daniweb:

# picking up piece of string between separators
# function using partition, like partition, but drops the separators
def between(left,right,s):
    before,_,a = s.partition(left)
    a,_,after = a.partition(right)
    return before,a,after

s = "bla bla blaa <a>data</a> lsdjfasdjöf (important notice) 'Daniweb forum' tcha tcha tchaa"
print between('<a>','</a>',s)
print between('(',')',s)
print between("'","'",s)

""" Output:
('bla bla blaa ', 'data', " lsdjfasdj\xc3\xb6f (important notice) 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f ', 'important notice', " 'Daniweb forum' tcha tcha tchaa")
('bla bla blaa <a>data</a> lsdjfasdj\xc3\xb6f (important notice) ', 'Daniweb forum', ' tcha tcha tchaa')
"""
Tony Veijalainen
A: 

String formatting adds some flexibility to what NG suggested. start and end can now be amended as desired.

import re

s = 'asdf=5;iwantthis123jasd'
start = 'asdf=5;'
end = '123jasd'

result = re.search('%s(.*)%s' % (start, end), s).group(1)
print(result)
Tim McNamara