I have written code to find substrings in a string. It prints every match, but I only want substrings whose length is between 2 and 6, and of those I want to print the shortest one. Please help.

Program:

import re
p = re.compile('S(.+?)N')
s = 'ASDFANSAAAAAFGNDASMPRKYN'
s1 = p.findall(s)
print(s1)

output:

['DFA', 'AAAAAFG', 'MPRKY']

Desired output:

'DFA'  length=3
+3  A: 

The regex 'S(.{2,6}?)N' matches only when there are 2 to 6 characters between the S and the N.

To return the shortest matching substring, use sorted(s1, key=len)[0].

Full example:

import re
p = re.compile('S(.{2,6}?)N')
s = 'ASDFANSAAAAAFGNDASMPRKYNSAAN'
s1 = p.findall(s)  # ['DFA', 'MPRKY', 'AA']
if s1:
    print(sorted(s1, key=len)[0])
    print(min(s1, key=len))  # as suggested by Nick Presta

This works by sorting the list returned by findall by length, then returning the first item in the sorted list.
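To get output closer to the "substring plus length" form the question asks for, the two steps can be wrapped in a small helper. This is only a sketch: the name shortest_match and its default pattern are my own choices for illustration, not part of the original code. It also guards against the no-match case, since min() raises ValueError on an empty list.

```python
import re

# Hypothetical helper for illustration; the name and defaults are assumptions.
def shortest_match(text, pattern=r'S(.{2,6}?)N'):
    """Return (substring, length) for the shortest captured group, or None."""
    matches = re.findall(pattern, text)
    if not matches:
        return None  # avoid calling min() on an empty list
    best = min(matches, key=len)
    return best, len(best)

result = shortest_match('ASDFANSAAAAAFGNDASMPRKYN')
if result:
    print("%r  length=%d" % result)
```

With the string from the question this should print 'DFA'  length=3.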

Edit: Nick Presta's answer is more elegant; I was not aware that min could also take a key argument.

codeape
+8  A: 

If you already have the list, you can use the min function and pass the len function as its key argument.

>>> s1 = ['DFA', 'AAAAAFG', 'MPRKY']
>>> min(s1, key=len)
'DFA'

EDIT:
If several elements tie for the shortest length, you can extend this to produce a list containing all of them:

>>> s2 = ['foo', 'bar', 'baz', 'spam', 'eggs', 'knight']
>>> s2_min_len = len(min(s2, key=len))
>>> [e for e in s2 if len(e) == s2_min_len]
['foo', 'bar', 'baz']

The above should work when there is only 1 'shortest' element too.

EDIT 2: For completeness: according to my simple tests, it is faster to compute the length of the shortest element once, up front, and use that value in the list comprehension. Updated above.
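The tie-handling version above could also be packaged as a function. Again just a sketch: all_shortest is a made-up name, and returning an empty list for empty input is my assumption about what would be convenient, not something stated in the thread.

```python
# Hypothetical helper for illustration; not from the original answer.
def all_shortest(items):
    """Return every element of items that has the minimum length."""
    if not items:
        return []  # assumption: empty input gives an empty result
    min_len = min(len(e) for e in items)  # computed once, outside the comprehension
    return [e for e in items if len(e) == min_len]

print(all_shortest(['foo', 'bar', 'baz', 'spam', 'eggs', 'knight']))
```

The comparison uses == rather than is, because is tests object identity and only happens to work for small integers in CPython.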

Nick Presta
+1 Far more elegant than my sorted()[0] solution...
codeape