ansaurus

Question

Sort a list of strings based on regular expression match or something similar

Answer 1

+3 A:

In [1]: import re
   ...: def grp(pat, txt):
   ...:     r = re.search(pat, txt)
   ...:     return r.group(0) if r else '&'

In [2]: y
Out[2]: 
['random text random text, can be anything blabla %A blabla',
 'random text random text, can be anything blabla %D blabla',
 'random text random text, can be anything blabla blabla %F',
 'random text random text, can be anything blabla blabla',
 'random text random text, %C can be anything blabla blabla']

In [3]: y.sort(key=lambda l: grp(r"%\w", l))

In [4]: y
Out[4]: 
['random text random text, can be anything blabla %A blabla',
 'random text random text, %C can be anything blabla blabla',
 'random text random text, can be anything blabla %D blabla',
 'random text random text, can be anything blabla blabla %F',
 'random text random text, can be anything blabla blabla']

llimllib 2009-07-04 15:38:59
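For readers not using IPython, here is a sketch of the same session as a plain script. It adds "import re" and writes the pattern as a raw string. The '&' fallback is what pushes the unmatched line to the end: in ASCII, '&' (0x26) comes right after '%' (0x25), so it compares greater than any '%X' key.

```python
import re

def grp(pat, txt):
    """Return the first match of pat in txt, or '&' when there is none."""
    r = re.search(pat, txt)
    # '&' sorts just after '%', so every '%X' key is smaller than the
    # fallback and non-matching strings end up last.
    return r.group(0) if r else '&'

y = [
    'random text random text, can be anything blabla %A blabla',
    'random text random text, can be anything blabla %D blabla',
    'random text random text, can be anything blabla blabla %F',
    'random text random text, can be anything blabla blabla',
    'random text random text, %C can be anything blabla blabla',
]

y.sort(key=lambda line: grp(r"%\w", line))
for line in y:
    print(line)
```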

That's the right strategy, but not quite the correct answer, as he needs nonmatched strings to go after matched ones. A quick fix is to return 'z' + txt for the nonmatch, and 'a' + r.groups()[0] for the match.

Sean Nyman 2009-07-04 15:45:20
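A minimal sketch of the quick fix suggested in the comment above. The comment uses r.groups()[0], which needs a capturing group, so this assumes the pattern r"%(\w)" and uses the equivalent r.group(1). The sample lines are made up for illustration.

```python
import re

def prefixed_key(pat, txt):
    # Matched lines get 'a' + the captured letter, unmatched lines get
    # 'z' + the whole line; the leading 'a'/'z' puts every match first.
    r = re.search(pat, txt)
    return 'a' + r.group(1) if r else 'z' + txt

lines = [
    'no marker here',
    'blabla %D blabla',
    'blabla %A blabla',
    'another line without a marker',
]
lines.sort(key=lambda s: prefixed_key(r"%(\w)", s))
print(lines)
```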

yup, I left the specific sort he wanted as an exercise for the reader - I figured the important bit was the sort(key=...) and grp functions.

llimllib 2009-07-04 15:49:10

OK, you intrigued me, so I fixed it. Basically, if re.UNICODE is set, the 'z' + s solution may not work; the way I've done it should, I think.

llimllib 2009-07-04 16:03:23

At the cost of a good deal of complexity, I should add. I think I preferred the previous solution; he can probably use a [null, notnull] set just as easily as a [notnull, null] set.

llimllib 2009-07-04 16:04:38
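A different technique from the prefix trick (and not llimllib's fixed version, which is not shown on this page): sort on a tuple whose first element is a "no match" flag. Since False < True, matched lines come first no matter which characters the match or the fallback contain, so the order does not depend on character ordering at all. Negating the flag gives the [null, notnull] order instead. The pattern and sample lines are assumptions for illustration.

```python
import re

MARKER = re.compile(r"%(\w)")  # assumed pattern: '%' followed by one word character

def tuple_key(txt):
    """Sort key (no_match, value): matched lines sort before unmatched ones."""
    m = MARKER.search(txt)
    if m:
        return (False, m.group(1))
    # Unmatched lines fall back to the whole string, keeping the order deterministic.
    return (True, txt)

lines = ['zzz no marker', 'blabla %D', 'aaa no marker', 'x %A y']
lines.sort(key=tuple_key)
print(lines)
```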

Am I the only one a bit confused by the In[] and Out[] notation? What is that?

Tom 2009-07-04 17:09:29

r.groups()[0] -- peculiar -- why not just r.group(0)?

Alex Martelli 2009-07-04 17:13:58

@Tom I just copied from an IPython session (http://ipython.scipy.org/moin/).
@Alex Because I've been using groups()[i] for years now and didn't realize there was a group(i) function. (Also, it would be group(1), not group(0), right?) Thanks for the note.

llimllib 2009-07-04 18:32:35

llimllib 2009-07-04 18:44:46
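The guess at the end of the thread is right: groups() leaves out group 0 (the whole match), so groups()[0] corresponds to group(1), not group(0). A short check, assuming a pattern with one capturing group around the letter:

```python
import re

m = re.search(r"%(\w)", "blabla %D blabla")

# group(0) is the whole match; numbered capturing groups start at 1.
print(m.group(0))   # %D
print(m.group(1))   # D
# groups() returns only the capturing groups, so groups()[0] is group(1).
print(m.groups())   # ('D',)
```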

Answer 2

+1 A:

You could use a custom key function to compare the strings. Using the lambda syntax you can write that inline, like so:

import re

strings.sort(key=lambda s: re.sub(r".*%", "", s))

The re.sub(r".*%", "", s) call removes everything up to and including the last percent sign (.* is greedy), so if the string has a percent sign, it compares what comes after it; otherwise, it compares the entire string. When a string has only one percent sign, first and last are the same.

Pedantically speaking, this doesn't just use the letter following the percent sign; it also uses everything after it. If you want to use the letter and only the letter, try this slightly longer line:

strings.sort(key=lambda s: re.sub(r".*%(.).*", r"\1", s))

John Kugelman 2009-07-04 15:40:10
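A quick check of which percent sign the greedy pattern stops at, using a made-up string with two markers. The anchored pattern ^[^%]* is an alternative of mine for stopping at the first '%', not part of the original answer.

```python
import re

s = "text %A more text %B end"

# Greedy '.*%' removes everything up to and including the LAST '%'.
greedy = re.sub(r".*%", "", s)
# '^[^%]*%' stops at the FIRST '%' instead.
first = re.sub(r"^[^%]*%", "", s)
# Letter-only key: keep just the character after the last '%'.
letter = re.sub(r".*%(.).*", r"\1", s)
# A string without '%' is left unchanged, so it is compared as a whole.
no_marker = re.sub(r".*%", "", "no marker at all")

print(greedy)     # B end
print(first)      # A more text %B end
print(letter)     # B
print(no_marker)  # no marker at all
```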

Answer 3

+2 A:

What about this? Hope this helps.

def k(line):
    v = line.partition("%")[2]  # text after the first '%', or '' if there is none
    v = v[0] if v else 'z'  # here 'z' stands for the max value (it sorts after uppercase letters)
    return v

with open('data.txt') as f:
    print(''.join(sorted(f, key=k)))

sunqiang 2009-07-04 15:40:29
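The same key function applied to an in-memory list instead of data.txt, so the ordering can be checked without a file. The sample lines are made up for illustration. One caveat: 'z' only acts as a maximum for keys below it, so a line with "%z" ties with the %-less lines.

```python
def k(line):
    v = line.partition("%")[2]  # text after the first '%', or '' if there is none
    return v[0] if v else 'z'   # 'z' sorts after uppercase letters, so %-less lines go last

lines = [
    "blabla %D blabla\n",
    "no marker at all\n",
    "%A at the start\n",
    "blabla blabla %F\n",
]
result = sorted(lines, key=k)
print(''.join(result), end='')
```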

Answer 4

+1 A:

Here is a quick-and-dirty approach. Without knowing more about the requirements of your sort, I can't know if this satisfies your need.

Assume that your list is held in 'listoflines':

listoflines.sort(key=lambda x: x[x.find('%'):])

Note that this will sort all lines without a '%' character by their final character.

Brandon E Taylor 2009-07-04 15:45:16
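Why %-less lines end up compared by their final character: str.find returns -1 when '%' is absent, and x[-1:] is the last character. A sketch with made-up lines shows that these lines land after the '%' lines only because their last characters happen to sort after '%'; a line ending in a character below '%', such as '!' or a space, would sort before them.

```python
lines = ['blabla %D blabla', 'no marker blabla', '%A first', 'no marker either x']

def key(x):
    # x.find('%') is -1 when there is no '%', so x[-1:] is just the last character.
    return x[x.find('%'):]

keys = [key(x) for x in lines]
result = sorted(lines, key=key)
print(keys)
print(result)
```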

It'll compare %-less strings by their final characters.

John Kugelman 2009-07-04 15:51:57

Quite right. Thank you for the clarification, John.

Brandon E Taylor 2009-07-04 16:08:01
