ansaurus

Question

Answer 1

+1 A:

This should do the trick if you only want the stuff after the numbers:

re.findall(r'\d+: (.+)(?:, .+)*\.', input, re.M | re.S)

And if you want everything:

re.findall(r'(\d+): (.+)(?:, .+)*\.', input, re.M | re.S)
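A minimal sketch of how the second pattern might be used end to end. It assumes the sample line from the question and splits the captured names on `", "` afterwards. The variable name `text` is used instead of `input` so the built-in is not shadowed.

```python
import re

# Sample line taken from the question's data (assumed layout: "NNN: Name, Name, ... Name.").
text = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'

# Same pattern as above: group 1 is the number, group 2 is everything up to the final period.
matches = re.findall(r'(\d+): (.+)(?:, .+)*\.', text, re.M | re.S)
number, names_blob = matches[0]

# The names still come back as one string, so split them on the ", " separator.
names = names_blob.split(', ')

print(number)  # 152
print(names)   # ['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD']
```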

And if you want them separated out into a list of matches, applying a second `re.findall` to the result of the first will do it:

re.findall(r'[^,]+,|[^,]+$', re.findall(r'\d+: (.+)(?:, .+)*\.', input, re.M | re.S)[0], re.M | re.S)
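The one-liner above may be easier to follow when it is split into its two steps. This is a sketch that assumes the same sample line. The final `strip(' ,')` clean-up is an addition and is not part of the answer.

```python
import re

text = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'

# Step 1: pull out everything between "NNN: " and the final period.
names_blob = re.findall(r'\d+: (.+)(?:, .+)*\.', text, re.M | re.S)[0]

# Step 2: match chunks that end in a comma, or the last chunk running to the end of the string.
chunks = re.findall(r'[^,]+,|[^,]+$', names_blob, re.M | re.S)
print(chunks)  # ['Ashkenazi A,', ' Benlifer A,', ' Korenblit J,', ' Silberstein SD']

# Optional clean-up: drop the trailing commas and leading spaces that step 2 leaves behind.
names = [chunk.strip(' ,') for chunk in chunks]
print(names)   # ['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD']
```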

JGB146 2010-07-23 22:08:43

You should test this: it doesn't work.

Ned Batchelder 2010-07-23 22:10:57

Odd. The same regex is working for me. That said, after looking back at his input the final `.` should probably be a literal `\.`

JGB146 2010-07-23 22:21:41

Ah, with another look I see what you mean (I think). I've edited so that the extraneous other junk isn't included (unless he wants it).

JGB146 2010-07-23 22:37:34

It works, but I still need to split the names with `.split(',')`.

Gianluca Bargelli 2010-07-23 22:43:27

Another option added: this one returns the individual matches.

JGB146 2010-07-23 23:07:45

My result is ['Ashkenazi A,', ' Benlifer A,', ' Korenblit J,', '']; one name is missing from the list.

Gianluca Bargelli 2010-07-23 23:15:07

OK, I'm off to dinner. I will perfect it to match exactly when I get home. I think I can get rid of the blank match too.

JGB146 2010-07-23 23:51:20

I'm 90% sure it will work if you change `$` to `[^,]+$`.

JGB146 2010-07-24 00:05:44

There we go. As I thought, the latest edit is returning everything exactly as desired.

JGB146 2010-07-24 01:15:55

It works correctly, thanks!

Gianluca Bargelli 2010-07-24 07:34:44

Answer 2

+6 A:

A regular expression probably isn't the best way to solve this. You could use split():

>>> s = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'
>>> s.split(": ")
['152', 'Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.']
>>> s.split(": ")[1].split(", ")
['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD.']
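The same idea can be wrapped in a small helper that also drops the trailing period. This is a sketch. The name `parse_authors` is hypothetical. The helper assumes each line has one `": "` after the number, `", "` between names, and a trailing period.

```python
def parse_authors(line):
    """Split a 'NNN: Name, Name, ... Name.' line into (number, [names]).

    Hypothetical helper; assumes exactly one ': ' after the number and
    ', ' between names, as in the sample string above.
    """
    # maxsplit=1 keeps any later ': ' inside the names part.
    number, rest = line.split(': ', 1)
    # Remove the trailing period before splitting, so the last name comes out clean.
    names = rest.rstrip('.').split(', ')
    return number, names

print(parse_authors('152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'))
# ('152', ['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD'])
```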

Greg Hewgill 2010-07-23 22:08:47

I am considering marking this one as the solution to my problem, but I'll wait to see if someone else can provide a pure regex solution to my question. Just curious :)

Gianluca Bargelli 2010-07-23 22:52:05

Answer 3

A:

If you mean that there may be more (or fewer) names, you could try something like this: `(\d+): (.+)*?`. The asterisk (`*`) means zero or more occurrences of `(.+)`.
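A quick check against the question's sample line shows what that pattern captures in practice. With the lazy `*?`, zero repetitions already satisfy the pattern, so the name group comes back empty. A greedy `*` instead captures all the names, trailing period included, as one string. This sketch only compares the two quantifiers on that one assumed input.

```python
import re

s = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'

# Lazy "*?": zero repetitions of (.+) already complete the match,
# so the second group never participates and findall reports ''.
lazy = re.findall(r'(\d+): (.+)*?', s)

# Greedy "*": the first repetition of (.+) swallows the rest of the line,
# so all the names come back as a single string.
greedy = re.findall(r'(\d+): (.+)*', s)

print(lazy)    # [('152', '')]
print(greedy)  # [('152', 'Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.')]
```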

Ventus 2010-07-23 22:09:25

Answer 4

A:

I can get close, but further processing may be necessary. It is probably better to do manual string splitting, especially if the data is reliably well-formatted.

Code

import re
string1 = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'
string2 = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD, Hattingh CJR.'
for s in [string1, string2]:
    # Each match is a (number, name) tuple; only one of the two slots is filled.
    print(re.findall(r'(\d+):|(?:[.,\s?])?(.*?)(?:[.,])', s))

Output

[('152', ''), ('', 'Ashkenazi A'), ('', 'Benlifer A'), ('', 'Korenblit J'), ('', 'Silberstein SD')]
[('152', ''), ('', 'Ashkenazi A'), ('', 'Benlifer A'), ('', 'Korenblit J'), ('', 'Silberstein SD'), ('', 'Hattingh CJR')]
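One way to do the further processing is sketched below using the tuples shown above. Each tuple's non-empty slot is kept. The variable names are illustrative only.

```python
import re

line = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD, Hattingh CJR.'
pairs = re.findall(r'(\d+):|(?:[.,\s?])?(.*?)(?:[.,])', line)

# For this input each tuple has one non-empty slot:
# the number in the first slot, or a name in the second.
number = next(num for num, _ in pairs if num)
names = [name for _, name in pairs if name]

print(number)  # 152
print(names)   # ['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD', 'Hattingh CJR']
```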

Edit: using 2 expressions

If you are willing to use two regular expressions, it can be done fairly painlessly:

import re
string1 = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD.'
string2 = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD, Hattingh CJR.'
for s in [string1, string2]:
    # The leading number, anchored at the start of the line.
    print(re.findall(r'^(\d+):', s))
    # "Surname INITIALS" chunks preceded by ": " or ", " and followed by "." or ",".
    print(re.findall(r'(?:[:,] )(\S+ [A-Z]+)(?=[\.,])', s))

produces

['152']
['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD']
['152']
['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD', 'Hattingh CJR']
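For comparison, a single `re.split` call can produce both parts at once. This is not part of the answer above. It is a sketch that assumes the same `NNN: Name, Name.` layout, with exactly `": "` and `", "` as separators.

```python
import re

line = '152: Ashkenazi A, Benlifer A, Korenblit J, Silberstein SD, Hattingh CJR.'

# Remove the final period, then split on either ": " or ", ".
# The first element is the number and the rest are the names.
number, *names = re.split(r': |, ', line.rstrip('.'))

print(number)  # 152
print(names)   # ['Ashkenazi A', 'Benlifer A', 'Korenblit J', 'Silberstein SD', 'Hattingh CJR']
```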

cjrh 2010-07-23 22:10:34

Well, you got close indeed :) I can barely read that regular expression!

Gianluca Bargelli 2010-07-23 22:45:37

Nice solution! :) It is similar to @JGB146's, as it requires more than one regex. Thanks!

Gianluca Bargelli 2010-07-24 14:30:22

Help on Regular Expression problem