Given that string:

\n
\n
text1\n
\ttext2\n
  Message: 1st message\n
some more text\n
\n
\n
  Message: 2dn message\n\n
\t\t
Message: 3rd message\n
text3\n

I want to extract the messages from a multiline string (the token is 'Message: '). What regular expression should I use to capture these 3 groups:

  • group 1 : '1st message'
  • group 2 : '2dn message'
  • group 3 : '3rd message'

I tried a lot of things but I can't get the expression to work because the string is multiline.

My program is in Python 2.6, but I suppose it does not make a big difference what language I use...

+9  A: 
>>> re.findall('Message: (.+?)$', s, re.M)
['1st message', '2dn message', '3rd message']

The re.M flag changes the meaning of ^ and $:

When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.

(.+?)$ matches at least one character, and as few as possible, up to the nearest end of a line.
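
To see what the flag changes, here is a minimal sketch run with and without re.M. The name `sample` and its contents are my own shortened stand-in for the string in the question, assumed to contain real newline characters. Without the flag, `$` can only match at the very end of the string (or just before a final newline), and since the last line here is `text3`, none of the messages are found.

```python
import re

# Hypothetical, shortened stand-in for the string in the question
# (assumed to contain real newline characters).
sample = (
    "text1\n"
    "  Message: 1st message\n"
    "some more text\n"
    "  Message: 2dn message\n\n"
    "Message: 3rd message\n"
    "text3\n"
)

# With re.M, '$' matches just before every newline.
with_flag = re.findall(r'Message: (.+?)$', sample, re.M)
# → ['1st message', '2dn message', '3rd message']

# Without re.M, '$' only matches at the end of the whole string,
# and '.' cannot cross a newline to get there.
without_flag = re.findall(r'Message: (.+?)$', sample)
# → []
```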

Edit: indeed, the simpler version works too, since '.' does not match a newline by default:

>>> re.findall('Message: (.+)', s)
['1st message', '2dn message', '3rd message']

I'm surprised it wasn't in the list of those numerous things you tried :)

SilentGhost
Great! can you please explain how it works?
Sly
If that's what he wants, why not just re.findall('Message: (.+)', s)?
Matthew Flaschen
I'm totally new to regex and I was on the wrong track. First, I was trying to use `match`, not `findall` (I'm not too sure what each one does, but I'll read up on that). And for some reason, I thought I had to use wildcards at the beginning of the expression.
Sly
@Sly: `findall` is similar to `search` rather than `match`, except that it doesn't stop after the first match: it keeps scanning for more matches and collects them all in a list. `match` only matches at the beginning of the string. And no, you don't have to use wildcards at the beginning of the expression.
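
A minimal sketch of that difference, using a made-up two-message string (the names `sample`, `pattern`, `first` and `all_found` are only for illustration, not from the question):

```python
import re

# Hypothetical sample; it deliberately does not start with 'Message: '.
sample = "text1\n  Message: 1st message\n  Message: 2dn message\n"
pattern = r'Message: (.+)'

# match only tries at position 0, so it fails here.
m = re.match(pattern, sample)          # → None

# search scans forward and stops at the first hit.
first = re.search(pattern, sample).group(1)   # → '1st message'

# findall scans the whole string and returns every captured group.
all_found = re.findall(pattern, sample)       # → ['1st message', '2dn message']
```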
SilentGhost
A: 

@OP, you don't need a regex. Assuming you don't care about the lines after "Message:":

# Print every line that contains the token.
for line in mystring.split("\n"):
    if "Message:" in line:
        print("found: " + line)
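
If only the text after the token is wanted, one possible variation (a sketch, not the only way) is `str.partition`, which splits a line around the first occurrence of the token. The contents of `mystring` and the `messages` list below are made up for illustration.

```python
# Hypothetical sample string, assumed to contain real newlines.
mystring = "text1\n  Message: 1st message\nsome more text\nMessage: 3rd message\n"

messages = []
for line in mystring.split("\n"):
    # partition returns (before, separator, after);
    # the separator is '' when the token is not in the line.
    before, token, after = line.partition("Message: ")
    if token:
        messages.append(after)

# messages → ['1st message', '3rd message']
```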
ghostdog74