Sample Text:

SUBJECT = 'NETHERLANDS MUSIC EPA'
CONTENT = 'Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK '

Expected result:

"
NETHERLANDS MUSIC EPA | 36 before
Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK
"

How can I accomplish this in Python?

+1  A: 

Looks like you want something like this:

import re

x = re.compile(r'^([^\|]*?)\s*\|[^\n]*\n\s*(.*?)\s*$')

s = """NETHERLANDS MUSIC EPA | 36 before
Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK"""

mo = x.match(s)

subject, content = mo.groups()

print('SUBJECT =', repr(subject))
print('CONTENT =', repr(content))

which does emit, as you require,

SUBJECT = 'NETHERLANDS MUSIC EPA'
CONTENT = "Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK"
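
Since the pattern was built from one example, input that lacks the | separator makes x.match return None, and calling mo.groups() would then raise an AttributeError. A minimal sketch of a guard follows; the helper name parse_record and the two short test strings are assumptions made for illustration, not part of the original answer:

```python
import re

# Same pattern as in the answer above.
x = re.compile(r'^([^\|]*?)\s*\|[^\n]*\n\s*(.*?)\s*$')

def parse_record(text):
    """Return (subject, content), or None if text does not fit the pattern.

    parse_record is a hypothetical helper name used only for this sketch.
    """
    mo = x.match(text)
    if mo is None:
        return None
    return mo.groups()

# Made-up inputs: one that fits the pattern and one without a '|' separator.
good = "NETHERLANDS MUSIC EPA | 36 before\nMichael Buble performs in Amsterdam"
bad = "no pipe separator here"

print(parse_record(good))
print(parse_record(bad))
```

Returning None rather than raising lets the caller decide whether a non-matching record should be skipped or reported.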

Or maybe you want to do the reverse (as a comment suggested)? Then the key RE could be

y = re.compile(r'^.*SUBJECT\s*=\s*\'([^\']*)\'.*CONTENT\s*=\s*"([^"]*)"',
               re.DOTALL)

and you can use this similarly to get a match-object, extract subject and content as its groups, and format them for display as you wish.
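
For the reverse direction, note that the sample text in the question wraps CONTENT in single quotes and also has single quotes inside it, so a pattern that expects double quotes would not match it as posted. The sketch below is one possible variant, not the pattern from the answer: it assumes the closing quote is the last matching quote character in the text and accepts either quote style. The " | 36 before" part of the expected result cannot be derived from the sample, so it is left out here.

```python
import re

# Hypothetical variant: CONTENT may be wrapped in single or double quotes;
# the greedy (.*) backtracks to the LAST occurrence of the opening quote.
pattern = re.compile(
    r"SUBJECT\s*=\s*'([^']*)'\s*CONTENT\s*=\s*(['\"])(.*)\2",
    re.DOTALL,
)

# The sample text from the question, split over several literals for width.
sample = (
    "SUBJECT = 'NETHERLANDS MUSIC EPA'\n"
    "CONTENT = 'Michael Buble performs in Amsterdam Canadian singer "
    "Michael Buble performs during a concert in Amsterdam, The Netherlands, "
    "30 October 2009. Buble released his new album entitled 'Crazy Love'. "
    "EPA/OLAF KRAAK '\n"
)

mo = pattern.search(sample)
subject, content = mo.group(1), mo.group(3)

# Strip the trailing space inside the quotes before formatting for display.
formatted = subject + "\n" + content.strip()
print(formatted)
```

Group 2 only records which quote character opened CONTENT, so the subject and content are groups 1 and 3.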

In either case you may need tweaks: since you haven't given precise specs, just a single example, it's hard to generalize reliably.

Alex Martelli
Haven't you reversed the problem? It seems the OP wants to parse the output of your script to generate the input of your script...
Adrien Plisson
@Adrien, maybe. Just in case, I showed a solution for the reverse case as well, so thanks for pointing it out.
Alex Martelli
thanks a lot! :)
paul
@paul, you're welcome!
Alex Martelli
A: 

Here's a simple solution. I am using Python 3 but I think this code would be identical in 2:

>>> import re
>>> pair = re.compile("SUBJECT = '([^\n]*)'\nCONTENT = '([^\n]*)'\n", re.MULTILINE)
>>> s = """SUBJECT = 'NETHERLANDS MUSIC EPA'
... CONTENT = 'Michael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK '
... """
>>> m = pair.match(s)
>>> m.group(1) + "\n" + m.group(2)
"NETHERLANDS MUSIC EPA\nMichael Buble performs in Amsterdam Canadian singer Michael Buble performs during a concert in Amsterdam, The Netherlands, 30 October 2009. Buble released his new album entitled 'Crazy Love'. EPA/OLAF KRAAK "
andrew cooke