The problem:
I need to extract the strings that sit between $ characters in a block of text, but I'm a total n00b when it comes to regular expressions.

For instance from this text:
Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth.

I would like to get an array consisting of:
{'es membres', 'separat existentie es un'}

A little snippet in Python would be great.

+1  A: 

The regex below captures everything between the $ characters non-greedily:

\$(.*?)\$
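
To see why the non-greedy form matters, here is a minimal sketch that runs the pattern above next to its greedy counterpart on the sample text from the question. The variable names (text, lazy, greedy) are only illustrative.

```python
import re

text = ("Li Europan lingues $es membres$ del sam familie. "
        "Lor $separat existentie es un$ myth.")

# Non-greedy: .*? stops at the first closing $ it can reach.
lazy = re.findall(r'\$(.*?)\$', text)

# Greedy: .* runs on to the last $ in the text and swallows
# everything in between, including the inner $ signs.
greedy = re.findall(r'\$(.*)\$', text)

print(lazy)    # ['es membres', 'separat existentie es un']
print(greedy)  # ['es membres$ del sam familie. Lor $separat existentie es un']
```

With the greedy version the two pairs collapse into a single match, which is not what the question asks for.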

ennuikiller
+3  A: 

You can use re.findall:

>>> re.findall(r'\$(.*?)\$', s)
['es membres', 'separat existentie es un']
Mark Byers
Why the downvote?
Michael Krelin - hacker
@Michael, some might think an answer like this deserves a link to the docs (I do), but it's succinct and correct so it certainly doesn't deserve a downvote for the lack. I'll counteract it with an upvote.
Peter Hansen
A: 
import re
m = re.findall(r'\$([^$]*)\$', 'Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth')
Michael Krelin - hacker
You don’t need to escape the `$` inside a character class.
Gumbo
Although the OP didn't say his input could include empty pairs of dollar signs (no characters between), the use of "+" instead of "*" means this would get out of sync if that did occur. More importantly, without a group (using parentheses), the output includes the dollar signs.
Peter Hansen
True. Both of you are right. edited.
Michael Krelin - hacker
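
The two points raised in the comments above can be checked with a short sketch. The input strings are made up for illustration and are not from the question.

```python
import re

# Made-up input containing an empty pair ($$) followed by a normal pair.
text = "a $$ b $c$ d"

# With "+", the empty pair cannot match, so the regex pairs the second $
# with the third one and the real item 'c' is lost.
with_plus = re.findall(r'\$([^$]+)\$', text)

# With "*", the empty pair matches as '' and the pairing stays in sync.
with_star = re.findall(r'\$([^$]*)\$', text)

# Without a capturing group, findall() returns the whole match,
# dollar signs included.
no_group = re.findall(r'\$[^$]*\$', "x $c$ y")

print(with_plus)  # [' b ']
print(with_star)  # ['', 'c']
print(no_group)   # ['$c$']
```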
A: 

Valid regex demo in Perl:

my $a = 'Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$ myth.';
my @res;
while ($a =~ /\$([^\$]+)\$/gos)
{
 push(@res, $1);
}

foreach my $item (@res)
{
 print "item: $item\n";
}

Flags: g - global (find every match), s - treat all input text as a single line (lets . match newlines), o - compile the pattern only once.
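
Since the question asked for Python, here is a rough equivalent of the Perl loop above, sketched with re.finditer. The names text and results are illustrative. No counterpart of the s flag is needed because a negated character class such as [^$] already matches newlines, and the re module caches compiled patterns on its own.

```python
import re

text = ('Li Europan lingues $es membres$ del sam familie. '
        'Lor $separat existentie es un$ myth.')

results = []
# finditer() yields one match object per non-overlapping match,
# much like the while loop with the g flag in Perl.
for match in re.finditer(r'\$([^$]+)\$', text):
    results.append(match.group(1))  # group(1) plays the role of $1

for item in results:
    print(f"item: {item}")
```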

UncleMiF
The question was tagged "Python" and included an explicit request for a Python snippet in the answer.
Peter Hansen
Well, it was a "would-be-great" type of request. I don't think the lack of a Python snippet justifies a downvote. Naturally, I wouldn't upvote it either.
Michael Krelin - hacker
A: 

Alternative without regexes which works for this simple case:

>>> s="Li Europan lingues $es membres$ del sam familie. Lor $separat existentie es un$"
>>> s.split("$")[1::2]
['es membres', 'separat existentie es un']

Just split the string on '$' (this gives you a Python list) and then take every second element of this list, starting at index 1.
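
One caveat worth checking is input with an unmatched $. The sketch below compares the split approach with the regex on such a string. split_extract is a hypothetical helper name and the unbalanced sample is made up for illustration.

```python
import re

def split_extract(text):
    # Hypothetical helper: after splitting on "$", the odd-indexed
    # pieces are the ones that followed an opening "$".
    return text.split("$")[1::2]

balanced = ("Li Europan lingues $es membres$ del sam familie. "
            "Lor $separat existentie es un$")
unbalanced = "a $b$ c $d"  # the last "$" is never closed

print(split_extract(balanced))    # ['es membres', 'separat existentie es un']
print(split_extract(unbalanced))  # ['b', 'd']
print(re.findall(r'\$(.*?)\$', unbalanced))  # ['b']
```

With balanced delimiters both approaches agree. With an unclosed $ the split version also returns the trailing piece, while the regex only returns text that has a closing $.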

ChristopheD
-1 It DOESN'T work. Did you compare your answer with what the OP expected? Hint: try it again with [1::2] instead of [::2]
John Machin
True (must have typed/answered too fast). Edited accordingly.
ChristopheD
+2  A: 

Import the re module, and use findall():

>>> import re
>>> p = re.compile(r'\$(.*?)\$')
>>> s = "apple $banana$ coconut $delicious ethereal$ funkytown"
>>> p.findall(s)
['banana', 'delicious ethereal']

The pattern p represents a dollar sign (\$), then a non-greedy capturing group ((.*?)) that matches any character except a newline (.), zero or more times (*), taking as few characters as possible (?), followed by another dollar sign (\$).
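
The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch. The multi-line sample and the use of re.DOTALL are additions for illustration, not part of the original answer.

```python
import re

pattern = re.compile(r"""
    \$      # an opening dollar sign (escaped, since a bare $ is an anchor)
    (       # start of the capturing group that findall() returns
      .*?   # any characters except newline, as few as possible
    )       # end of the capturing group
    \$      # the closing dollar sign
""", re.VERBOSE)

s = "apple $banana$ coconut $delicious ethereal$ funkytown"
found = pattern.findall(s)
print(found)  # ['banana', 'delicious ethereal']

# By default . does not match a newline, so a pair that spans two lines
# is missed unless re.DOTALL is passed.
multi = "a $b\nc$ d"
without_dotall = re.findall(r'\$(.*?)\$', multi)
with_dotall = re.findall(r'\$(.*?)\$', multi, re.DOTALL)
print(without_dotall)  # []
print(with_dotall)     # ['b\nc']
```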

John Feminella