I'm trying to extract an address (written in French) from a listing using a regex. Here is an example:

"Don't wait, this home won't be on the market for long! Pictures can be forwarded upon request.

123 de la street - city 345-555-1234 "

Imagine that whole thing is item.description. Here is what works so far:

In "item.description", replace "^\d{1,4} des|de la|du [^,\s]+$" with "whatever"

and the address (123 de la street) is correctly replaced with "whatever". But if I try to make the address the only thing kept from the description, with something like this (which doesn't work):

In "item.description", replace "(.)(^\d{1,4} des|de la|du [^,\s]+$)(.)" with "$2"

What would be the best way to replace the whole description with just the address?

Thanks!

A:

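Try adding * to the first and last groups, and watch out for the ^ and $ signs: they match the start and end of the whole text, so they belong at the ends of the pattern, not inside a group. Also wrap des|de la|du in its own group (| has the lowest precedence, so otherwise it splits the entire pattern into alternatives), and make the leading .* lazy so it doesn't swallow the first digits of the street number:

"^(.*?)(\d{1,4} (des|de la|du) [^,\s]+)(.*)$"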
Miroslav Bajtoš
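To see what the grouping and the lazy quantifier change, here is a small sketch using Python's re module as a stand-in for the Yahoo Pipes regex engine (assumed, not verified, to behave the same on these constructs). It joins the sample listing onto one line so newlines don't interfere, uses a non-capturing group (?:...) for the particles, and writes the backreference as \2, which is Python's spelling of $2.

```python
import re

# Sample listing from the question, joined onto one line for this sketch.
description = (
    "Don't wait, this home won't be on the market for long! "
    "Pictures can be forwarded upon request. "
    "123 de la street - city 345-555-1234 "
)

# The question's pattern: "|" binds loosest, so this is three alternatives:
# "^\d{1,4} des", "de la", and "du [^,\s]+$".
ungrouped = r"^\d{1,4} des|de la|du [^,\s]+$"
print(re.sub(ungrouped, "whatever", description))
# only "de la" is replaced: "... request. 123 whatever street - city ..."

# Grouping the particles keeps number, particle and street name together.
grouped = r"\d{1,4} (?:des|de la|du) [^,\s]+"
print(re.sub(grouped, "whatever", description))
# "... request. whatever - city 345-555-1234 "

# Matching the whole text and replacing it with group 2 keeps only the address.
# The lazy "(.*?)" stops at the first street number instead of eating "12".
whole = r"^(.*?)(\d{1,4} (?:des|de la|du) [^,\s]+)(.*)$"
address = re.sub(whole, r"\2", description)
print(address)  # 123 de la street
```

With a greedy "^(.*)" instead, the engine backtracks from the end and group 2 would start at the last possible digit, giving "3 de la street".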
Thanks Miroslav, I tried this as well with no luck, although I would have expected it to work. Have a look at the comment I left on David's answer to see if that changes anything.
JB Lesage
Since your text spans multiple lines, I would guess the problem is that "." doesn't match newline characters. I am not familiar with Yahoo Pipes, so I can't advise you on how to change this behaviour.
Miroslav Bajtoš
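The newline explanation can be checked with Python's re module, again only as a stand-in for the Pipes engine. In Python, re.DOTALL (or an inline "(?s)" at the start of the pattern) makes "." match newlines as well; whether the Yahoo Pipes regex module accepted such a flag is an untested assumption here.

```python
import re

# The listing as it arrives: the address sits on its own line.
description = (
    "Don't wait, this home won't be on the market for long! "
    "Pictures can be forwarded upon request.\n"
    "\n"
    "123 de la street - city 345-555-1234 "
)

pattern = r"^(.*?)(\d{1,4} (?:des|de la|du) [^,\s]+)(.*)$"

# Without DOTALL, "." stops at "\n", so "^(.*?)" cannot reach the address
# line and the pattern never matches; re.sub returns the text unchanged.
print(re.sub(pattern, r"\2", description) == description)  # True

# With DOTALL, "." also matches "\n" and the whole text collapses to the address.
print(re.sub(pattern, r"\2", description, flags=re.DOTALL))  # 123 de la street

# The same switch written inline at the start of the pattern.
print(re.sub("(?s)" + pattern, r"\2", description))  # 123 de la street
```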
The multiline text was the problem: I removed all <br> tags before running this regex and it worked. Thank you!
JB Lesage
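The fix that worked here was cleaning the line breaks out before matching. A minimal sketch of that cleanup step in Python, assuming the description marks line breaks with <br> tags that may also be followed by newline characters (the raw markup isn't shown in the thread); extract_address is a hypothetical helper name.

```python
import re


def extract_address(description: str) -> str:
    """Hypothetical helper: flatten line breaks, then keep only the address."""
    # Collapse any run of <br>, <br/>, <br /> tags and newlines into one space.
    flat = re.sub(r"(?:<br\s*/?>|\r?\n)+", " ", description, flags=re.IGNORECASE)
    # The text is now a single line, so the default "." is enough.
    pattern = r"^(.*?)(\d{1,4} (?:des|de la|du) [^,\s]+)(.*)$"
    # If no address is found, the flattened text is returned unchanged.
    return re.sub(pattern, r"\2", flat)


html = (
    "Don't wait, this home won't be on the market for long! "
    "Pictures can be forwarded upon request.<br>\n<br>\n"
    "123 de la street - city 345-555-1234 "
)
print(extract_address(html))  # 123 de la street
```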