I'm looking for a regex to extract two numbers from the same text. They can be extracted independently; there is no need to get both in one go.

I'm using Yahoo Pipes.

Source Text: S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)

I need to extract two values as numbers: 1,475 and 137. They can be extracted in separate instances.

I got the following pattern from someone quite helpful on a different forum:

\b(\d+(,\d+)*)\s+(sqft|sqm)

But when I use it with a replacement of $1, it returns the whole source text instead of just the number I want (i.e. 1,475 or 137, depending on whether I run \b(\d+(,\d+)*)\s+(sqft) or \b(\d+(,\d+)*)\s+(sqm)).

What am I doing wrong?

A: 

Since you didn't specify a language, here is some Python:

import re

s = "$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) - Apartment, 10 Anson Road (D02)"
print(re.search(r'\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm', s).groups())
# prints ('1,475', '137')

This searches for one or more digits, commas, or periods after a word boundary, followed by an optional space and the word 'sqft', then an optional space, a slash, and another optional space, followed by one or more digits, commas, or periods, an optional space, and the word 'sqm'.

This should allow your formatting to be pretty loose (optional spaces, thousands and decimal separators).
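As a rough sketch of how loose that matching is, the snippet below runs the same pattern over a few variations of the listing text. Only the first sample comes from the question; the other two strings and the helper name `extract_sizes` are made up for illustration.

```python
import re

# Same pattern as in the answer above.
PATTERN = re.compile(r'\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm')

samples = [
    "1,475 sqft / 137 sqm (built-in)",  # spacing as in the question
    "1475sqft/137sqm",                  # hypothetical: no spaces, no comma
    "2,100.5 sqft /195.1 sqm",          # hypothetical: decimal separators
]

def extract_sizes(text):
    """Return (sqft, sqm) as strings, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.groups() if match else None

results = [extract_sizes(sample) for sample in samples]
for sample, result in zip(samples, results):
    print(sample, "->", result)
```

Note that the character class accepts any mix of digits, commas, and periods, so it does not validate that the number is well formed.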

Nick Presta
Wow, that was very fast. I am using whatever language the regex module uses in Yahoo Pipes; how do I check that? Usually what I test on http://www.gskinner.com/RegExr/ works there.
macutan
Does Yahoo! Pipes allow usage in any actual programming language, or just in its GUI? I thought it was the latter, but it's been years since I've actually touched them.
Matchu
Pipes appears to support PCRE (http://rsscases.marketingstudies.net/content/yahoo_pipes_regex_module.php) so what I posted above should work: `\b([0-9.,]+) ?sqft ?/ ?([0-9.,]+) ?sqm`
Nick Presta
What I am trying to achieve is in the pipe linked below: http://pipes.yahoo.com/pipes/pipe.edit?_id=c6af42d4ebb8a2afc2f139338bf9f627 (middle column, in the regex module, item.size_sqft). I would like to put the pattern there and ideally get, in the debug panel below it, item.size_sqft: 1,475. I tried your patterns on http://www.gskinner.com/RegExr/ but when I go to the replace, it doesn't give me just the number; it gives me the whole line again with the number in it. Where am I going wrong?
macutan
A: 

Well you could do this by iterating through the matches and getting the results that way.
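A minimal sketch of that iteration approach, written in Python since the pipe's own regex module has no loop construct that I know of. It reuses the pattern from the question (with the inner group made non-capturing); the variable names are illustrative only.

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) "
        "- Apartment, 10 Anson Road (D02)")

# Pattern from the question; (?:...) keeps the comma group from capturing.
UNIT_PATTERN = re.compile(r'\b(\d+(?:,\d+)*)\s+(sqft|sqm)')

# Walk over every match and record the number found for each unit.
sizes = {}
for match in UNIT_PATTERN.finditer(text):
    number, unit = match.groups()
    sizes[unit] = number

print(sizes)
```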

But if you want to use the replace method then this could work:

^.*?(?<sqft>\d+(,\d+)*)\s?sqft.*?(?<sqm>\d+(,\d+)*)\s?sqm.*$

And then replace with:

${sqft}
${sqm}


This will work with or without a comma in the sqft or sqm numbers. And the .* at the beginning, middle, and end forces it to match the entire string so that the replacement text eliminates everything except for what you're after.
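To illustrate why anchoring matters, here is a small Python sketch comparing the two replacements. It assumes Python's `re` module, where named groups are spelled `(?P<name>...)` and referenced as `\g<name>` rather than `(?<name>...)` and `${name}`; Yahoo Pipes may use a different syntax. The final conversion to an integer is an extra step, not part of the pattern.

```python
import re

text = ("S$ 5,200 / month Negotiable, 1,475 sqft / 137 sqm (built-in) "
        "- Apartment, 10 Anson Road (D02)")

# Unanchored pattern from the question: only the matched spans
# ("1,475 sqft" and "137 sqm") are replaced, so the rest of the text survives.
partial = re.sub(r'\b(\d+(,\d+)*)\s+(sqft|sqm)', r'\1', text)

# Anchored pattern: the match covers the whole string, so the
# replacement leaves nothing but the chosen group.
FULL = re.compile(
    r'^.*?(?P<sqft>\d+(,\d+)*)\s?sqft.*?(?P<sqm>\d+(,\d+)*)\s?sqm.*$'
)
sqft_only = FULL.sub(r'\g<sqft>', text)
sqm_only = FULL.sub(r'\g<sqm>', text)

# Strip the thousands separator to get an actual number.
sqft_number = int(sqft_only.replace(',', ''))

print(partial)
print(sqft_only, sqm_only, sqft_number)
```

With the unanchored pattern, a replace only swaps out the part that matched, which is why the whole line came back in the question.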

Steve Wortham
This worked like a charm! How did you get it so fast? Thanks a lot!
macutan
@macutan: Be sure to click on the check mark next to an answer if it answered your question, so that the poster gets credit :)
Matchu
@macutan: Great. I actually just changed the number matching scheme a bit. The repeatable commas are a nice feature in your original post so it can match numbers like 1,234,567. So my revised regex above has incorporated that feature. And I've been practicing regular expressions a lot the past several months. I guess I've gotten faster. ;)
Steve Wortham
Thanks Steve. When I try your pattern within the regex module in my pipe, http://pipes.yahoo.com/pipes/pipe.edit?_id=c6af42d4ebb8a2afc2f139338bf9f627 (see middle column), I get a lot of "problem in matching expression" errors. But when I put it on the site I use for testing, http://www.gskinner.com/RegExr/, it seems to work. What should I put in my Yahoo Pipes replace box for it to work? Thanks to all once again for all of the answers.
macutan
Thanks Steve, this worked. I just tried it again, refreshed the pipe, and it worked! Thank you all for your help: Ahmad, Nick, Don and Steve!
macutan
A: 

In Perl, I would write something like:

# The binding operator is =~ (not ~=); $1 holds the first capture group.
if ($line =~ m/\b([0-9.,]+) sqft/)
{
  $sqft = $1;
}
else
{
  $sqft = undef;
}

if ($line =~ m/\b([0-9.,]+) sqm/)
{
  $sqm = $1;
}
else
{
  $sqm = undef;
}
Don