I'm trying to extract the postal codes from yell.com using PHP and preg_match_all. I can extract the postal code, but only together with the rest of the address. Here is an example:

$URL = "http://www.yell.com/ucs/UcsSearchAction.do?scrambleSeed=17824062&keywords=shop&layout=&companyName=&location=London&searchType=advance&broaderLocation=&clarifyIndex=0&clarifyOptions=CLOTHES+SHOPS|CLOTHES+SHOPS+-+LADIES|&ooa=&M=&ssm=1&lCOption32=RES|CLOTHES+SHOPS+-+LADIES&bandedclarifyResults=1";

// get the yell.com page as a string
$htmlContent = $baseClass->getContent($URL);
// get the postal code along with the address
$result2 = preg_match_all("/<span class=\"address\">(.*)<\/span>/", $htmlContent, $matches);

print_r($matches);

The above code outputs something like Array ( [0] => Array ( [0] => 7, Royal Parade, Chislehurst, Kent BR7 6NR [1] => 55, Monmouth St, London, WC2H 9DG .... The problem is that I don't know how to extract only the postal code without the address, because it doesn't have a fixed number of characters (sometimes it has 6 and sometimes only 5). Basically, I need to extract the last 2 words from each array element. Thank you in advance for any help!

A: 

quick & dirty:

# your array item
$string = "7, Royal Parade, Chislehurst, Kent BR7 6NR";

# split on spaces
$bits = preg_split('/\s/', $string);

# last two bits
end($bits);
$postcode = prev($bits) . " " . end($bits);

echo $postcode;

See it run at: code pad

Erik
I tried to use your code and ran into 2 issues: 1. Even in your example the postcode is not displayed properly. The output of your code is "6NR BR7" instead of the correct form, "BR7 6NR". 2. The code doesn't work at all if I use the output from my code as the $string variable. Basically I have:

$result2 = preg_match_all("/<span class=\"address\">(.*)<\/span>/", $htmlContent, $matches);
$address = $matches[0][0];
# split on spaces
$bits = preg_split('/\s/', $address);
# last two bits
$postcode = end($bits) . prev($bits);
echo $postcode;
Michael
Yes, I fixed it. Have another look; it now prints in the proper order.
Erik
I'm not sure I understand the second part of your problem. You said you're getting the string `7, Royal Parade, Chislehurst, Kent BR7 6NR` -- if you have that string and perform the above operations on it, it will work. If that's not your string, then I suggest you update your question.
Erik
Hi Erik, thank you very much for your support. The second part of the problem was that the address included some extra spaces and also some hidden HTML tags that I hadn't noticed, which is why I didn't get the correct result. After I applied strip_tags and a function to remove the extra whitespace, it worked like a charm!
Michael
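
For anyone hitting the same issue, the cleanup described in the last comment could look roughly like this. It is only a minimal sketch: the $raw string is a made-up stand-in for a matched address (the real markup on the page may differ), and the variable names are illustrative.

```php
<?php
// Hypothetical raw match: tags and irregular whitespace are invented for illustration.
$raw = "<span class=\"address\">7, Royal Parade,  <b>Chislehurst</b>,\n Kent   BR7 6NR </span>";

// Drop the HTML tags, then collapse every run of whitespace into a single space.
$address = strip_tags($raw);
$address = trim(preg_replace('/\s+/', ' ', $address));

// Take the last two space-separated pieces as the postcode.
$bits = explode(' ', $address);
$postcode = implode(' ', array_slice($bits, -2));

echo $postcode, "\n"; // prints "BR7 6NR"
```

Using array_slice with a negative offset avoids the internal array pointer entirely, so the order of the two pieces cannot get swapped.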
A: 

If you just need to match the last two words in a string, you can use this regex:

\b\w+\s+\w+$

This matches exactly what it says: a word boundary, a non-empty word, some whitespace, then another word, followed by the end-of-string anchor.

<?php

$text = "7, Royal Parade, Chislehurst, Kent BR7 6NR";
$result =   preg_match("/\\b\\w+\\s+\\w+$/", $text, $matches);
print_r($matches);

?>

This prints:

Array
(
    [0] => BR7 6NR
)

You can also make the regex more robust, for example by allowing optional trailing whitespace after the last word (\s* before the $), but anchoring with $ is the main idea.
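
A sketch of that more tolerant variant, applied to a whole list of addresses. The $addresses array is made-up sample data, and the capture group is an addition here so that the trailing whitespace stays out of the result:

```php
<?php
// Made-up sample data; the second entry has trailing whitespace.
$addresses = array(
    "7, Royal Parade, Chislehurst, Kent BR7 6NR",
    "55, Monmouth St, London, WC2H 9DG  \n",
);

$postcodes = array();
foreach ($addresses as $address) {
    // Capture the last two words; \s*$ tolerates trailing whitespace.
    if (preg_match('/\b(\w+\s+\w+)\s*$/', $address, $m)) {
        $postcodes[] = $m[1];
    }
}

print_r($postcodes);
```

With the sample data this should collect "BR7 6NR" and "WC2H 9DG". Note that it still only picks the last two words, so it assumes every address really ends with a two-part postcode.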

polygenelubricants