Hi, I am trying to refine a preg_match_all call so that it finds the second occurrence of a period followed by a space:

<?php

$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

preg_match_all ('/(^)((.|\n)+?)(\.\s{2})/',$str, $matches);

$dataarray=$matches[2];
foreach ($dataarray as $value)
{ echo $value; }
?>

But it does not work: the {2} quantifier does not give me the second occurrence.

I have to use preg_match_all because I am scraping dynamic HTML.

I want to capture this from the string:

East Winds 20 knots. Gusts to 25 knots.

Any ideas? Thx

A:

Why not just match every period followed by a space and use only the results you need?

preg_match_all('!\. !', $str, $matches);
echo $matches[0][1]; // second match

I'm not sure exactly what you want to capture from this, however; your question is a little vague.
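A minimal sketch of that idea, assuming you want the text up to and including the second period: passing the PREG_OFFSET_CAPTURE flag makes preg_match_all() also report where each ". " starts, so substr() can cut the string at the second one. The variable names are illustrative only.

```php
<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
preg_match_all('!\. !', $str, $matches, PREG_OFFSET_CAPTURE);

$result = $str; // fall back to the whole string if there are fewer than two matches
if (count($matches[0]) >= 2) {
    // Offset of the second ". "; add 1 so the period itself is kept.
    $secondOffset = $matches[0][1][1];
    $result = substr($str, 0, $secondOffset + 1);
}

echo $result; // East Winds 20 knots. Gusts to 25 knots.
```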

If you want to capture everything up to and including the second period that is followed by a space, try:

preg_match_all('!^((?:.*?\. ){2})!s', $str, $matches);

It uses a non-greedy wildcard match and the `s` (DOTALL) modifier, so `.` also matches newlines.
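Running that pattern against the sample string from the question shows where the result ends up: because the whole expression is wrapped in capture group 1, the text is in `$matches[1][0]`, and it keeps the trailing space. A small sketch:

```php
<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

// (?:.*?\. ) = as little text as possible, then ". "; {2} repeats that twice.
// The outer (...) is capture group 1; the s modifier lets . match newlines.
$count = preg_match_all('!^((?:.*?\. ){2})!s', $str, $matches);

var_dump($count);         // int(1): without the m modifier, ^ only matches at the start
var_dump($matches[1][0]); // string(40) "East Winds 20 knots. Gusts to 25 knots. "
```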

If you don't want to capture the trailing space, you can use a lookahead instead:

preg_match_all('!^((?:.*?\.(?= )){2})!s', $str, $matches);
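To see the difference between the two patterns, a sketch that runs both on the same string: the lookahead `(?= )` checks that a space follows without consuming it, so the captured text stops at the second period.

```php
<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

preg_match_all('!^((?:.*?\. ){2})!s', $str, $withSpace);         // consumes ". "
preg_match_all('!^((?:.*?\.(?= )){2})!s', $str, $withoutSpace);  // only peeks at the space

var_dump($withSpace[1][0]);    // string(40) "East Winds 20 knots. Gusts to 25 knots. "
var_dump($withoutSpace[1][0]); // string(39) "East Winds 20 knots. Gusts to 25 knots."
```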

You may also want the end of the string to count as a terminator, which means either:

preg_match_all('!^((?:.*?\.(?: |\z)){2})!s', $str, $matches);

or

preg_match_all('!^((?:.*?\.(?= |\z)){2})!s', $str, $matches);
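A quick check of why `\z` matters, assuming a hypothetical input that contains exactly two sentences and no trailing space: the final period is followed by the end of the string rather than a space, so only the `\z` variant matches.

```php
<?php
$short = "East Winds 20 knots. Gusts to 25 knots."; // hypothetical input

$plain   = preg_match_all('!^((?:.*?\.(?= )){2})!s', $short, $m1);
$withEnd = preg_match_all('!^((?:.*?\.(?= |\z)){2})!s', $short, $m2);

var_dump($plain);    // int(0): the second period is not followed by a space
var_dump($withEnd);  // int(1)
var_dump($m2[1][0]); // string(39) "East Winds 20 knots. Gusts to 25 knots."
```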

Lastly, since you only want the first match, you could just as easily use preg_match() rather than preg_match_all() for this.
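With preg_match() the captured text is a plain string in `$m[1]` rather than an entry in an array of matches. A minimal sketch using the end-of-string-aware pattern (the fallback to the whole string is an assumption about what you want when there are fewer than two sentences):

```php
<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

// preg_match() returns 1 on a match, 0 on no match (false on error),
// and fills $m with the first match only: $m[0] is the whole match, $m[1] is group 1.
if (preg_match('!^((?:.*?\.(?= |\z)){2})!s', $str, $m)) {
    $firstTwo = $m[1];
} else {
    $firstTwo = $str; // fewer than two sentences: keep everything
}

echo $firstTwo; // East Winds 20 knots. Gusts to 25 knots.
```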

cletus
Thanks, Alex. I am scraping a dynamic list, but this looks like it will work: `(^)((.|\n)+?)((?:.*?\. ){2})` Can you translate this last part for me? `((?:.*?\. ){2})` 2nd occurrence of ... huh? Thx.
Steve
@Steve `(?:...)` is a *non-capturing group*, meaning it won't create a separate entry in the `$matches` array; otherwise it behaves just like a capturing group for grouping, precedence, etc. `.*?` is a *non-greedy* wildcard match: normally wildcard matches in regexes grab as many characters as possible, while non-greedy matches grab as few as possible. The `{2}` then repeats the whole group exactly twice, so together it means "the shortest run of text ending in `. `, two times in a row".
cletus
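To make both points concrete, a small sketch on the sample string: the greedy `.*` backtracks from the end to the last `. ` it can find, while `.*?` stops at the first one; and replacing the inner `(...)` with `(?:...)` removes one entry from the match array.

```php
<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

// Greedy vs non-greedy.
preg_match('!^(.*\. )!', $str, $greedy);
preg_match('!^(.*?\. )!', $str, $lazy);
var_dump($greedy[1]); // string(64) "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop. "
var_dump($lazy[1]);   // string(21) "East Winds 20 knots. "

// Capturing vs non-capturing inner group.
preg_match('!^((.*?\. ){2})!', $str, $capturing);
preg_match('!^((?:.*?\. ){2})!', $str, $nonCapturing);
var_dump(count($capturing));    // int(3): whole match, group 1, group 2
var_dump(count($nonCapturing)); // int(2): whole match, group 1
var_dump($capturing[2]);        // string(19) "Gusts to 25 knots. " (a repeated group keeps its last iteration)
```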
Well, my simplified example didn't get me the second occurrence of ". ", so I'll show you what I am doing. View the source of: weather.noaa.gov/cgi-bin/… I am running: `preg_match_all ('/(#800000">)((.|\n)+?)((?:.*?\.\s){2})/',$content,$forecast); $dataarray=$forecast[2]; foreach ($dataarray as $value) { echo $value; }` to try and get only: Tonight South Winds 10 To 14 Knots. Bay Waters A Light Chop. Friday Southwest Winds 10 To 15 Knots. Bay Waters A Light Chop. Still no joy. Any ideas??? Thanks so much for the help...
Steve
A: 

You can try:

<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";
if(preg_match_all ('/(.*?\. .*?\. )/',$str, $matches))
    $dataarrray = $matches[1];
var_dump($dataarrray);
?>

Output:

array(1) {
  [0]=>
  string(40) "East Winds 20 knots. Gusts to 25 knots. "
}

Also, if you want to capture just one occurrence, why are you using preg_match_all? preg_match should suffice.
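The difference only shows once the input has more sentences. A sketch with a hypothetical four-sentence string: the unanchored pattern matches twice under preg_match_all(), while preg_match() stops after the first match.

```php
<?php
$text = "One. Two. Three. Four. "; // hypothetical input

$all = preg_match_all('/(.*?\. .*?\. )/', $text, $allMatches);
$one = preg_match('/(.*?\. .*?\. )/', $text, $firstMatch);

var_dump($all);           // int(2)
var_dump($allMatches[1]); // array holding "One. Two. " and "Three. Four. "
var_dump($one);           // int(1)
var_dump($firstMatch[1]); // string(10) "One. Two. "
```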

codaddict
Any reason why you have 3 r's in array? :P
alex
@alex: It's an optimization hint to the PHP interpreter ;)
codaddict
A: 

I don't think `(\.\s{2})` means what you think it means. As it stands, it matches a period followed by two whitespace characters (such as ".  "), not the second occurrence of ". ".
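To see what the original pattern actually captures, a short sketch running it on the question's string: `\.\s{2}` can only succeed at "chop.  ", where the period really is followed by two spaces, so the lazy group runs through the third sentence.

```php
<?php
$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";

// The question's pattern: group 2 is lazy, but it has to keep extending
// until group 4 (a period plus exactly two whitespace characters) can match.
preg_match_all('/(^)((.|\n)+?)(\.\s{2})/', $str, $matches);

var_dump($matches[2][0]); // string(62) "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop"
var_dump($matches[4][0]); // string(3) ".  "
```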

Rob Agar
Thanks, Rob. I knew the {2} was wrong. I thought there might be a quick fix, so I simplified my question. The real example is scraping the source of: http://weather.noaa.gov/cgi-bin/fmtbltn.pl?file=forecasts/marine/coastal/am/amz630.txt and doing a preg_match_all to end up with only: TODAY SOUTHEAST WINDS 7 TO 12 KNOTS BECOMING 10 TO 13 KNOTS. BAY WATERS A MODERATE CHOP. TONIGHT SOUTHEAST WINDS 8 TO 12 KNOTS BECOMING SOUTH 5 TO 10 KNOTS. BAY WATERS A LIGHT CHOP. etc., for each day. Is there a way to get the 2nd occurrence of ". " with the pattern structured the way it is now?
Steve
A:

Here is a different approach:

$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";


$sentences = preg_split('/\.\s/', $str);

$firstTwoSentences = $sentences[0] . '. ' . $sentences[1] . '.';


echo $firstTwoSentences; // East Winds 20 knots. Gusts to 25 knots.
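A slightly more defensive variant of the same idea, as a sketch: the `$limit` argument of preg_split() stops splitting after the first pieces, and a count check avoids an undefined index when the text has fewer sentences. The function name `firstSentences()` is hypothetical.

```php
<?php
// Hypothetical helper: return the first $n sentences of $text,
// treating "period followed by whitespace" as the sentence boundary.
function firstSentences(string $text, int $n): string
{
    // With a limit of $n + 1, everything after the $n-th boundary stays in one piece.
    $pieces = preg_split('/\.\s/', $text, $n + 1);

    if (count($pieces) <= $n) {
        return $text; // $n or fewer sentences: nothing to cut
    }

    // preg_split() drops the delimiters, so rejoin with ". " and end with a period.
    return implode('. ', array_slice($pieces, 0, $n)) . '.';
}

$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";
echo firstSentences($str, 2); // East Winds 20 knots. Gusts to 25 knots.
```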
alex
A: 

No need for a regex. Think simple:

$str = "East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.  Slight chance of showers.";
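$s = explode(". ", $str);
$s = implode(". ", array_slice($s, 0, 2)) . "."; // explode() drops the delimiters, so restore the final period
print_r($s); // East Winds 20 knots. Gusts to 25 knots.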
ghostdog74
A: 

I want to capture this from the string: East Winds 20 knots. Gusts to 25 knots.

I have two suggestions:

1) Simply explode the string at ".  " (a period followed by two spaces) and print the first part.

$arr = explode(".  ",$str);
echo $arr[0] . ".";
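// Output: East Winds 20 knots. Gusts to 25 knots. Waters a moderate chop.
// (the third sentence is included because only "chop." is followed by two spaces)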

2) Use explode() and strpos(), which avoid the regex engine and are typically cheaper than preg_match_all():

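foreach (explode(".", $str) as $val) {
    // keep only the pieces that mention "knots"; !== false also accepts a match at position 0
    echo (strpos($val, "knots") !== false) ? trim($val) . ". " : "";
}
// Output: East Winds 20 knots. Gusts to 25 knots. (followed by a trailing space)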
Kristoffer Bohmann