So here is the string I'm trying to read from a page that I'm scraping (using file_get_contents):

<th>Kills (K)</th><td><strong>4,751</strong></td><td><strong>0</strong></td>

How can I navigate to the above section of the page contents, then isolate the 4,751 inside the above HTML and load it into $kills?

Difficulty: the number will change and may have additional digits before the comma.

A: 

here you go

preg_match_all('|<th>.*?</th><td><strong>([\d,]+)</strong></td>|x', $subject,$match);
var_dump($match);

But if I were you I would use XPath; it's safer.
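
A minimal sketch of the XPath route, assuming the DOM extension is available and that the number sits in the first td after a th whose text is exactly "Kills (K)". The $html sample is the snippet from the question wrapped in a hypothetical table row; the real page layout may differ.

```php
<?php
// Sample markup from the question, wrapped in an assumed <table><tr>.
// A real page would come from file_get_contents() instead.
$html = '<table><tr><th>Kills (K)</th><td><strong>4,751</strong></td>'
      . '<td><strong>0</strong></td></tr></table>';

// Collect parser warnings internally; scraped HTML is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// First <td> sibling after the <th> labelled "Kills (K)".
$nodes = $xpath->query('//th[normalize-space(.)="Kills (K)"]/following-sibling::td[1]');

$kills = 0;
if ($nodes !== false && $nodes->length > 0) {
    // textContent drops the <strong> wrapper; remove the thousands separator.
    $kills = (int) str_replace(',', '', trim($nodes->item(0)->textContent));
}
echo $kills, "\n";
```

Whitespace between the tags does not matter here, because the query works on the parsed tree rather than on the raw text.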

RageZ
Inside the th tags, can I change that to select different rows? Or will that break the whole shebang? For example, there is a row for Kills and one for Deaths.
Patrick
Yeah, the .*? matches any text, so it would match <th>bla</th><td><strong>a number</strong></td> for every row. But like I said, you're probably better off going in the XPath direction: less code and fewer possible errors.
RageZ
Will this work if there are tabs or spaces in between the HTML tags?
Patrick
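Nope. The x modifier only ignores whitespace inside the pattern itself, not in the text being searched; to allow tabs or spaces between the tags, add \s* between them, i.e. preg_match_all('|<th>.*?</th>\s*<td>\s*<strong>(\d+,\d+)</strong>\s*</td>|', $subject, $match);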
RageZ
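
On the two follow-up questions above (picking a row, and whitespace between tags), here is a rough sketch rather than a tested fix for the real page. It captures the th label together with the number and puts \s* between the tags, since the x modifier only affects whitespace written in the pattern, not in the subject. The $subject sample, including its tabs and newlines and the Deaths row, is made up for illustration.

```php
<?php
// Hypothetical markup with tabs/newlines between tags, two rows.
$subject = "<tr>\n\t<th>Kills (K)</th>\n\t<td><strong>4,751</strong></td><td><strong>0</strong></td>\n</tr>\n"
         . "<tr>\n\t<th>Deaths (D)</th>\n\t<td><strong>4,868</strong></td><td><strong>0</strong></td>\n</tr>";

$stats = [];
// Group 1 is the row label, group 2 the comma-formatted number.
// \s* tolerates any whitespace between the tags.
$pattern = '|<th>\s*(.*?)\s*</th>\s*<td>\s*<strong>\s*([\d,]+)\s*</strong>|s';
if (preg_match_all($pattern, $subject, $rows, PREG_SET_ORDER)) {
    foreach ($rows as $row) {
        // Key by label so a row can be chosen by name afterwards.
        $stats[$row[1]] = (int) str_replace(',', '', $row[2]);
    }
}

echo $stats['Kills (K)'], "\n";
echo $stats['Deaths (D)'], "\n";
```

With the labels as array keys, selecting a different row is a matter of changing the key instead of editing the regex.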
A: 

This should do it:

if (preg_match("/<th>Kills \(K\)<\/th><td><strong>([\d,]+)<\/strong>/", 
               $string, $matches)) {
  $kills = str_replace(",","",$matches[1]);
} else {
  $kills = 0;
}
gnarf
You should use preg_match_all; I suppose he has plenty of rows to read.
RageZ
A: 

This is what I'm using, and gnarf's code returns 0.

RageZ's returned an empty array.

<?php
$string = file_get_contents("http://combatarms.nexon.net/Community/Profile.aspx?user=tect0n");

// Look for the Kills row and capture the comma-formatted number.
if (preg_match("/<th>Kills \(K\)<\/th><td><strong>([\d,]+)<\/strong>/",
               $string, $matches)) {
  // Drop the thousands separator.
  $kills = str_replace(",", "", $matches[1]);
} else {
  // No match found.
  $kills = 0;
}
echo $kills;
?>

It comes up as 0.

Patrick
Let me check my regexp ...
RageZ
Just checked it, it works ...
RageZ
i.e. this works:
<?php
$subject = '<th>Kills (K)</th><td><strong>4,751</strong></td><td><strong>0</strong></td>';
preg_match_all('|<th>.*?</th><td><strong>(\d+,\d+)</strong></td>|', $subject, $match);
var_dump($match);
Result:
array(2) { [0]=> array(1) { [0]=> string(49) "<th>Kills (K)</th><td><strong>4,751</strong></td>" } [1]=> array(1) { [0]=> string(5) "4,751" } }
RageZ
Returns array(2) { [0]=> array(0) { } [1]=> array(0) { } } for me
Patrick
+1  A: 

OK, got it to work by decoding the page contents and stripping the tabs, newlines, and double spaces before matching:

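<?php
$url = "http://combatarms.nexon.net/Community/Profile.aspx?user=tect0n";
$raw = file_get_contents($url);
// Whitespace and control characters to strip so the tags sit next to each other.
$newlines = array("\t","\n","\r","\x20\x20","\0","\x0B");
$content = str_replace($newlines, "", html_entity_decode($raw));
// Capture the number from every <th>...</th><td><strong>...</strong></td> row.
preg_match_all('|<th>.*?</th><td><strong>(\d+,\d+)</strong></td>|', $content, $match);
?>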

This returns

Array ( [0] => Array ( [0] => Kills (K)4,751  [1] => Deaths (D)4,868  ) [1] => Array ( [0] => 4,751 [1] => 4,868 ) )
Patrick
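
One caveat with (\d+,\d+): it needs exactly one comma, so a value under 1,000 or over 999,999 would be skipped. A possible tweak, sketched below on invented sample values (the 987 and the Score row are hypothetical, not from the real page), is to match comma-grouped digits of any size and cast the captures to integers.

```php
<?php
// Hypothetical sample covering three magnitudes; whitespace already stripped.
$content = '<th>Kills (K)</th><td><strong>4,751</strong></td>'
         . '<th>Deaths (D)</th><td><strong>987</strong></td>'
         . '<th>Score</th><td><strong>1,234,567</strong></td>';

// \d{1,3}(?:,\d{3})* accepts 987, 4,751 and 1,234,567 alike.
preg_match_all(
    '|<th>.*?</th><td><strong>(\d{1,3}(?:,\d{3})*)</strong></td>|',
    $content,
    $match
);

// Strip the thousands separators and cast each capture to an integer.
$numbers = array_map(function ($n) {
    return (int) str_replace(',', '', $n);
}, $match[1]);

print_r($numbers);
```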
A: 
preg_match_all('#\(K\).*?<strong>(.*?)</strong>#s',$html,$matches);

tell me that aint pretty

John
A: 
preg_match('#<table class="tbl_profile">(.*?)</table>#s',file_get_contents('http://combatarms.nexon.net/Community/Profile.aspx?user=tect0n'),$m);
preg_match_all('#<tr>.*?<t.*?>(.*?)</t.*?>.*?<t.*?>(.*?)</t.*?>.*?<t.*?>(.*?)</t.*?>.*?</tr>#s',preg_replace('#(<strong>)|(</strong>)|(<!--.*?-->)#s','',$m[1]),$r);
echo 'You got '.$r[2][1].' killz';
//print_r($r);

now tell me thaaaaaaat aint pretty, cooooool it.

John