ansaurus

Question

Answer 1

+1 A:

Try SimpleXML: iterate over the elements with foreach, check whether each element's class attribute is the one you want, and grab the data-url values.
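A minimal sketch of that approach, not the answerer's own code. The question's markup is not shown here, so the sample string and the class name "link" are assumptions; the fragment is wrapped in a hypothetical <root> element because SimpleXML needs well-formed XML with a single root.

```php
<?php
// Hypothetical input: the class name "link" and the third URL are made up
// for illustration only.
$string = 'text text <span class="link" data-url="http://www.google.com">google.com</span>'
        . ' text <span class="other" data-url="http://www.example.com">skip me</span>'
        . ' <span class="link" data-url="http://www.yahoo.com">yahoo.com</span> text.';

// SimpleXML only accepts well-formed XML, so wrap the fragment in one root.
$xml = simplexml_load_string('<root>' . $string . '</root>');

$urls = [];
foreach ($xml->span as $span) {
    // Attributes are read with array syntax; cast to string before comparing.
    if ((string) $span['class'] === 'link') {
        $urls[] = (string) $span['data-url'];
    }
}

print_r($urls);
```

If the input is not well-formed XML, simplexml_load_string() returns false, so real code should check the return value before looping.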

hsz 2010-09-01 13:36:36

Answer 2

A:

preg_match_all('/data-url="([^"]*)"/i', $string, $urls);

You can fetch all the URLs this way; the captured values end up in $urls[1].

You can also use SimpleXML, as hsz mentioned.
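As a rough illustration of what the match array looks like, here is a small sketch. The input string is an assumption (the original markup is not shown), and the pattern assumes the attribute values are double-quoted.

```php
<?php
// Hypothetical input; assumes data-url values are wrapped in double quotes.
$string = 'text text <span data-url="http://www.google.com">google.com</span>'
        . ' text <span data-url="http://www.yahoo.com">yahoo.com</span> text.';

// With the default PREG_PATTERN_ORDER flag, index 0 holds the full matches
// and index 1 holds the contents of the first capture group.
$count = preg_match_all('/data-url="([^"]*)"/i', $string, $matches);

echo $count, PHP_EOL;
print_r($matches[1]);
```

Note that this pattern would miss single-quoted or unquoted attribute values, which is one reason the other answers suggest a parser instead.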

Maulik Vora 2010-09-01 13:41:09

Answer 3

+4 A:

If your string contains more than just the HTML snippet you show, you should use DOM with this XPath:

//span/@data-url

Example:

$dom = new DOMDocument;
$dom->loadHTML($string);
$xp = new DOMXPath($dom);
foreach( $xp->query('//span/@data-url') as $node ) {
    echo $node->nodeValue, PHP_EOL;
}

The above would output

http://www.google.com
http://www.yahoo.com

When you already have the HTML loaded, you can also do

echo $dom->documentElement->textContent;

which returns the same result as strip_tags($string) in this case:

text text
google.com
text yahoo.com text.

Gordon 2010-09-01 13:53:11

There is no text node within an attribute value. This should be `//span/@data-url`.

Tomalak 2010-09-01 13:55:36

@Tomalak Fixed. Thanks

Gordon 2010-09-01 13:57:11

Answer 4

A:

The short answer is: don't. There's a lovely rant somewhere on SO explaining why parsing HTML with regexes is a bad idea. Essentially it boils down to "HTML is not a regular language, so regular expressions are not adequate to parse it". What you need is something DOM-aware.

As @hsz said, SimpleXML is a good option if you know that your HTML validates as XML. A better choice might be DOMDocument::loadHTML, which doesn't require well-formed HTML. Once your HTML is in a DOMDocument object, you can extract whatever you need very easily. Check out the docs here.
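To make the point about well-formedness concrete, here is a hedged sketch. The input is a made-up, deliberately sloppy fragment (unclosed tags, an unquoted attribute), not the markup from the question.

```php
<?php
// Hypothetical, deliberately malformed input: the <p> and the second <span>
// are never closed, and the class attribute is unquoted.
$html = '<p>text text <span class=link data-url="http://www.google.com">google.com</span>'
      . ' text <span data-url="http://www.yahoo.com">yahoo.com text.';

// Collect libxml parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$urls = [];
foreach ($dom->getElementsByTagName('span') as $span) {
    // Only keep spans that actually carry the attribute.
    if ($span->hasAttribute('data-url')) {
        $urls[] = $span->getAttribute('data-url');
    }
}

print_r($urls);
```

The same string passed to simplexml_load_string() would be rejected as invalid XML, which is the trade-off described above.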

dnagirl 2010-09-01 14:01:20

How to strip tags in PHP using regex?