How do I retrieve all src values using a regex in PHP?

<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>

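The retrieved values should contain only the URLs:

http://localhost/assets/javascript/system.js
http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c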

Thank you.

+5  A: 
/src=(["'])(.*?)\1/

example:

<?php

$input_string = '<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>';
$count = preg_match('/src=(["\'])(.*?)\1/', $input_string, $match);
if (!$count) // 0 means no match, FALSE means an error
    echo("not found\n");
else 
    echo($match[2] . "\n");

$input_string = "<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>";
$count = preg_match('/src=(["\'])(.*?)\1/', $input_string, $match);
if (!$count) // 0 means no match, FALSE means an error
    echo("not found\n");
else 
    echo($match[2] . "\n");

gives:

http://localhost/assets/javascript/system.js
http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c
Scott Evernden
Hi Scott, thanks, but I am using `preg_match_all` and your pattern doesn't seem to output anything. For now I am using `preg_match_all("/http:\/\/(.*?)[^\"']+/", $scripts, $matches, PREG_SET_ORDER);`.
`preg_match` returns an integer (1 for a match, 0 for no match); it returns FALSE only when an error occurs.
Gumbo
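For the `preg_match_all` case raised in the comment above, here is a minimal sketch of how the same pattern could be applied to several tags at once. It assumes `$scripts` holds both tags from the question in one string; the `$srcs` array is a name introduced only for illustration. With `PREG_SET_ORDER`, each entry of `$matches` is one full match, and the URL sits in capture group 2 (group 1 is just the quote character).

```php
<?php
// Assumed input: both script tags from the question in a single string.
$scripts = <<<HTML
<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
HTML;

// Group 1 captures the opening quote, group 2 the URL between the quotes.
preg_match_all('/src=(["\'])(.*?)\1/', $scripts, $matches, PREG_SET_ORDER);

// Collect group 2 of every match (hypothetical helper array).
$srcs = array();
foreach ($matches as $match) {
    $srcs[] = $match[2];
}

print_r($srcs);
```

If the array comes back empty, it is worth checking that the value is read from index 2 of each match rather than index 1.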
+4  A: 

Maybe it is just me, but I don't like using regular expressions for finding things in pieces of HTML, especially when the HTML is unpredictable (perhaps comes from a user or other web pages).

How about something like this:

$doc =
<<<DOC
    <script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
    <script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
DOC;

$dom = new DomDocument;
$dom->loadHTML( $doc );

$elems = $dom->getElementsByTagName('*');

// Start with an empty list so print_r() works even when nothing matches.
$srcs = array();
foreach ( $elems as $elm ) {
    if ( $elm->hasAttribute('src') )
        $srcs[] = $elm->getAttribute('src');
}

print_r( $srcs );

I don't know what the speed difference is between this and a regular expression but it takes me a heck of a lot less time to read it and understand what I'm trying to do.

Nick Presta
Thanks Nick for the alternative, but I'll stick with regex because the input will contain at most a few lines of JavaScript file references. Perhaps it's a matter of preference. :)
+3  A: 

I agree with Nick: use the DOMDocument object to fetch your data. Here is an XPath version:

$doc =
<<<DOC
    <script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
    <script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
DOC;

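// Use a separate variable so the HTML string in $doc is not overwritten.
$dom = new DOMDocument;
$dom->loadHTML($doc);

$xpath = new DOMXPath($dom);
// Select every element that carries a src attribute.
$elements = $xpath->query('//*[@src]');

foreach ($elements as $element)
{
    echo $element->getAttribute('src'), "\n";
}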
alexn
Thanks Alex, please refer to the above comment.
+1  A: 

If you decide to go the regex route, this should be useful for you (the slash in the lookahead is escaped because `/` is also the delimiter):

/(?<=\<).*?src=(['"])(.*?)\1.*?(?=\/?\>)/si
KOGI
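As a rough illustration of how the pattern above might be used, the sketch below runs it through `preg_match_all` on the two tags from the question. It assumes the `/` inside the lookahead is escaped (since `/` is also the pattern delimiter), and the names `$html`, `$found` and `$urls` are made up for the example. The URL is again in capture group 2.

```php
<?php
// Assumed input: both script tags from the question in a single string.
$html = <<<HTML
<script type="text/javascript" src="http://localhost/assets/javascript/system.js" charset="UTF-8"></script>
<script type='text/javascript' src='http://localhost/index.php?uid=93db46d877df1af2a360fa2b04aabb3c' charset='UTF-8'></script>
HTML;

// The lookbehind anchors the match just after "<"; the lookahead stops it before ">" or "/>".
$pattern = '/(?<=\<).*?src=([\'"])(.*?)\1.*?(?=\/?\>)/si';

preg_match_all($pattern, $html, $found, PREG_SET_ORDER);

// Keep only capture group 2 (the URL) from each match.
$urls = array();
foreach ($found as $m) {
    $urls[] = $m[2];
}

print_r($urls);
```

Because of the `s` flag, `.` also matches newlines, so the pattern can span line breaks between a `<` and the next `src=`; on less regular HTML that may capture more than intended.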