I have written a regular expression to extract values from an embed code that a user enters, so that I can replace the height and width values and keep the URLs. The goal is to make the input safe to add to a database.

This is what I have so far (for now I am just trying to get preg_match to return TRUE):

$test ='<object height="81" width="100%"> <param name="movie" value="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&amp;secret_url=false"></param> <param name="allowscriptaccess" value="always"></param> <embed allowscriptaccess="always" height="81" src="http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&amp;secret_url=false" type="application/x-shockwave-flash" width="100%"></embed> </object>';
  if (preg_match('/<object height=\"[0-9]*\" width=\"[0-9]*\"><param name=\"movie\" value=\"(.*)\"><\/param><param name=\"allowscriptaccess\" value=\"always\"><\/param><embed allowscriptaccess=\"always\" height=\"[0-9]*\" src=\".*\" type=\"application\/x-shockwave-flash\" width=\"100%\"><\/embed><\/object>/', $test)) {

$embed = $test;

} else {

$embed = 'FALSE';

}

I seem to have done something wrong in the validation, as it always returns false.

+1  A: 

The first thing I see that will fail is:

width="100%"  will not match /width=\"[0-9]*\"/

I don't know PHP's regular expression flavour in detail, but whitespace is the second problem. A literal space in a pattern matches exactly one space, and a pattern with no whitespace between two tags will not match text that has whitespace there (use \s* to allow for it):

> <param      will not match /><param/
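
The two mismatches above can be checked in isolation. This is only a minimal sketch: the input is a hypothetical, shortened version of the embed code (opening <object> tag and first <param> only), and the relaxed pattern is one possible way to allow the % sign and the whitespace, not a complete validator.

```php
<?php
// Hypothetical, shortened input: only the opening <object> tag and the first <param>.
$snippet = '<object height="81" width="100%"> <param name="movie" value="http://example.com/player.swf">';

// Original style: digits only for width, no whitespace allowed between the tags.
$strict = '/<object height="[0-9]*" width="[0-9]*"><param name="movie" value="(.*)">/';

// Relaxed: optional % after the digits, \s* between the tags, and a value
// that stops at the closing quote instead of a greedy .*
$relaxed = '/<object height="[0-9]+" width="[0-9]+%?">\s*<param name="movie" value="([^"]*)">/';

var_dump(preg_match($strict, $snippet));            // int(0)
var_dump(preg_match($relaxed, $snippet, $matches)); // int(1)
echo $matches[1], "\n";                             // http://example.com/player.swf
```

Even the relaxed pattern still depends on the attribute order and on the exact spelling of the tags, which is why a parser is the more robust route.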

As you can see, parsing XML with regular expressions is hard and error-prone.
What you really want to do is use an XML parser, such as PHP's SAX-style one.

Try this. (PS: my PHP is not great, so it could contain mistakes.)

PS. The long URLs were not encoded correctly for XML. I used urlencode() here just to stop the error messages. I did not check to see if that made sense.

<?php

$test = '<object height="81" width="100%">'
            .'<param name="movie" value="'
                .urlencode('http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&amp;secret_url=false')
            .'">'
            .'</param>'
            .'<param name="allowscriptaccess" value="always">'
            .'</param>'
            .'<embed allowscriptaccess="always" height="81" src="'
                .urlencode('http://player.soundcloud.com/player.swf?url=http%3A%2F%2Fsoundcloud.com%2Ftheshiverman%2Fsummer-beats-july-2010&amp;secret_url=false')
                .'" type="application/x-shockwave-flash" width="100%">'
            .'</embed>'
        .'</object>';

function JustPrint($parser,$data)
{
    print $data;
}

function OpenTag($parser,$name ,$attribs)
{
    // For special tags add a new attribute.
    if (strcasecmp($name, "object") == 0)
    {
        $attribs['Martin'] = 'York';
    }


    // Print the tag, re-escaping attribute values (the parser passes them in decoded).
    print "<$name";
    foreach ($attribs as $attrName => $value)
    {
        print " $attrName=\"" . htmlspecialchars($value) . "\"";
    }
    print ">\n";
}

function CloseTag($parser,$name)
{
    // A closing tag is </name>; <name/> would be an empty element.
    print "</$name>\n";
}

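$xmlParser  =  xml_parser_create();
// Case folding is on by default and upper-cases tag and attribute names; turn it off to keep them as written.
xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, 0);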
xml_set_default_handler($xmlParser ,'JustPrint'  );
xml_set_element_handler($xmlParser, 'OpenTag'  , 'CloseTag'  );
xml_parse($xmlParser, $test);

?>
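
If the goal is to read the embed code into a data structure, change the height, and write it back out, PHP's DOM extension is an alternative to the SAX-style handlers above. The following is only a sketch under assumptions: the embed code is a hypothetical shortened one, it must already be well-formed XML (so & is written as &amp;), and the new height of 120 is an arbitrary example value.

```php
<?php
// Hypothetical embed code; the & in the URL is written as &amp; so it is well-formed XML.
$embed = '<object height="81" width="100%">'
       . '<param name="movie" value="http://example.com/player.swf?a=1&amp;b=2"></param>'
       . '<embed height="81" width="100%" src="http://example.com/player.swf?a=1&amp;b=2"></embed>'
       . '</object>';

$doc = new DOMDocument();
// loadXML() returns false (and raises a warning) when the input is not well-formed.
if (!$doc->loadXML($embed)) {
    throw new RuntimeException('Embed code is not well-formed XML');
}

// Overwrite the height on every element that carries one.
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//*[@height]') as $element) {
    $element->setAttribute('height', '120');
}

// Pull out the movie URL; getAttribute() returns the decoded value.
$movie = $xpath->query('//param[@name="movie"]')->item(0)->getAttribute('value');
echo $movie, "\n";

// Serialise only the root element (no XML declaration) and keep
// <param></param> pairs instead of collapsing them to <param/>.
$result = $doc->saveXML($doc->documentElement, LIBXML_NOEMPTYTAG);
echo $result, "\n";
```

Because the parser rejects anything that is not well-formed, the loadXML() check doubles as a basic format validation of the user's input.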
Martin York
Michael Mallett
Have a look at: http://php.net/manual/en/book.xml.php
Martin York
I have no idea what I'm looking for in that link.
Michael Mallett
It's an XML parser. What you want to do is read the XML into a data structure, modify it, and write it back out. Using regular expressions to try to manipulate XML is a losing game.
Martin York
@Michael Mallett: Please see if the XML works.
Martin York
Sorry, I don't know where to begin with that. I really don't know what you're trying to do or what I would do with that code. I don't want to alter it in any way, otherwise when it's spat back out of the database it won't mean anything. I just want to make sure it is in the correct format and safe to put into an SQL query. Are you telling me that there is no regular expression that would allow a URL with percent signs in it? I just want to do that and put some parentheses around it so that I can get it into an array.
Michael Mallett
@Michael Mallett: What I am saying is that using regular expressions is a bad idea for parsing XML. You do need to learn how to use the XML parser.
Martin York
See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
Martin York
A: 

I don't want to manipulate it if possible (just replace the height values). I want it to stay exactly as it is; I'm using regular expressions to minimise SQL injections and make sure it is an embed code.

Can it not just be treated like a string and kept exactly as it is, but checked against something?

For instance, this works with a youtube embed link:

preg_match('/<object width=\"([0-9]*)\" height=\"([0-9]*)\"><param name=\"movie\" value=\"(.*)\"><\/param><param name=\"allowFullScreen\" value=\".*\"><\/param><param name=\"allowscriptaccess\" value=\".*\"><\/param><embed src=\".*\" type=\".*\" allowscriptaccess=\".*\" allowfullscreen=\".*\" width=\"[0-9]*\" height=\"[0-9]*\"><\/embed><\/object>/',$test,$preg_out)

$preg_out[1], $preg_out[2] and $preg_out[3] return the width, height and URL of the object.
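
As a minimal sketch of how the capture groups land in the output array: the embed code below is a hypothetical, shortened YouTube-style one (VIDEO_ID is a placeholder), and the pattern only covers the opening tag and the movie parameter. Index 0 holds the whole match, so the captured values start at index 1.

```php
<?php
// Hypothetical YouTube-style embed code, shortened to two <param> elements.
$test = '<object width="425" height="344">'
      . '<param name="movie" value="http://www.youtube.com/v/VIDEO_ID"></param>'
      . '<param name="allowFullScreen" value="true"></param>'
      . '<embed src="http://www.youtube.com/v/VIDEO_ID" type="application/x-shockwave-flash" width="425" height="344"></embed>'
      . '</object>';

// Using # as the delimiter avoids having to escape the / in </param>.
$pattern = '#<object width="([0-9]+)" height="([0-9]+)">'
         . '<param name="movie" value="([^"]*)"></param>#';

if (preg_match($pattern, $test, $preg_out)) {
    // $preg_out[0] is the whole matched text; the groups start at index 1.
    list(, $width, $height, $url) = $preg_out;
    echo "$width x $height: $url\n"; // 425 x 344: http://www.youtube.com/v/VIDEO_ID
}
```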

Michael Mallett
A: 

If what you want to do is allow users to give you SoundCloud embeds while you retain the right to style the players, you might want to look into oEmbed, which is well supported by SoundCloud and other parties. This way, users just enter their normal track URLs and you can resolve those in the backend as you see fit.
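
A rough sketch of what resolving a track URL in the backend could look like. The endpoint https://soundcloud.com/oembed and the url, format and maxheight parameters are assumptions here and should be checked against SoundCloud's current documentation; buildOembedUrl() and extractPlayerHtml() are hypothetical helper names, and the JSON body is made up purely to show the shape of an oEmbed response. No network request is made.

```php
<?php
// Assumption: SoundCloud's oEmbed endpoint; verify before relying on it.
const OEMBED_ENDPOINT = 'https://soundcloud.com/oembed';

/** Build the oEmbed request URL for a track page URL (hypothetical helper). */
function buildOembedUrl(string $trackUrl, int $maxHeight): string
{
    // http_build_query() takes care of URL-encoding the track URL.
    return OEMBED_ENDPOINT . '?' . http_build_query([
        'url'       => $trackUrl,
        'format'    => 'json',
        'maxheight' => $maxHeight,
    ]);
}

/** Pull the player markup out of an oEmbed JSON response body (hypothetical helper). */
function extractPlayerHtml(string $json): ?string
{
    $data = json_decode($json, true);
    return is_array($data) && isset($data['html']) ? $data['html'] : null;
}

$requestUrl = buildOembedUrl('http://soundcloud.com/theshiverman/summer-beats-july-2010', 81);
echo $requestUrl, "\n";

// Made-up response body, only to illustrate the "html" field of an oEmbed response.
$sample = '{"type":"rich","version":"1.0","html":"<iframe src=\"https://example.com/player\"></iframe>"}';
echo extractPlayerHtml($sample), "\n";
```

With this approach only the plain track URL needs to be validated and stored, and the player markup is generated at display time.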

Also, keep in mind that an embed code with its <param> elements in a different order is still a valid embed code, but would be very difficult to match with a regex.
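
To illustrate the ordering point, here is a small sketch that compares two embed codes by their parameters instead of by their text. It assumes the embed code is well-formed XML, and collectParams() is a hypothetical helper, not part of any library.

```php
<?php
/**
 * Collect <param> name/value pairs from an embed code into an array keyed by
 * name, so the order in which they appear does not matter (hypothetical helper).
 */
function collectParams(string $embed): array
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    if (!$doc->loadXML($embed)) {
        return []; // not well-formed
    }
    $params = [];
    foreach ($doc->getElementsByTagName('param') as $param) {
        $params[$param->getAttribute('name')] = $param->getAttribute('value');
    }
    ksort($params); // normalise the key order so two arrays can be compared with ===
    return $params;
}

// Two hypothetical embed codes that differ only in the order of their <param> elements.
$a = '<object><param name="movie" value="http://example.com/p.swf"></param>'
   . '<param name="allowscriptaccess" value="always"></param></object>';
$b = '<object><param name="allowscriptaccess" value="always"></param>'
   . '<param name="movie" value="http://example.com/p.swf"></param></object>';

var_dump(collectParams($a) === collectParams($b)); // bool(true)
```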

Robert