Hey folks - I'm not much of a coder, but I need to write a simple preg_replace statement in PHP that will help me with a WordPress plugin. Basically I need code that will search for a string, pull out the video id, and return the embed code with the video id inserted into it.

So in other words...

I'm searching for this:
[youtube=http://www.youtube.com/watch?v=VIDEO_ID_HERE&hl=en&fs=1]

And want to replace it with this (keeping the video id the same):
param name="movie" value="http://www.youtube.com/v/VIDEO_ID_HERE&hl=en&fs=1&rel=0

If possible, I'd be forever grateful if you could explain how you've used the various slashes, carets, and stars in the search pattern, i.e., translate it from grep to English so I can learn. :-)

Thanks!
Mike

+2  A: 
$str = preg_replace('/\[youtube=.*?v=([a-z0-9_-]+?)&.*?\]/i', 'param name="movie" value="http://www.youtube.com/v/$1&hl=en&fs=1&rel=0', $str);

         /     - Start of RE
         \[    - A literal [  ([ is a special character so it needs escaping)
         youtube= - Make sure we've got the right tag
         .*?   - Any old rubbish, but don't be greedy; stop when we reach...
         v=    - ...this text
         ([a-z0-9_-]+?) - Take some more text (just a-z 0-9 _ and -), and don't be greedy.  Capture it using ().  This will get put in $1
         &.*?\] - the junk up to the ending ]
         /i - end the RE and make it case-insensitive for the hell of it
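
To see the pieces above working together, here is a small sketch that runs the same pattern over a sample string. The video id abc_123-XYZ and the surrounding "Intro"/"outro" text are made up for illustration.

```php
<?php
// Sample text in the format from the question; the id is a made-up placeholder.
$str = 'Intro [youtube=http://www.youtube.com/watch?v=abc_123-XYZ&hl=en&fs=1] outro';

$pattern     = '/\[youtube=.*?v=([a-z0-9_-]+?)&.*?\]/i';
$replacement = 'param name="movie" value="http://www.youtube.com/v/$1&hl=en&fs=1&rel=0';

// $1 in the replacement is filled with whatever the (...) group captured.
$result = preg_replace($pattern, $replacement, $str);

echo $result, "\n";
```

Only the bracketed tag is rewritten; the text before and after it is left alone.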
Greg
Not to be nitpicky, but shouldn't you make it search for only alphanumeric/underscores as the value of v? That way people can't get stupid in there.
Paolo Bergantino
Nice explanation of the regex atoms. Still, Paolo is correct - this pattern should not accept any-and-all characters as part of a youtube video id.
Peter Bailey
A: 
$embedString = 'youtube=http://www.youtube.com/watch?v=VIDEO_ID_HERE&hl=en&fs=1';
preg_match('/v=([^&]*)/', $embedString, $matches);
echo 'param name="movie" value="http://www.youtube.com/v/'.$matches[1].'&hl=en&fs=1&rel=0';

Try that.

The regex /v=([^&]*)/ works this way:

  • it searches for v=
  • it then saves whatever matches the pattern inside the parentheses to $matches[1] (the whole match goes in $matches[0])
  • [^&] tells it to match any character except the ampersand ('&')
  • * tells it we want anywhere from 0 to any number of those characters in the match
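
As a rough illustration of the points above, this sketch prints what ends up in $matches and shows one consequence of using * (zero or more): an empty v= still counts as a match. The second input string and the $empty variable are made up for the example.

```php
<?php
$embedString = 'youtube=http://www.youtube.com/watch?v=VIDEO_ID_HERE&hl=en&fs=1';

if (preg_match('/v=([^&]*)/', $embedString, $matches)) {
    // $matches[0] is the whole match, $matches[1] is the parenthesized group.
    echo $matches[0], "\n";
    echo $matches[1], "\n";
}

// Because * allows zero characters, "v=" followed directly by "&" still
// matches, and the captured group is an empty string.
preg_match('/v=([^&]*)/', 'watch?v=&hl=en', $empty);
var_dump($empty[1] === '');
```

If an empty id should be rejected, swapping * for + (one or more) is one way to do that.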
Lucas Oman
A: 

A warning. If the text after .*? isn't found immediately, the regex engine will continue to search over the whole line, possibly jumping to the next [youtube...] tag. It is often better to use [^\]]*? to limit the search inside the brackets.

Based on RoBorg's answer:

$str = preg_replace('/\[youtube=[^\]]*?v=([^\]]*?)&[^\]]*?\]/i', ...)

[^\]] will match any character except ']'.
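
A quick sketch of the difference, assuming a line with two tags where the first one has no v= in it. The input string and the VIDEO($1) replacement are made up so the result is easy to read.

```php
<?php
// First tag has no v= parameter, second one does.
$str = '[youtube=http://example.com/no-id] and [youtube=http://www.youtube.com/watch?v=ID2&hl=en]';

// .*? is allowed to run past the first ] to find v= in the second tag.
$loose = preg_replace('/\[youtube=.*?v=([a-z0-9_-]+?)&.*?\]/i', 'VIDEO($1)', $str);

// [^\]]*? cannot cross a ], so the first tag simply fails to match.
$bounded = preg_replace('/\[youtube=[^\]]*?v=([^\]]*?)&[^\]]*?\]/i', 'VIDEO($1)', $str);

echo $loose, "\n";
echo $bounded, "\n";
```

With the .*? version the match starts at the first tag and swallows everything up to the end of the second one; the bounded version leaves the first tag and the text in between untouched.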

MizardX
+6  A: 

BE CAREFUL! If this is a BBCode-style system with user input, these other two solutions would leave you vulnerable to XSS attacks.

You have several ways to protect yourself against this. Have the regex explicitly disallow the characters that could get you in trouble (or, allow only those valid for a youtube video id), or actually sanitize the input and use preg_match instead, which I will illustrate below going off of RoBorg's regex.

<?php

$input = "[youtube=http://www.youtube.com/watch?v=VIDEO_ID_HERE&hl=en&fs=1]";

if ( preg_match('/\[youtube=.*?v=(.*?)&.*?\]/i', $input, $matches ) )
{
    $sanitizedVideoId = urlencode( strip_tags( $matches[1] ) );
    echo 'param name="movie" value="http://www.youtube.com/v/' . $sanitizedVideoId . '&hl=en&fs=1&rel=0';
} else {
    // Not valid input
}

Here's an example of this type of attack in action:

<?php

$input = "[youtube=http://www.youtube.com/watch?v=\"><script src=\"http://example.com/xss.js\"></script>&hl=en&fs=1]";

//  Is vulnerable to XSS
echo preg_replace('/\[youtube=.*?v=(.*?)&.*?\]/i', 'param name="movie" value="http://www.youtube.com/v/$1&hl=en&fs=1&rel=0', $input );
echo "\n";

//  Prevents XSS
if ( preg_match('/\[youtube=.*?v=(.*?)&.*?\]/i', $input, $matches ) )
{
    $sanitizedVideoId = urlencode( strip_tags( $matches[1] ) );
    echo 'param name="movie" value="http://www.youtube.com/v/' . $sanitizedVideoId . '&hl=en&fs=1&rel=0';
} else {
    // Not valid input
}
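
For the other option mentioned above (only allowing characters that are valid in a video id), a minimal sketch could look like this. The function name embedYoutubeTags is hypothetical, and the character class [a-z0-9_-] is an assumption borrowed from RoBorg's pattern rather than an official definition of a YouTube id.

```php
<?php
// Hypothetical helper. The allowed characters [a-z0-9_-] are an assumption
// about what a video id may contain, not an official specification.
function embedYoutubeTags(string $text): string
{
    return preg_replace_callback(
        '/\[youtube=[^\]]*?v=([a-z0-9_-]+)(?:&[^\]]*)?\]/i',
        function (array $m): string {
            // $m[1] can only contain whitelisted characters at this point.
            return 'param name="movie" value="http://www.youtube.com/v/'
                . $m[1] . '&hl=en&fs=1&rel=0';
        },
        $text
    );
}

$safe = '[youtube=http://www.youtube.com/watch?v=abc_123-XYZ&hl=en&fs=1]';
$evil = '[youtube=http://www.youtube.com/watch?v="><script src="http://example.com/xss.js"></script>&hl=en&fs=1]';

echo embedYoutubeTags($safe), "\n";
echo embedYoutubeTags($evil), "\n"; // pattern does not match, tag is left as typed
```

Note that a tag that fails the whitelist is returned unchanged, so whatever prints the surrounding text still needs to escape it like any other user input.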
Peter Bailey
+1  A: 

I would avoid regular expressions in this case if at all possible, because: who guarantees that the query string in the first URL will always be in that format?

I'd use parse_url($originalURL, PHP_URL_QUERY) to get the query string, split it on '&', and then loop through the pieces to find the 'name=value' pair for the v part. Something like:

$originalURL = 'http://www.youtube.com/watch?v=VIDEO_ID_HERE&hl=en&fs=1';

// parse_url() returns the query string here, so split it into 'name=value' pairs
$videoId = '';
foreach ( explode( '&', (string) parse_url( $originalURL, PHP_URL_QUERY ) ) as $keyvalue )
{
    if ( strlen( $keyvalue ) > 2 && substr( $keyvalue, 0, 2 ) == 'v=' )
    {
        $videoId = substr( $keyvalue, 2 );
        break;
    }
}

$newURL = sprintf( 'http://www.youtube.com/v/%s/whatever/else', urlencode( $videoId ) );

p.s. written in the SO textbox, untested.
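
A variation on the same idea, sketched with PHP's built-in parse_str() instead of a manual loop (this swaps in a different function from the one described above). It assumes the URL has already been pulled out of the [youtube=...] tag, and it drops the /whatever/else suffix to keep the output short.

```php
<?php
$originalURL = 'http://www.youtube.com/watch?v=VIDEO_ID_HERE&hl=en&fs=1';

// With PHP_URL_QUERY, parse_url() returns just the query string
// (or null when the URL has none).
$query = parse_url($originalURL, PHP_URL_QUERY);

// parse_str() decodes the query string into an associative array.
$params = [];
parse_str((string) $query, $params);

// Only accept a plain string value for v; anything else is treated as missing.
$videoId = (isset($params['v']) && is_string($params['v'])) ? $params['v'] : null;

$newURL = null;
if ($videoId !== null) {
    $newURL = sprintf('http://www.youtube.com/v/%s', urlencode($videoId));
    echo $newURL, "\n";
}
```

This does not depend on where v sits in the query string, which is the concern raised at the top of this answer.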

Kris