tags:
views: 48
answers: 1

Here is my function:

function is_url($url) {
    // Require an http(s) scheme and a URL that passes PHP's built-in validator.
    return preg_match('#^https?://#i', $url)
        && filter_var($url, FILTER_VALIDATE_URL) !== false;
}

And here is a nice URL that it validates as true:

http://blah.com"onclick="alert(document.cookie)

Imagine if that goes into <a href="<?php echo $url; ?>">

Are there any better URL validators with regex? Or is the URL I am testing with actually a valid URL (in which case I would need an XSS clean up function)?

+1  A: 

There's this built-in filter:

filter_var($url, FILTER_VALIDATE_URL);

This will return false with your example URL. If it were valid, it would return $url. Example:

glopes@nebm:~$ php -r "var_dump(filter_var('http://blah.com\"onclick=\"alert(document.cookie)', FILTER_VALIDATE_URL));"
bool(false)

Anyway, the solution to prevent XSS is to use htmlspecialchars. Since the value goes into an attribute, you should use ENT_QUOTES:

htmlspecialchars($data, ENT_QUOTES);

But you should also validate the URL, because otherwise the user can include javascript:-like "URLs".
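A minimal sketch combining both steps (validate first, then escape for attribute context); the helper name `safe_url_attr` is my own, not a built-in:

```php
<?php
// Hypothetical helper: validate first, then escape for use in an HTML attribute.
function safe_url_attr($url) {
    // Reject anything that is not a well-formed http(s) URL;
    // the scheme check also blocks javascript: pseudo-URLs.
    if (filter_var($url, FILTER_VALIDATE_URL) === false
        || !preg_match('#^https?://#i', $url)) {
        return false;
    }
    // Escape quotes too, since the value lands inside an attribute.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

var_dump(safe_url_attr('http://blah.com"onclick="alert(document.cookie)'));
var_dump(safe_url_attr('http://example.com/?a=1&b=2'));
```

On a fixed PHP build the attack URL is rejected by filter_var; on an older build it would slip past validation but the quotes are still neutralized by htmlspecialchars.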

Artefacto
read my code :-/
fire
Ah sorry, the line was too long and I didn't read that part. But anyway, it does return false.
Artefacto
That's strange. What happens when you run http://pastebin.com/9Yvg7Gqn? For me it's dumping as bool(true)??
fire
@fire Nope, gives `false` on both the latest 5.2 and 5.3: http://codepad.viper-7.com/gt5udu
Artefacto
Weird, I am getting true on 5.2.6
fire
@fire Look for FILTER_VALIDATE_URL in http://php.net/ChangeLog-5.php
Artefacto
those changelog bastards!
fire
It was fixed around Christmas 2009, which means only releases after that date validate the host part of the URL: that means 5.2.13/5.3.2 or newer.
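For code that has to run on those older releases, one defensive option (a sketch of my own, using parse_url to check the host part separately) would be:

```php
<?php
// Sketch: extra host check for PHP builds older than 5.2.13/5.3.2,
// where FILTER_VALIDATE_URL did not validate the host part of the URL.
function has_valid_host($url) {
    $host = parse_url($url, PHP_URL_HOST);
    // Only letters, digits, dots, and hyphens are allowed in a hostname.
    return is_string($host) && preg_match('/^[a-z0-9.-]+$/i', $host) === 1;
}

var_dump(has_valid_host('http://blah.com"onclick="alert(document.cookie)'));
var_dump(has_valid_host('http://example.com/path'));
```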
salathe