You don't need to call htmlspecialchars() or HTMLPurifier on the data before storing it - you've really only got one issue here, and that's making sure the URL can't be used for SQL injection. mysqli_real_escape_string() will sort that.
However, if you're outputting the data to a page as HTML (rather than using it in an HTTP redirect header), you'll need to use htmlentities() to protect against XSS, and you should do that WHEN YOU OUTPUT IT, not before storing it. The golden rule is context awareness:
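A minimal sketch of escaping at output time, assuming the page is served as UTF-8. The helper name `e()` is hypothetical and not part of PHP. Passing ENT_QUOTES explicitly makes sure both double and single quotes are encoded, which matters once a value lands inside an attribute (older PHP versions left single quotes alone by default).

```php
<?php
// Hypothetical helper: HTML-encode untrusted data at the moment it is output.
// ENT_QUOTES encodes both " and '; ENT_SUBSTITUTE replaces invalid UTF-8
// sequences instead of making the function return an empty string.
function e(string $value): string
{
    return htmlentities($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<script>alert("xss")</script>';

// Safe in the HTML body: the tags are displayed as text, not parsed as markup.
echo '<p>' . e($comment) . '</p>', PHP_EOL;
```

The stored value stays untouched; only the copy written into the page is encoded.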
"HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into."
For an in-depth reference to XSS prevention, see the OWASP XSS Prevention Cheat Sheet, which the passage above is quoted from.
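To make the context rule concrete, here is a rough sketch that picks a different encoder for three of the contexts mentioned in the quote, assuming UTF-8 output. The choices (htmlspecialchars() for the HTML body and quoted attributes, json_encode() with the JSON_HEX_* flags for a value inside a <script> block, rawurlencode() for a single URL query parameter) are common PHP idioms, not something prescribed by OWASP word for word; CSS contexts are left out, and the /search URL is made up for the example.

```php
<?php
$untrusted = '</script><script>alert(\'xss\')</script> & more';

// HTML body or a quoted attribute value: entity-encode <, >, &, " and '.
$forHtml = htmlspecialchars($untrusted, ENT_QUOTES, 'UTF-8');

// Inside a <script> block: emit a JavaScript string literal. The JSON_HEX_*
// flags turn <, >, ', " and & into \u00XX escapes, so the value can neither
// close the script tag nor break out of the string.
$forJs = json_encode($untrusted, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

// One component of a URL (here a query parameter): percent-encode it.
// The finished URL is then HTML-encoded again when it goes into the href.
$forUrlParam = rawurlencode($untrusted);

echo '<p>' . $forHtml . '</p>', PHP_EOL;
echo '<script>var msg = ' . $forJs . ';</script>', PHP_EOL;
echo '<a href="/search?q=' . htmlspecialchars($forUrlParam, ENT_QUOTES, 'UTF-8') . '">search</a>', PHP_EOL;
```

Each encoder only understands its own context, which is why a single htmlentities() call applied everywhere is not enough.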
It's always best to encode the data (against the relevant attack) just before it's used: e.g. MySQL-escape it when putting it into the database to prevent SQLi, and HTML-escape it when outputting it to the screen to prevent XSS, but not both at the same time. This lets you keep track of the flow of data through your application, and you know that all data in the database is stored raw, ready to be encoded for whatever purpose it's needed for. If you HTML-encode this data before putting it into the DB, you'll have to decode it again before using it in an HTTP header, for example.
If you must encode the data before it goes into the database, make sure the column name reflects this for future developers/maintainers!
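A small sketch of why encoding too early causes trouble, assuming the stored value is a URL that is later used both as a redirect target and as a link. The variable names are illustrative only, and the header lines are built as strings rather than sent with header() so the example runs from the command line.

```php
<?php
$raw = 'http://example.com/page?a=1&b=2';

// Encoded too early (e.g. before the INSERT): the & becomes &amp;.
$preEncoded = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Used as an HTTP header, the pre-encoded value points to the wrong URL:
// the query string now has a parameter "amp;b" instead of "b".
$badHeader = 'Location: ' . $preEncoded;

// Stored raw and encoded at the point of use, each context gets what it needs.
$goodHeader = 'Location: ' . $raw;  // what you would pass to header()
$goodLink = '<a href="' . htmlspecialchars($raw, ENT_QUOTES, 'UTF-8') . '">page</a>';

echo $badHeader, PHP_EOL, $goodHeader, PHP_EOL, $goodLink, PHP_EOL;
```

The pre-encoded copy can be repaired with htmlspecialchars_decode(), but that is exactly the extra step that storing raw data avoids.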
EDIT:
As per VolkerK's comment, the best way to prevent XSS in URL output would be to check the protocol (the URL scheme) - if it doesn't match your allowed protocols (probably http and https), reject it:
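$url = 'http://hostname/path?arg=value#anchor';
$allowedSchemes = array( 'http', 'https' );
$parsedUrl = parse_url( $url );
// parse_url() returns false for badly malformed URLs, and relative URLs have no scheme
if( $parsedUrl === false || !isset( $parsedUrl['scheme'] )
    || !in_array( strtolower( $parsedUrl['scheme'] ), $allowedSchemes, true ) ) {
    // reject URL
} else {
    // $mysqli is assumed to be an open mysqli connection
    $url = mysqli_real_escape_string( $mysqli, $url );
    // `table` is a placeholder name; it is a reserved word in MySQL, hence the backticks
    $sql = "INSERT INTO `table` (url) VALUES ('$url')";
    // insert query
}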
This has the advantage of preventing javascript:alert('xss') attacks in <a href="$url"> situations. Running htmlentities() on javascript:alert('xss') doesn't help: there are no < or > characters to escape, and the browser decodes any entities in the attribute value (such as an encoded quote) before following the link, so a malicious user would still be able to execute JS on your domain.
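Putting both halves of the answer together, a minimal sketch: validate the scheme against an allowlist, then HTML-encode the URL when writing it into the href. The function names and the http/https allowlist are assumptions for illustration. The first var_dump() shows that htmlentities() on its own leaves the javascript: scheme intact.

```php
<?php
// Hypothetical allowlist; extend it only with schemes you really need.
const ALLOWED_SCHEMES = ['http', 'https'];

function hasAllowedScheme(string $url): bool
{
    $parts = parse_url($url);
    // parse_url() returns false for seriously malformed URLs, and
    // relative URLs have no 'scheme' key at all.
    if ($parts === false || !isset($parts['scheme'])) {
        return false;
    }
    // Schemes are case-insensitive, so compare in lower case.
    return in_array(strtolower($parts['scheme']), ALLOWED_SCHEMES, true);
}

// Returns an <a> element, or null when the URL is rejected.
function renderLink(string $url, string $text): ?string
{
    if (!hasAllowedScheme($url)) {
        return null;
    }
    $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $href . '">' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</a>';
}

$attack = "javascript:alert('xss')";

// htmlentities() leaves the scheme untouched, so the link would still run JS.
var_dump(htmlentities($attack, ENT_QUOTES, 'UTF-8'));
var_dump(renderLink($attack, 'click'));  // NULL
var_dump(renderLink('https://example.com/?a=1&b=2', 'ok'));
```

An allowlist is used rather than a blocklist of "javascript:" because variants such as different letter case or other dangerous schemes are rejected automatically.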