views:

238

answers:

6

I want to have users store the url in my database I'm using php mysql and htmlpurifier I was wondering if the following code was a good way to filter out bad data before I store it in the database?

Here is the Partial PHP code.

$url = mysqli_real_escape_string($mysqli, $purifier->purify(htmlspecialchars(strip_tags($_POST['url'])));
A: 

If you're worried about SQL Injection, this will clean it up and prevent that. Otherwise, not sure exactly what you're asking.

Shawn Steward
+4  A: 

You don't need to call htmlspecialchars() and the HTMLPurifier on the data - you've really only got one issue here and that's making sure the URL doesn't contain a SQL injection - mysqli_real_escape_string() will sort that.

Alternatively, if you're outputting the data to a page/HTML (instead of using it as HTTP redirect headers) you'll need to use htmlentities() to protect against XSS on the data WHEN YOU OUTPUT IT. The golden rule is context awareness:

HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into.

For an in-depth reference to XSS prevention, check out OWASP.

It's always best to encode the data (against the relevant attack) just before it's used (i.e. MySQL escape strings for input into database to prevent SQLi, HTML escape strings for output to screen to prevent XSS, not both at the same time). This allows you to keep track of the flow of data through your application, and you know that all data in the database is ready for any purpose. If you HTML encode this data before putting it into the DB, you'll have to un-encode it before using it as a HTTP header, for example.

If you must encode the data before it goes into the database, make sure the column name reflects this for future developers/maintainers!

EDIT:

As per VolkerK's comment, the best way to prevent XSS in URL output would be to check the protocol - if it doesn't match your allowed protocols (probably http/https) reject it:

$url = 'http://hostname/path?arg=value#anchor';

$parsedUrl = parse_url( $url );

if( $parsedUrl['scheme'] != 'http' ) {
    // reject URL
} else {
    $url = mysqli_real_escape_string( $mysqli, $url );
    $sql = "INSERT INTO table (url) VALUES ('$url')";
    // insert query
}

This has the advantage of preventing javascript:alert('xss') attacks in <a href="$url"> situations. Running htmlentities() on javascript:alert('xss') has no affect (as the limited subset of characters such as <> are not present to be escaped), so a malicious user would be able to execute JS on your domain.

Andy
What about $url being `javascript:alert("hello")` ? It's not about text/content but the URL/URI/URN that is the value of an attribute. So htmlenties/htmlspecialchars() is not a safe choice.
VolkerK
@VolkerK that would only be executed in a window.location redirect (i.e. in Javascript tags) or when set as a HTML tag parameter, not when output as HTML content or as part of an HTTP header. Granted OP doesn't specify usage of the data so all bases should be covered by an answer. XSS prevention is not in question when storing data in the DB, it's relevant when output encoding. OWASP reference added from http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
Andy
Anyway, you opened the door with "you'll need to use htmlentities()" which I believe doesn't do the (whole) job if the url is put e.g. in an href attribute. I'd check the schema and limit it to http, https maybe ftp, whatever is reasonable. And that's something you can do prior to storing the value in the database. $url=`skype:whatever`->error (unless that's exactly what you want to allow of course ;-)).
VolkerK
@VolkerK you are correct, yours is the correct solution to prevent this attack, answer updated.
Andy
+1  A: 

Only mysqli_real_escape_string() is necessary before you put the value in the database.

To secure against XSS, htmlspecialchars() should be called on all data you display when it is displayed (in HTML), not before it is stored.

Imagine that you might one day need to output the data in a format other than HTML; then you will regret having called htmlspecialchars() on everything before it was stored in your database.

I don't know why you're using purifier and strip_tags(), perhaps you have a particular reason but it just looks like overkill along the lines of "the more layers of data cleaning I use, the better". htmlspecialchars() will make any HTML harmless anyway.

Ben James
A: 

If you only need to store the URLs in the database and get them back, just passing the string through mysqli_real_escape_string once will suffice. No need for anything else. You will only need to use htmlspecialchars when you want to output it back to a user in a HTML page.

HTMLPurifier does not belong here at all, since it cleans HTML, not URLs.

Ilia Jerebtsov
A: 

I store urls in my db and it is only xss cleaned, but automatically escaped by an orm. This essentially boils down to:

mysqli_real_escape_string(strip_tags($var));

So you clean it first, then make sure it's ok to save in the db, then save the unaltered url.

On it's way out however you would pull it and then in your template use htmlentities() just in case there was anything in it that may interfere with html.

You don't really need to over engineer something like this.

PS. if you're working with POST you could do:

$fixed = $_POST;
$fixed = array_map('strip_tags', $fixed);
$fixed = array_map('mysqli_real_escape_string', $fixed);

Then work with the $fixed array.

jmoz
A: 

You need to differentiate between filtering and escaping. When data is received one should have it filtered. That means stripping out bad characters and flawed data.

Transfer your data from the superglobal arrays to application variables, using the filter extension, and appropriate filters.

After that point you might want to empty $_POST to avoid using the unfiltered data again.

Then you'll massage it into the shape you want.

Escaping is the step just before data is being sent or saved, and should be done according to the output medium.

Escape with mysqli_real_escape_string for storing in a MySQL DB. If you use another DBMS, you should use another function.

Better yet, use prepared statements and placeholders for user data.

When outputting as HTML, htmlentities might be wise. Provided you do not want to have any HTML code at all in the data. If you do, use a trusted library like HTMLPurifier.

I recommend that you do not try to do everything in one line of code, as in your example. That is harder to read and harder to maintain.

For URLs, htmlentities is a good way to avoid unescaped & characters. They should of course be & It also makes sure you are not using any quotation marks, since you of course use the flag to have those quoted as well - right?

But this function should not be applied when storing the data. It should be applied when it is being sent to the user.

Filter input - escape output! (Quoting Chris Schifflett)

itpastorn