Hey all

Are there any pre-made scripts that I can use for PHP / MySQL to prevent server-side scripting and JS injections?

I know about the typical functions such as htmlentities, special characters, string replace etc. but is there a simple bit of code or a function that is a failsafe for everything?

Any ideas would be great. Many thanks :)

EDIT: Something generic that strips out anything that could be hazardous, i.e. greater-than / less-than signs, semicolons, words like "DROP", etc.?

I basically just want to compress everything to be alphanumeric, I guess...?

+1  A: 

No, there isn't. The risks depend on what you do with the data. You can't write something that makes data safe for everything (unless you want to discard most of the data).

David Dorward
Well yeah, but we're not speaking so generically when it comes to the web and PHP. There are obviously certain characters and strings that you will want to disable, so I would like a fairly generic list of such things, if possible. I basically just want alphanumeric info.
Tim
Which alphabets? Do you want to forbid people from using hyphens and full stops? It is rarely that simple.
David Dorward
+4  A: 

Never output any bit of data whatsoever to the HTML stream that has not been passed through htmlspecialchars() and you're done. Simple rule, easy to follow, completely eradicates any XSS risk.

As a programmer it's your job to do it, though.

You can define

function h($s) { return htmlspecialchars($s); }

if htmlspecialchars() is too long to write 100 times per PHP file. On the other hand, using htmlentities() is not necessary at all.
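
As a rough sketch of how such a helper might be used when building output: the `ENT_QUOTES | ENT_SUBSTITUTE` flags, the explicit UTF-8 charset and the `$comment` value are assumptions added for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: escape a value for HTML text or a quoted attribute value.
function h(string $s): string
{
    // ENT_QUOTES escapes both " and '; the charset is stated explicitly.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// $comment stands in for any untrusted value (made-up example input).
$comment = '<script>alert("xss")</script>';

// Escape at the moment the value is written into the HTML stream.
$html = '<p>' . h($comment) . '</p>';
echo $html, "\n";
```

The markup characters in `$comment` come out as entities, so the browser shows them as text instead of running them.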


The key point is: There is code, and there is data. If you intermix the two, bad things ensue.

In the case of HTML, code is elements, attribute names, entities, comments. Data is everything else. Data must be escaped to avoid being mistaken for code.

In the case of URLs, code is the scheme, the host name, the path, the mechanism of the query string (?, &, =, #). Data is everything in the query string: parameter names and values. They must be escaped to avoid being mistaken for code.

URLs embedded in HTML must be doubly escaped (by URL-escaping and HTML-escaping) to ensure proper separation of code and data.
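
The double escaping described above could be sketched like this. The path `/search.php`, the parameter names and the sample value are made up for the example.

```php
<?php
// Untrusted value (hypothetical) that will become a query-string parameter.
$search = 'fish & chips "to go"';

// Step 1: URL-escape the data so it cannot be mistaken for URL syntax.
$url = '/search.php?q=' . rawurlencode($search) . '&page=2';

// Step 2: HTML-escape the finished URL before placing it in an attribute.
$link = '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">results</a>';

echo $link, "\n";
```

Note that the `&` separating the two parameters is URL "code", yet it still has to become `&amp;` in step 2, because at that point the whole URL is data as far as HTML is concerned.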

Modern browsers are capable of parsing amazingly broken and incorrect markup into something useful. This capability should not be relied upon, though. The fact that something happens to work (like URLs in <a href> without proper HTML-escaping applied) does not mean that it's good or correct to do it. XSS is a problem rooted in a) people who are unaware of data/code separation (i.e. "escaping") or who are sloppy and b) people who try to be clever about which parts of the data they don't need to escape.

XSS is easy enough to avoid if you make sure you don't fall into categories a) and b).

Tomalak
No, it doesn't. If you put it as a text node, then you are safe (assuming it isn't inside a script or style element). Attribute values on the other hand? `<img src="javascript:xss()">` (these days, most browsers protect against that, but there are other risks)
David Dorward
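
One possible way to handle the attribute-value risk raised in this comment is to whitelist URL schemes before escaping. This is only a sketch: `safe_url()` is a hypothetical helper, and limiting the list to `http` and `https` (which also rejects relative URLs) is an assumption.

```php
<?php
// Hypothetical helper: accept only http(s) URLs for use in src/href attributes.
function safe_url(string $url): ?string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    // parse_url() gives null when there is no scheme and false for a malformed URL.
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return null;
    }
    return $url;
}

foreach (['https://example.com/cat.png', 'javascript:xss()'] as $candidate) {
    $url = safe_url($candidate);
    // Accepted URLs are still HTML-escaped when written into the attribute.
    echo $url === null
        ? "rejected\n"
        : '<img src="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">' . "\n";
}
```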
Ok cool, but what about when a user submits SQL into the URL? i.e. "DROP table users"
Tim
"Never output any bit of *user supplied* data" I'd say. Site templates may escape that peril :)
Col. Shrapnel
@Tim and so what?
Col. Shrapnel
@Tim: For SQL you ought to use parameterized queries (`mysqli_*` or PDO), because *they* completely eradicate SQL injection.
Tomalak
@Col. Shrapnel: Since the distinction between "user supplied" and "abstractly calculated" is sometimes fuzzy in complex applications, it can't hurt to escape absolutely everything.
Tomalak
@Tomalak got a parameter for the field name for the dynamic sorting? ;)
Col. Shrapnel
@Col. Shrapnel: Special case, point made. However, this calls for whitelisting the possible values, which will put you on the safe side again.
Tomalak
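
The two ideas from this exchange (bound parameters for data, a whitelist for identifiers such as a sort column) might be combined roughly as follows. The in-memory SQLite database, the `users` table and the `find_users()` helper are assumptions made so the sketch is self-contained; with MySQL only the PDO DSN would differ.

```php
<?php
// Assumption: in-memory SQLite via PDO, used only to keep the example runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)');
$pdo->exec("INSERT INTO users (name, age) VALUES ('alice', 30), ('bob', 25)");

// Hypothetical helper: names of users aged $minAge or more, sorted by a requested column.
function find_users(PDO $pdo, int $minAge, string $sort): array
{
    // Identifiers cannot be bound as parameters, so map the input onto known column names.
    $allowed = ['name' => 'name', 'age' => 'age'];
    $column = $allowed[$sort] ?? 'id'; // anything unexpected falls back to a fixed default

    // Data goes through a placeholder, never through string concatenation.
    $stmt = $pdo->prepare("SELECT name FROM users WHERE age >= :min_age ORDER BY $column");
    $stmt->bindValue(':min_age', $minAge, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(find_users($pdo, 18, 'age'));
print_r(find_users($pdo, 18, 'name; DROP TABLE users'));
```

The second call shows that a hostile sort value never reaches the SQL text. It simply selects the default column.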
I was just having a dig at the word "completely" :) Just a good point to mention, I think.
Col. Shrapnel
htmlspecialchars is not safe to use alone. see http://stackoverflow.com/questions/2964424/to-htmlencode-or-not-to-htmlencode-user-input-on-web-form-asp-net-vb/2965444#2965444
Cheekysoft
A: 

To answer your edit: everything except <> symbols has nothing to do with XSS.
And htmlspecialchars() can deal with them.

There is no harm in the word DROP table in the page's text ;)

Col. Shrapnel
And what about `"` to close an attribute and then continuing with your own code? `<img src="user-supplied-stuff" onerror="alert(document.cookie)">` seems to be rather XSS-y to me. :-)
janmoesen
@janm `htmlspecialchars` will catch double quotes so you should end up with `<img src="user-supplied-stuff&quot; onerror=&quot;alert(document.cookie)">`. But with images, you should check that the image exists first anyway to avoid broken images.
DisgruntledGoat
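
The double-quote case discussed in these comments can be checked with a short sketch. The input string is a made-up attack value, and passing `ENT_QUOTES` explicitly is an assumption so that single-quoted attributes are covered as well.

```php
<?php
// Hypothetical attacker input trying to break out of a double-quoted attribute.
$input = 'x" onerror="alert(document.cookie)';

// The quotes become &quot;, so the value cannot close the attribute.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$img = '<img src="' . $escaped . '">';

echo $img, "\n";
```

This only helps when the attribute value is written inside quotes. An unquoted attribute can still be broken out of with a plain space.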
this is simply not correct. see http://stackoverflow.com/questions/2964424/to-htmlencode-or-not-to-htmlencode-user-input-on-web-form-asp-net-vb/2965444#2965444
Cheekysoft
@DisgruntledGoat: my comment was about the erroneous statement "everything except <> symbols has nothing to do with XSS."
janmoesen
A: 

To clean user data, use htmlspecialchars(), str_replace() and other functions to cut out unsafe data.

GOsha
`mysql_escape_string` is deprecated and doesn't handle character encoding properly.
David Dorward
ok. in that situation you can use str_replace to clean your request.
GOsha
str_replace is even worse. And get_magic_quotes_gpc has nothing to do with database escaping. And the whole function concept is stupid: it does double escaping!
Col. Shrapnel
So, please, write a good function to prevent these troubles. I know you have one ))) If you don't, it will take you 5 minutes )))
GOsha
to place some *data* into mysql query you can use `$data="'".mysql_real_escape_string($data)."'";` or binding. and some other rules on operators/identifiers. but it has nothing to do with XSS anyway
Col. Shrapnel
Thank you. I know that it has nothing to do with XSS. It's for me.
GOsha
A: 

is there a simple bit of code or a function that is a failsafe for everything?

No.

The representation of data leaving PHP must be converted / encoded specifically according to where it is going, and therefore should only be converted / encoded at the point where it leaves PHP.

C.

symcbean
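
To illustrate the point about encoding per destination, here is a minimal sketch in which one raw value is encoded three different ways at the point of output. The sample value and the choice of `JSON_HEX_*` flags (useful when JSON is embedded in an HTML page) are assumptions.

```php
<?php
// One raw value (hypothetical), kept unmodified until it leaves PHP.
$name = "O'Reilly <admin> & co";

// Destination: HTML text or a quoted attribute value.
$forHtml = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Destination: a URL query-string value.
$forUrl = rawurlencode($name);

// Destination: JSON, with markup-sensitive characters written as \uXXXX escapes.
$forJson = json_encode($name, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo $forHtml, "\n", $forUrl, "\n", $forJson, "\n";
```

No single one of these three results is safe in the other two contexts, which is why a universal "clean everything" function cannot exist.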