Hey guys quick question,

I want to filter my output to make it safer from cross-site scripting (XSS) attacks, so I am filtering output with htmlentities. The problem is that I am trying to make my application UTF-8 compatible, so when I enter something like ಠ_ಠ I would like it to be preserved when it is retrieved from the database. Is there a simple way to achieve this? Thanks in advance for any advice.

+2  A: 

$var=htmlspecialchars($var,ENT_QUOTES,"UTF-8");
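
For what it's worth, here is a rough sketch of why the third argument matters. It assumes the script file itself is saved as UTF-8, and `$input` is just a made-up sample value. Both functions leave ಠ_ಠ alone when told the string is UTF-8; the mangling shows up when the charset is wrong (ISO-8859-1 was the default before PHP 5.4).

```php
<?php
// Hypothetical sample input: the emoticon, an accented letter, and a tag.
$input = 'ಠ_ಠ café <b>';

// With the correct charset, both calls keep ಠ_ಠ intact.
$special  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($input, ENT_QUOTES, 'UTF-8');

// With the wrong charset, each UTF-8 byte is treated as a separate
// Latin-1 character and turned into an entity, so ಠ_ಠ is lost.
$mangled = htmlentities($input, ENT_QUOTES, 'ISO-8859-1');

echo $special, "\n";   // ಠ_ಠ café &lt;b&gt;
echo $entities, "\n";  // ಠ_ಠ caf&eacute; &lt;b&gt;
echo $mangled, "\n";   // no longer contains ಠ_ಠ
```

The only difference between the first two lines is that htmlentities() also converts é to a named entity, which is unnecessary if the page is served as UTF-8.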

Rook
And that will still filter output the same way htmlentities would?
Scarface
Look at the PHP documentation for how htmlspecialchars and htmlentities differ.
The Pixel Developer
@Scarface No, it's a lot better at stopping XSS than htmlentities(). There are cases where an attacker doesn't need `<>`: for instance, if the output lands inside a body tag, they can inject an `onload=` attribute, and encoding quotes with ENT_QUOTES prevents that.
Rook
thanks rook, appreciate it
Scarface
+3  A: 

Three things

  1. HTML sanitization is an output escaping task, not input filtering. You should not do it prior to storage; you should only do it prior to display.
  2. If you are trying to prevent XSS, you don't need to use htmlentities() - htmlspecialchars() is sufficient. htmlentities() is only needed when the character encoding of the output cannot represent the content's characters directly.
  3. Both functions accept a character encoding as the third argument.

So, finally:

echo htmlspecialchars( $content, ENT_QUOTES, 'UTF-8' );

If you used ENT_NOQUOTES instead, you could be vulnerable to some types of XSS, for example when the value is written into an HTML attribute.
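
To make the quote-flag point concrete, here is a small sketch of the attribute case. The `$name` value and the `<input>` markup are invented for illustration; the idea is only that a double quote in user input can close the attribute early unless it is encoded.

```php
<?php
// Hypothetical attacker-controlled value that closes the attribute early.
$name = '" onmouseover="alert(1)';

// ENT_NOQUOTES leaves both kinds of quotes untouched.
$noquotes = htmlspecialchars($name, ENT_NOQUOTES, 'UTF-8');

// ENT_QUOTES encodes double and single quotes.
$quotes = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Injected handler survives: <input value="" onmouseover="alert(1)">
echo '<input value="' . $noquotes . '">', "\n";

// Value stays inside the attribute:
// <input value="&quot; onmouseover=&quot;alert(1)">
echo '<input value="' . $quotes . '">', "\n";
```

In plain element content (between tags) the two flags behave the same for this input's purposes, which is why the choice depends on where the value ends up.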

Peter Bailey
thanks a lot Peter, great information
Scarface
ಠ_ಠ ಠ_ಠ ಠ_ಠ ಠ_ಠ ಠ_ಠ
Scarface
-1 This doesn't stop all XSS; quote marks are dangerous.
Rook
As @The Rook said, you may want to escape quotes in some circumstances (e.g. HTML attributes, JavaScript function arguments).
Artefacto
@Artefacto HTML ignores backslashes; it's not like MySQL. You have to encode them, and your encode function is flawed. My -1 stands as long as your code is vulnerable.
Rook
@The Rook Hmm? htmlspecialchars encodes them as HTML entities; it doesn't escape them with backslashes.
Artefacto
@The Rook Ah sorry, when I wrote "escape" I meant "encode".
Artefacto
@The Rook - quote marks are only dangerous if you're writing user input into the attribute of another HTML element. So using `ENT_NOQUOTES` vs `ENT_COMPAT` vs `ENT_QUOTES` is really more of a context-relevant *choice* than a "this one is always right" type of option. I just happened to pick `ENT_NOQUOTES` for this example (completely arbitrarily) and I find it disappointing that out of all the information in my answer, that is what you focused on, and voted with. Also, this is *my* answer, not Artefacto's.
Peter Bailey
I don't care if your vote stays. It's not my position, place, or right to talk you out of it. I'm just saying it's not *always* vulnerable. If you really feel *that* strongly about it - edit it. This is a wiki-like site after all.
Peter Bailey