A: 

If you just need to reverse the encoding, you can use html_entity_decode(): http://www.php.net/manual/en/function.html-entity-decode.php.
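Here is a minimal round-trip sketch. It assumes UTF-8 text, and the flags and charset are passed explicitly so the result does not depend on the PHP version's defaults:

```php
<?php
// Round trip: htmlentities() encodes, html_entity_decode() reverses it.
// Assumption: the text is UTF-8; flags and charset are given explicitly
// so both calls agree regardless of the PHP version's defaults.
$original = 'déjà vu <b>"quoted"</b>';

// Accented letters, angle brackets and quotes become entities.
$encoded = htmlentities($original, ENT_QUOTES, 'UTF-8');
echo $encoded, PHP_EOL;

// Same flags and charset on the way back restore the exact original text.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
echo $decoded, PHP_EOL;
```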

Another possibility is to run htmlentities only at the moment the content is displayed as part of a web page. Otherwise, keep the unencoded text, exactly as it was submitted or loaded from your datastore.
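The following sketch shows that approach: the raw value is kept untouched and encoded only where it is written into markup. The helper name e() is hypothetical (not a PHP built-in), and UTF-8 output is assumed:

```php
<?php
// Sketch: keep the raw text everywhere (variables, database, session) and
// encode it only at the moment it is written into HTML.
// e() is a hypothetical helper name, not part of PHP. UTF-8 is assumed.
function e(string $text): string
{
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// Raw value, exactly as submitted or loaded from the datastore.
$comment = 'déjà vu & "friends"';

// Output time: encode once, right where the value meets the markup.
$html = '<p>' . e($comment) . '</p>'
      . '<input type="text" name="comment" value="' . e($comment) . '">';
echo $html, PHP_EOL;
```

Because the stored value never contains entities, there is nothing to decode later and no risk of encoding it twice.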

Frank
A: 

Why not just use htmlspecialchars?
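The sketch below shows the practical difference, assuming a UTF-8 page. htmlspecialchars() escapes only the characters that are special in HTML, while htmlentities() also converts every character that has a named entity, such as accented letters:

```php
<?php
// htmlspecialchars() converts only HTML-special characters (&, <, >, quotes);
// htmlentities() additionally converts characters with named entities.
// Assumption: the text and the page are UTF-8.
$text = 'déjà vu <b>';

$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, PHP_EOL;   // déjà vu &lt;b&gt;
echo $entities, PHP_EOL;  // d&eacute;j&agrave; vu &lt;b&gt;
```

On a UTF-8 page the accented letters need no escaping, so htmlspecialchars() is usually enough.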

Ramuns Usovs
oezi
A:

I'm sorry, but I cannot reproduce the behaviour you describe. I've always used htmlspecialchars() (which does essentially the same job as htmlentities()) and it has never led to any sort of double-encoding. The page source shows d&eacute;j&agrave; vu in both places (of course! that's the point!), but the rendered page shows the appropriate values, and that's what is sent back to the server.
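Double encoding appears only when text that is already encoded goes through the function a second time, for example when it is encoded before being stored and then again on output. Here is a small illustration, assuming UTF-8 input. The fourth parameter of htmlentities(), double_encode, stops existing entities from being re-encoded, although encoding exactly once is the cleaner fix:

```php
<?php
// Encoding once is harmless; encoding already-encoded text is what
// produces double encoding. Assumption: the input is UTF-8.
$raw = 'déjà vu';

// First pass: what the browser should receive.
$once = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// Second pass: the "&" of "&eacute;" becomes "&amp;", so the browser
// would display "&eacute;" literally instead of "é".
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');

// double_encode = false leaves existing entities alone.
$guarded = htmlentities($once, ENT_QUOTES, 'UTF-8', false);

echo $once, PHP_EOL;     // d&eacute;j&agrave; vu
echo $twice, PHP_EOL;    // d&amp;eacute;j&amp;agrave; vu
echo $guarded, PHP_EOL;  // d&eacute;j&agrave; vu
```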

Can you post a full self-contained code snippet that exhibits such behaviour?

Update: some testing code:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>

<?php

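$default_value = 'déjà vu <script> ¿foo?';

// The page is served as ISO-8859-1, so that charset is passed to
// htmlentities() explicitly; newer PHP versions default to UTF-8.

if( !isset($_GET['foo']) ){
    $_GET['foo'] = $default_value;
}

?>

<form action="" method="get">
    <p><?php echo htmlentities($_GET['foo'], ENT_QUOTES, 'ISO-8859-1'); ?></p>
    <input type="text" name="foo" value="<?php echo htmlentities($_GET['foo'], ENT_QUOTES, 'ISO-8859-1'); ?>">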
    <input type="submit" value="Submit">
</form>

</body>
</html>

Answer to updated question

The htmlentities() function, as its name suggests, is used when generating HTML output. That's why it's of little use in your second example: JavaScript is not HTML. It's a language of its own with its own syntax.

Now, the problem you want to fix is how to generate output that follows these two rules:

  1. It's a valid string in JavaScript.
  2. It can be embedded safely in an HTML document.

The closest PHP function for #1 I'm aware of is json_encode(). Since JSON syntax is a subset of JavaScript, if you pass it a PHP string it will output a valid JavaScript string literal.
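This sketch shows what json_encode() does to a string, assuming UTF-8 input. Quotes, backslashes and newlines are escaped and, by default, non-ASCII characters become \uXXXX escapes, so the result can be placed in JavaScript as a string literal:

```php
<?php
// json_encode() turns a PHP string into a double-quoted JavaScript
// string literal. Assumption: the input is UTF-8 (json_encode() fails
// on invalid UTF-8).
$value = "He said \"déjà vu\"\nthen left \\ quietly";

// Quotes, the newline and the backslash are escaped;
// é and à become \u00e9 and \u00e0.
$literal = json_encode($value);
echo $literal, PHP_EOL;

// Build a line of JavaScript the way the answer's template does.
$js = '$("input[type=text]").val(' . $literal . ');';
echo $js, PHP_EOL;
```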

As for #2, once the browser enters a JavaScript block it looks for a </script> tag to leave it, so that sequence must not appear literally inside the string. The json_encode() function takes care of this by escaping the slash (<\/script>).
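The sketch below tries the </script> case with the answer's sample value, assuming UTF-8 source. The JSON_HEX_TAG flag is an optional extra that the answer does not rely on. It encodes every < and > as \u003C and \u003E:

```php
<?php
// By default json_encode() escapes "/" as "\/", so "</script>" inside the
// value cannot close the surrounding <script> element early.
// Assumption: the value is UTF-8.
$value = 'déjà vu </script> ¿foo?';

$default = json_encode($value);
echo $default, PHP_EOL;

// Optional: JSON_HEX_TAG also turns every "<" and ">" into \u003C and
// \u003E, which covers other markup-like sequences such as "<!--".
$hexTag = json_encode($value, JSON_HEX_TAG);
echo $hexTag, PHP_EOL;
```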

My revised test code:

<?php

$default_value = 'déjà vu </script> ¿foo?';

if( !isset($_GET['foo']) ){
    $_GET['foo'] = $default_value;
}

?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head><title></title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript"><!--
$(function(){
    $("input[type=text]").val(<?php echo json_encode(utf8_encode($_GET['foo'])); ?>);
});
//--></script>
</head>
<body>


<form action="" method="get">
    <p><?php echo htmlentities($_GET['foo'], ENT_QUOTES, 'ISO-8859-1'); ?></p>
    <input type="text" name="foo" value="(to be replaced)">
    <input type="submit" value="Submit">
</form>

</body>
</html>

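Note: utf8_encode() is only there because this page is in ISO-8859-1 and json_encode() expects UTF-8 input. It is not required if your data is already in UTF-8. (utf8_encode() is deprecated as of PHP 8.2; mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1') performs the same conversion.)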

Álvaro G. Vicario
I'm actually using json_encode for this purpose elsewhere in my code, go figure! Thanks!
Tesserex
A: 

I believe it is a problem with the way you are applying the value towards the input. It is being displayed as encoded, which makes sense because it is Javascript, not HTML. So, what I would propose is to write your encoded text as part of the markup so that it gets parsed naturally (as opposed to being injected with client script). Since your textboxes are not readily available when the server is responding, you can use a temporary hidden field...

<input type="hidden" id="hidEncoded" value="<?=htmlentities("déjà vu");?>" />

Then it will get parsed as good old HTML, and when you try to access the value with Javascript it should be decoded...

// Give your textbox an ID!
$("#txtInput").val($("#hidEncoded").val());
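The server side of this approach can be wrapped in a small helper. hidden_input() is a hypothetical name, not part of PHP, and UTF-8 output is assumed. The id is escaped too, in case it ever comes from variable data:

```php
<?php
// Emits the hidden field with its value encoded for an HTML attribute.
// hidden_input() is a hypothetical helper; UTF-8 output is assumed.
function hidden_input(string $id, string $value): string
{
    return sprintf(
        '<input type="hidden" id="%s" value="%s" />',
        htmlentities($id, ENT_QUOTES, 'UTF-8'),
        htmlentities($value, ENT_QUOTES, 'UTF-8')
    );
}

$html = hidden_input('hidEncoded', 'déjà vu "quoted" <tag>');
echo $html, PHP_EOL;
```

When the browser parses this markup it decodes the entities in the attribute, so $("#hidEncoded").val() returns the original text.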
Josh Stodola
Well that's essentially what my given solution was. I just used jQuery to create the hidden element right before putting the value in the text box. You don't even need to append the temp element to the DOM for it to work. I just felt like that might still have security holes or not be the optimal method.
Tesserex
@Tesserex I think it is suboptimal in comparison to this because with your solution client script is still doing the injection. That increases chances of malicious script injection (because the entire input string will pass through the Javascript interpreter). With my solution, the HTML string is included as part of a natural HTML response (and is within an attribute, so encoding is necessary and expected), and then Javascript pulls it out after the fact. This is cleaner, IMO.
Josh Stodola