views:

698

answers:

2

Why can't htmlspecialchars() continually encode characters after each form submission? Take a look at the following example:

<?php $_POST['txt'] = htmlspecialchars($_POST['txt']); ?>
<form method="post">
<input name="txt" value="<?=$_POST['txt'] ?>" />
<input type="submit" name="save" value="test" />
</form>

You can see it running at http://verticalcms.com/htmlspecialchars.php.

Now do the following:

1) Type & into the text field
2) Hit the test button once
3) When the page completes post back, hit the test button again
4) When the page completes post back, view the page source code

In the input box, the value is &amp;

I was expecting &amp;amp;

Why is it not &amp;amp;?

A: 

The values in $_POST are already html-decoded for convenience. So when your script starts, the following is true:

$_POST['txt'] == '&';
htmlspecialchars('&') == '&amp;'

[edit] Looks like this needs further explanation

When the browser submits a form like the one above with a single ampersand as the value of 'txt', it puts the following into the body of the request:

txt=%26

The value is URL-encoded because the browser separates multiple fields with an ampersand character, like this:

txt=%26&user=soulmerge&pass=whatever

PHP takes the transmitted values and decodes them for the convenience of the programmer: it makes an ampersand out of %26. I thought this was the reason for the question in the first place, but I guess I got it wrong. The actual question was answered correctly by Ferdinand.
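
As a rough illustration of the transport-level encoding (URL encoding, not HTML entity encoding), the sketch below uses http_build_query() to stand in for the browser and parse_str() to stand in for the decoding PHP performs before it fills $_POST. The variable names are only for illustration, and a real request passes through the web server rather than these two calls.

```php
<?php
// Build a form-urlencoded body roughly the way a browser would.
// The separator is passed explicitly so php.ini settings cannot change it.
$body = http_build_query(['txt' => '&', 'user' => 'soulmerge'], '', '&');
echo $body, "\n"; // txt=%26&user=soulmerge

// Decode the body again; PHP does the equivalent before populating $_POST.
parse_str($body, $fields);
echo $fields['txt'], "\n"; // &

// HTML entities are NOT decoded at this stage: they arrive as literal text.
parse_str('txt=' . urlencode('&amp;'), $literal);
echo $literal['txt'], "\n"; // &amp;
```

The last call shows why "html-decoded" is the wrong term here: only the percent-encoding is undone, so a submitted `&amp;` would reach the script unchanged.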

soulmerge
"The values in $_POST are already html-decoded for convenience" -- Sorry, but this is not true. When you send HTML-encoded data via GET or POST, they will be available "as is" in your script, no automatic decoding is done.
Ferdinand Beyer
Guess I misunderstood the question ...
soulmerge
A: 

This simply is HTML entity encoding. When using "&" in an HTML attribute, it should be encoded. And this is what you are doing.

So, the browser reads

<input value="&amp;" />

and translates it to a "textbox widget with value '&'".

The same would be true for other special chars:

<input value="&quot;" />

would result in a " character.

When you submit the form, the browser sends the widget's actual value, without HTML entity encoding, so your PHP script receives "&", not "&amp;".
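
The round trip can be sketched in plain PHP without a browser. This is only a simulation: simulateBrowserSubmit() is a hypothetical helper that uses html_entity_decode() to stand in for the browser parsing the value attribute and submitting the field's content, and the loop stands in for repeated clicks on the test button.

```php
<?php
// Hypothetical stand-in for the browser: it parses the attribute value
// (decoding entities) and later submits the resulting text.
function simulateBrowserSubmit(string $attributeValue): string
{
    return html_entity_decode($attributeValue, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

$posted = '&';   // what the user typed into the text field
$rendered = [];  // what the script writes into value="..." on each request

for ($i = 0; $i < 3; $i++) {
    $encoded = htmlspecialchars($posted, ENT_QUOTES, 'UTF-8');
    $rendered[] = $encoded;
    $posted = simulateBrowserSubmit($encoded); // arrives in $_POST next time
}

print_r($rendered); // every entry is &amp;

// Double encoding only appears when no decoding step sits in between.
echo htmlspecialchars(htmlspecialchars('&')), "\n"; // &amp;amp;
```

Because each encode is followed by exactly one decode in the browser, the value in the page source stays at `&amp;` no matter how often the form is submitted.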

Ferdinand Beyer
So when the user hits the before inserting it into <input value=""/> ?
John
Ferdinand Beyer
ah i see, that makes a lot of sense now.
John
R. Bemrose
And this is also not true when using POST with "multipart/form-data" encoding :)
Ferdinand Beyer