I have a form which accepts text input. I would like it to be able to accept characters such as & and ; and > and <, which are useful characters for the data being supplied by the user. I want the user to, for example, be able to say:
The ampersand (&) is encoded as &amp;
(I see from the preview that I can't even type that here directly: to get it to look right I had to type in amp;amp; after the ampersand. By the way, the preview is cool, but I can't count on users having scripts enabled.)
I parse the data, and if there is a problem with it, I present the user's entry back to the user, in the same form, prefilled in the same field, for editing and resubmission.
If I present the raw data, I run the risk of having hostile input (such as scripts or HTML) executed by the browser. However, if I filter it (such as via htmlspecialchars), then the user would see (a representation of) the character he had typed (say, the ampersand), but when he re-submits, he will =actually= be submitting the replacement (in this case what looks like &amp;), which, as it turns out, itself contains an ampersand. If there is still a problem with the input, it will be presented again for editing, and we'll be another level deep in replacements.
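To make the "another level deep" worry concrete, here is a minimal sketch of what results if text that has already been escaped is escaped a second time. The sample input and variable names are made up for illustration, and ISO-8859-1 is assumed only because the test page declares that charset. It illustrates the nesting only; it makes no claim about what a browser actually sends back.

```php
<?php
// Hypothetical user input: an ampersand followed by its entity.
$typed = '& &amp;';

// First pass: the escaped form that would be written into the page source.
$once = htmlspecialchars($typed, ENT_QUOTES, 'ISO-8859-1');

// Second pass: what results if that escaped text is itself escaped again.
$twice = htmlspecialchars($once, ENT_QUOTES, 'ISO-8859-1');

echo $once, "\n";   // &amp; &amp;amp;
echo $twice, "\n";  // &amp;amp; &amp;amp;amp;
```

Each pass turns every ampersand, including the ones introduced by the previous pass, into a new entity.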
User data is accepted only when what the user actually submits is identical to the sanitized version of the data. It is destined for a text file on the server, and an email sent to the organization behind the website.
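The acceptance rule could be sketched roughly as follows. The sanitizer here is purely hypothetical (it only drops ASCII control characters); the real filtering rules are not spelled out in this post, and the function names are made up for illustration.

```php
<?php
// Hypothetical sanitizer: drops ASCII control characters except tab, CR and LF.
// The actual rules used on the site are not specified here.
function sanitize_entry($text)
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', '', $text);
}

// Accept the submission only when sanitizing leaves it unchanged.
function is_acceptable($text)
{
    return sanitize_entry($text) === $text;
}

var_dump(is_acceptable('AT&T <main> office;'));  // bool(true)
var_dump(is_acceptable("bad\x00byte"));          // bool(false)
```

With a rule like this, characters such as &, ;, < and > pass through untouched, and a rejected entry is the one that differs from its sanitized form.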
I suppose the "question that can be answered" is "is this even possible?"
Jose
edit:
<?php
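// Fall back to an empty string on the first visit, when nothing has been submitted yet.
$var = isset($_GET["test2"]) ? $_GET["test2"] : "";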
?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1" http-equiv="content-type">
<title>Input Escape Test</title>
</head><body>
The php parser would store the following input:<br>
<?php echo $var ?>
<br>
<form method="get" action="test.php"><p>
<label for="test2">Test - question five: <br>type in a character on the first line<br>and its HTML entity on the second line.</label>
<textarea id="test2" name="test2" cols="50" rows="3"><?php echo $var; ?></textarea><br/>
<input type="submit"/>
</p></form>
</body></html>
This results in a form where the user attempts to answer the question with ampersand ampersand a m p semicolon. If that gets rejected (say, because of other illegal characters), the user is presented with his input back, minus the stripped characters. However, the a m p semicolon is also stripped from view (though it's in the source). The user will then attempt to add another a m p semicolon to the displayed result.
The only way the user gets to see ampersand a m p semicolon displayed (upon rejected input) is to type in ampersand a m p semicolon a m p semicolon.
Finally satisfied, the user clicks submit again, and the a m p semicolon seemingly disappears again. The user doesn't know what his (submitted) answer will be stored as.
I want the user to be able to type in ampersand a m p semicolon and, upon rejection, see ampersand a m p semicolon, and upon acceptance, store ampersand a m p semicolon.
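The desired round trip can be sketched like this, under the assumption that the value is escaped only at the moment it is written into the page and is stored unescaped. The helper name is made up, and htmlspecialchars_decode is used here merely as a stand-in for the browser, which decodes entities in the page source before displaying a field and submits the decoded text.

```php
<?php
// Escape a value for placement inside a <textarea> or other HTML text.
// ISO-8859-1 matches the charset declared by the test page.
function escape_for_html($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'ISO-8859-1');
}

$typed = '&amp;';                     // what the user typed
$inSource = escape_for_html($typed);  // what goes into the page source

// Stand-in for the browser: one level of entities is decoded for display,
// and the decoded text is what gets submitted again.
$shownAndResubmitted = htmlspecialchars_decode($inSource, ENT_QUOTES);

echo $inSource, "\n";             // &amp;amp;
echo $shownAndResubmitted, "\n";  // &amp;
```

In the test page this would mean echoing escape_for_html($var) in both places where $var is printed, while the text file and the email receive $var as submitted.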
Jose