Hi,

I'm building a PHP intranet for my boss: a simple customer, order, and quote system. It will be blocked from the Internet and used by only 3 people. I'm not so concerned with security as I am with validation. JavaScript is disabled on all machines.

The problem I have is this:

  1. Employee enters valid data into a form, containing any of the following characters: :;[]"' etc.
  2. The form POSTs this data to a validationAndProcessing.php page, which determines whether the employee entered data into every field. If they didn't, they are redirected back to the data input page and the field they missed is highlighted in red.
  3. htmlspecialchars() is applied to all the data they entered earlier as it is repopulated into the form.
  4. The form is then resubmitted to the validationAndProcessing.php page; if successful, the data is entered into the database and the employee is taken to the display data page.

My question is this:

If an employee repeatedly leaves a field empty in step 1, they will keep cycling between steps 1 and 4, each time having htmlspecialchars() applied to the data.

So that:- &
becomes:- &amp;
becomes:- &amp;amp;
becomes:- &amp;amp;amp;

etc..

How can I stop htmlspecialchars() being applied multiple times to data that is already cleaned?

Thanks, Adam

A: 

So that:- & becomes:- &amp; becomes:- &amp;amp; becomes:- &amp;amp;amp; etc..

You are wrong. Try it and see

<form>
<input name="a" value="<?php echo htmlspecialchars(isset($_GET["a"]) ? $_GET["a"] : "") ?>">
<input type="submit">
</form>

GO TRY IT AND SEE

Col. Shrapnel
@Col. Shrapnel:- I'm not wrong, well I am wrong because I'm not doing the right thing, but this is what happens, I keep encoding the encoding. :D
@naescent so, you're doing it twice somehow. Search your code.
Col. Shrapnel
@Col. Shrapnel, I know where I'm doing it. When I reinput the values back into the form I use htmlspecialchars(), which is fine first time round. Going back into validateAndProcessing.php it is passed back and then htmlspecialchars() is applied again, so I don't know how to not do it a second time..?
Sheesh. @naescent why are you so dumb? You're doing it twice in the SAME script! Go make a TEST form and see. Your text will remain the same, no matter how many rounds. I am not going to spend my life trying to convince you of the obvious! Go try it yourself! After that you'd better start searching for the second htmlspecialchars call in your scripts.
Col. Shrapnel
Don't call me dumb, show a little respect. I know exactly what I'm doing. If YOU READ the question you'd realise that I knew the mistake I was making. I wanted to know how to not do it twice but still make sure that the data was properly sanitised. If the employee changed the data in any of the fields to include any special HTML chars I would still need to apply htmlspecialchars() to the changed data. The problem is my data flow, and I don't know any best practice. I started learning PHP a few days ago. Next time I ask a question I'll make sure I know everything before posting.
@naescent How can I have respect for such a dumb person? Run the code from my update and tell me whether it changes even a bit of your data.
Col. Shrapnel
@Col as so often, I think you were the only one to be technically correct from the start (I didn't grasp the situation closely enough, it's likely there is no double-encoding problem at all), but you are being *such* an asshole while at it that it cancels out everything positive. Why is that necessary? You are good at what you do but you are way too rude.
Pekka
Dunno. I just lose my temper too fast. You see that's another dumb (as a matter of fact) question from wrong assumptions. VERY BASIC one and I have to argue it out!
Col. Shrapnel
I don't think I am *that* much of an asshole. I call myself dumb pretty often as well. Why is everyone around so sensitive? If you act dumb, go think about how to repair it, not about how you have been insulted.
Col. Shrapnel
@Col Still, the overall tone on SO is very friendly and polite, and very lenient towards (possible) mistakes and false assumptions. It's the spirit of the site, and looking at the numbers it seems to be working well.
Pekka
Numbers always lie, especially on this site. You have read my post on meta, I won't repeat it. Numbers encourage ignorance. Of course, very polite and friendly ignorance.
Col. Shrapnel
Politeness is not ignorance. It is perfectly possible to explain things clearly in a friendly way. Had you added a few explaining sentences to your initial answer (yes, even though one could find this out by oneself through trying it out!), it would have been clear from the start.
Pekka
I don't say politeness == ignorance. Those are different matters. I just say there is ignorance around but no one is concerned about that, only about politeness. Anyway, thanks for cooling me down. I was wrong with this sudden anger. @naescent I am sorry.
Col. Shrapnel
@Col. Shrapnel, thanks I appreciate it.
@Pekka, thanks for the support.
+1  A: 

Check the manual page on htmlspecialchars:

string htmlspecialchars ( string $string [, int $quote_style = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )

The $double_encode option should be what you are looking for: passing false tells htmlspecialchars() to leave existing entities untouched.
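To illustrate what that parameter changes, here is a small sketch. The variable names and the sample string are made up for the example, and the 'UTF-8' charset is an assumption about the page encoding.

```php
<?php
// Illustrative values only; none of these names come from the question.
$raw = 'Fish & Chips';

// Default behaviour ($double_encode = true): every call escapes again.
$once  = htmlspecialchars($raw, ENT_COMPAT, 'UTF-8');
$twice = htmlspecialchars($once, ENT_COMPAT, 'UTF-8');

// With $double_encode = false, entities that are already present are left alone.
$safe  = htmlspecialchars($once, ENT_COMPAT, 'UTF-8', false);

echo $once, "\n";   // Fish &amp; Chips
echo $twice, "\n";  // Fish &amp;amp; Chips
echo $safe, "\n";   // Fish &amp; Chips
```

One thing to keep in mind: with $double_encode set to false, a user who really typed the five characters &amp; will see them come back as a single & in the form, so it hides a double-escaping bug rather than fixing it.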

In a properly set up data flow, though, this shouldn't be a possibility at all, except if there is data incoming from the user or a 3rd party service that could or could not already contain HTML encoded characters. (Not that I haven't built a few improperly set up data flows in my career. But that's why I know why it's so important they're clean and well defined. :-)
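As a rough idea of what such a data flow could look like for the form in the question: validate the raw submitted values, and escape only at the moment a value is written into HTML. The helper names find_missing_fields() and render_input(), the field names, and the "missing" CSS class are all hypothetical, and $post stands in for $_POST.

```php
<?php
// Hypothetical helper: returns the names of required fields that were left empty.
// It works on the raw input; nothing is escaped here.
function find_missing_fields(array $input, array $required)
{
    $missing = array();
    foreach ($required as $name) {
        if (!isset($input[$name]) || trim($input[$name]) === '') {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Hypothetical helper: escapes exactly once, when the value goes into the HTML.
function render_input($name, $rawValue, $isMissing)
{
    $class = $isMissing ? ' class="missing"' : '';
    return '<input name="' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '"'
         . ' value="' . htmlspecialchars($rawValue, ENT_QUOTES, 'UTF-8') . '"'
         . $class . '>';
}

// Stand-in for $_POST so the sketch runs on its own.
$post = array('customer' => 'Smith & Sons', 'quote' => '');

$missing = find_missing_fields($post, array('customer', 'quote'));

foreach ($post as $name => $value) {
    echo render_input($name, $value, in_array($name, $missing)), "\n";
}
```

Because the values PHP receives are always raw, there is never an "already cleaned" string in the flow, so nothing can be escaped twice.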

Pekka
bobince
@bobince step 3 is the only possible way according to standards :)
Col. Shrapnel
Of course you need `htmlspecialchars` *when outputting text into HTML*. However step 3 makes it sound like it is being blanket-applied to all content outside of the HTML-output step, which would certainly explain the double-escaping.
bobince
@bobince he doesn't output text into HTML. He fills form's input values. At least he should :)
Col. Shrapnel
@Col. Shrapnel, that's exactly what I'm doing, which is why I'm using htmlspecialchars(), but finding myself in a bind. I'm sure I'm doing something that doesn't match standards but I don't know what.
@bobince and @Col. Shrapnel, looks like I need to look at my data flow. Do you have any ideas on where I can find material on how to do this? I've looked about but I'm not sure.
Pekka
A: 

You should only be using htmlspecialchars in the HTML output, never anywhere else.

<input name="var" value="<?php echo htmlspecialchars($var)?>">

If $var contained an ampersand, say, then in the HTML it would output the encoded value:

<input name="var" value="this&amp;that">

However, the user would only see this&that in their input field, and upon submission, $_GET['var'] will be this&that, not the encoded version.
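The round trip can be simulated without a browser. In this sketch htmlspecialchars_decode() stands in for the browser reading the value attribute, which is an approximation of what a real browser does; the loop count and variable names are arbitrary.

```php
<?php
$value = 'this&that';

for ($round = 1; $round <= 3; $round++) {
    // What PHP writes into the value="..." attribute.
    $attribute = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    // What the browser shows and later submits: the decoded text.
    $value = htmlspecialchars_decode($attribute, ENT_QUOTES);

    echo "round $round: HTML has $attribute, PHP receives $value\n";
}
```

Every round prints the same attribute, this&amp;that, and the same submitted value, this&that. If the entities pile up in a real application, a second htmlspecialchars() call is running somewhere other than the output step.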

On the PHP side of things the only thing you may want to do is remove slashes if magic quotes are on:

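// Magic quotes were removed in PHP 5.4 and get_magic_quotes_gpc() itself in PHP 8.0,
// so check that the function exists before calling it.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $var = stripslashes($_POST['var']);
} else {
    $var = $_POST['var'];
}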

From there you should store the raw data in the database, not HTML-encoded versions. To avoid SQL injection, use mysql_real_escape_string if you're using the normal mysql functions (these were removed in PHP 7.0, so this only applies to older installations), or use PDO instead.
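A minimal sketch of the PDO route, assuming the pdo_sqlite driver is installed so that an in-memory database can keep the example self-contained. The customers table and its column are invented for illustration; a real intranet would connect to its own database with a different DSN.

```php
<?php
// Assumption: pdo_sqlite is available. The in-memory database exists only for this run.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)');

// Raw user input: no htmlspecialchars() and no manual quoting before storage.
$raw = "O'Brien & Sons";

// The placeholder keeps the value separate from the SQL text.
$insert = $pdo->prepare('INSERT INTO customers (name) VALUES (:name)');
$insert->execute(array(':name' => $raw));

// The database hands back exactly what was stored.
$stored = $pdo->query('SELECT name FROM customers')->fetchColumn();

// Escape only now, when the value is written into HTML.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
```

The stored value is identical to what the employee typed, so it can be shown again in a form, a report, or a plain-text export, each with the escaping that particular output needs.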

DisgruntledGoat