I'm trying to clean up form input using the following Perl transliteration:

sub ValidateInput {
 my $input = shift;
 $input =~ tr/a-zA-Z0-9_@.:;',#$%&()\/\\{}[]?! -//cd;
 return $input;
}

The problem is that this transliteration is removing embedded newline characters that users may enter into a textarea field which I want to keep as part of the string. Any ideas on how I can update this to stop it from removing embedded newline characters? Thanks in advance for your help!

+1  A: 

That's a bit of a byzantine way of doing it! If you add \012 it should keep the newlines.

$input =~ tr/a-zA-Z0-9_@.:;',#$%&()\/\\{}[]?! \012-//cd;
martin clayton
It still is removing embedded newlines after the update. Any suggestions on other ways I can validate the user input to remove invalid characters and HTML that would be less byzantine?
Russell C.
Maybe you have \015s in there as part of CR_LF pairs if you're on an MS operating system.
martin clayton
Running on RedHat. What I've noticed is that I can keep newlines and returns but embedded newlines get removed. Pretty annoying since after the input is validated it no longer matches what's in the database even if nothing was updated. Not sure how to validate the input without running into this issue.
Russell C.
You could take a look at the raw data, maybe using 'warn unpack "H*", $input;' in your function, or run in the debugger. Perhaps there's something unexpected in there.
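
As a rough sketch of that inspection step, the snippet below prints each byte of a string as two hex digits so that control characters become visible. The helper name hex_dump and the sample string are made up for illustration; only unpack and warn come from core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper: render each byte of a string as two hex digits,
# separated by spaces, so control characters can be spotted.
sub hex_dump {
    my ($input) = @_;
    return join ' ', unpack '(H2)*', $input;
}

# Made-up sample: two words separated by a CR LF pair.
my $sample = "ab\r\ncd";
warn hex_dump($sample), "\n";   # 0d is CR, 0a is LF, 09 would be a tab
```

Any byte in the dump that is not in the allowed list of the transliteration is one that the /cd flags will delete.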
martin clayton
I'd add the humble tab character to Sinan's list of candidates for inclusion in your list - octal 11 :)
martin clayton
+1  A: 

See Form content types.

application/x-www-form-urlencoded: Line breaks are represented as "CR LF" pairs (i.e., %0D%0A).

...

multipart/form-data: As with all MIME transmissions, "CR LF" (i.e., %0D%0A) is used to separate lines of data.

I do not know what you have in the database. Now you know what your script sees.
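
One possible way to cope with those CR LF pairs is to normalize line endings before filtering, so that the submitted text can be compared with text stored using bare LF. This is only a sketch: the helper name clean_textarea is hypothetical, and the allowed set (LF plus printable ASCII) is a stand-in for whatever list you actually want to permit.

```perl
use strict;
use warnings;

# Hypothetical helper: turn CR LF pairs and lone CRs into a single LF,
# then delete every character outside the allowed set.
sub clean_textarea {
    my ($input) = @_;
    $input =~ s/\r\n?/\n/g;            # CR LF or bare CR becomes LF
    $input =~ tr/\x0a\x20-\x7e//cd;    # assumed allowed set: LF and printable ASCII
    return $input;
}

# Made-up sample of what a browser might submit from a textarea.
my $submitted = "line one\r\nline two\r\n";
my $cleaned   = clean_textarea($submitted);
```

Whether LF is the right target depends on what the database holds; if it stores CR LF, the substitution would need to go the other way.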

You are using CGI.pm, right?

Sinan Ünür
A: 

Thanks for the help guys! Ultimately I decided to process all the data in our database to remove the character that was causing the issue so that any text that was submitted via our update form (and not changed by the user) would match what was in the database. Per your suggestions I also added a few additional allowed characters to the validation regex.

Russell C.
+3  A: 

I'm not sure what you are doing, but I suspect you are trying to keep all the characters between the space and the tilde in the ASCII table, along with some of the whitespace characters. I think most of your list condenses to a single range \x20-\x7e:

$string =~ tr/\x0a\x0d\x20-\x7e//cd;

If you want to knock out a character like " (although I suspect you really want it since you allow the single quote), just adjust your range:

$string =~ tr/\x0a\x0d\x20-\x21\x23-\x7e//cd;
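
To make the effect of the two character sets concrete, here is a small sketch that wraps each one in a sub and runs both on the same input. The sub names and the sample string are invented for illustration. The double quote is \x22, so the second set is written as two ranges that skip over it.

```perl
use strict;
use warnings;

# Hypothetical helper: keep LF, CR, and printable ASCII (space through tilde).
sub keep_printable {
    my ($string) = @_;
    $string =~ tr/\x0a\x0d\x20-\x7e//cd;
    return $string;
}

# Hypothetical variant: the same set minus the double quote (\x22).
sub keep_printable_no_dquote {
    my ($string) = @_;
    $string =~ tr/\x0a\x0d\x20-\x21\x23-\x7e//cd;
    return $string;
}

# Made-up sample containing double quotes, a tab, and a newline.
my $raw = qq{say "hi"\tnow\n};
print keep_printable($raw);            # tab removed, quotes and newline kept
print keep_printable_no_dquote($raw);  # tab and quotes removed, newline kept
```

Note that the tab (\x09) is not in either set, so add it explicitly if tabs should survive.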
brian d foy