I'm using Catalyst for my Perl web app. What is the accepted way of removing HTML from user input?

Currently I'm leaning towards HTML::FormatText, but it seems strange that I can't find a utility built into Catalyst for such a common task. Have I just not found it? Also, the modules for removing HTML seem to take about five lines of code each. I was hoping for a simple "deHTMLify()" method. I could roll my own, but I didn't want to reinvent the wheel.

I think form validation modules like HTML::FormFu do this for you, but I'm hoping to avoid that complexity. My forms are short and simple. Is this decision wrong-headed?

Am I doing it right?

+4  A: 

I'm using HTML::Scrubber, but that's for a case where I actually want to allow a subset of elements/attributes.
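
A minimal sketch of how that might look, assuming HTML::Scrubber is installed from CPAN. The input string and the list of allowed tags are made up for illustration. With no rules at all the scrubber denies every tag, which also covers the "strip everything" case from the question.

```perl
use strict;
use warnings;
use HTML::Scrubber;

# Hypothetical user input, for illustration only.
my $input = '<p>Hello <b>world</b> <a href="http://example.com">link</a></p>';

# No rules given: every tag is denied, so only the text survives.
my $strip_all = HTML::Scrubber->new;
my $plain     = $strip_all->scrub($input);

# Allow a small subset of elements; all other tags, and all
# attributes, are still dropped by the default deny rule.
my $allow_some = HTML::Scrubber->new( allow => [qw( p b i em strong br )] );
my $limited    = $allow_some->scrub($input);

print "$plain\n";
print "$limited\n";
```

In a Catalyst action you would call `scrub` on the request parameter before storing or displaying it.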

ysth
I wish Stack Overflow allowed multiple green checkmarks, because I ended up using both of your solutions in different scenarios.
Eric Johnson
+5  A: 

I'd argue you aren't doing it right. The right way is to accept and store the text exactly as the user sent it, then pass every value you read back from the database through the html or html_entity filter in your view (probably TT). Why is this the right way? If you don't want to support HTML now, you can still hack the filter later to let a subset of HTML through. It also lets users see their input, just escaped, rather than having it stripped, which loses track of what they sent and possibly some valuable information.

Your way also seems to make some assumptions about the output mechanism (HTML) that I'm uncomfortable with. Why would you want to sanitize on input for just one output format?
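
A minimal sketch of escaping on output, assuming Template Toolkit is installed. The stored value and the `comment` variable name are hypothetical, and the template is inlined here only to keep the example self-contained. In a Catalyst app the same `[% comment | html %]` line would live in a .tt file rendered by your TT view.

```perl
use strict;
use warnings;
use Template;

# Hypothetical stored value, kept exactly as the user typed it.
my $stored = 'I <3 Perl & <b>bold</b> claims';

# The html filter escapes &, <, > and " when the value is rendered.
my $template = '<p>[% comment | html %]</p>';

my $tt     = Template->new;
my $output = '';
$tt->process( \$template, { comment => $stored }, \$output )
    or die $tt->error;

print "$output\n";
# prints: <p>I &lt;3 Perl &amp; &lt;b&gt;bold&lt;/b&gt; claims</p>
```

The raw text stays untouched in the database, so a non-HTML output (plain-text email, JSON) can use it without undoing any input-time stripping.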

Evan Carroll