How do I limit the types of HTML that a user can input into a textbox? I'm running a small forum using some custom software that I'm beta testing, but I need to know how to limit the HTML input. Any suggestions?
Once the text is submitted, you could strip every tag that doesn't match your predefined set, using a regex in PHP.
It would look something like the following:
- find each tag (anything between < and >)
- if the tag name isn't in your allowed list, remove the whole tag
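Here is a minimal sketch of that regex approach. It assumes a hypothetical helper named stripDisallowedTags() that takes a plain array of allowed tag names. Like strip_tags(), it removes the tags but leaves their text content in place. It also leaves the attributes of the tags it keeps untouched, so on its own it does not stop XSS.

```php
<?php
// Hypothetical helper: remove every tag whose name is not in $allowed.
// Regex sketch only: it does not fully parse HTML and it does NOT
// strip attributes (style, onclick, ...) from the tags it keeps.
function stripDisallowedTags(string $html, array $allowed): string
{
    $allowed = array_map('strtolower', $allowed);

    // Match opening, closing and self-closing tags, capturing the tag name.
    return preg_replace_callback(
        '/<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/',
        function (array $m) use ($allowed): string {
            // Keep the whole tag if its name is whitelisted, otherwise drop it.
            return in_array(strtolower($m[1]), $allowed, true) ? $m[0] : '';
        },
        $html
    );
}

// prints: <b>bold</b>alert(1)<i>it</i>
echo stripDisallowedTags('<b>bold</b><script>alert(1)</script><i>it</i>', ['b', 'i']), "\n";
```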
- Parse the input and strip out every HTML tag that doesn't exactly match the list you allow. You can do this either with a complex regex or with a stateful iteration through the characters of the input string. The iteration builds up the allowed output string and strips unwanted attributes on tags like img.
- Use a different markup system (BBCode, Markdown).
- Find some existing code online to use as the basis for your implementation. For example, Slashcode must do this, so look at its Perl implementation and reuse the regexes (which I assume are there).
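Instead of hand-writing the character scanner from the first option, you can approximate the same stateful filtering by letting PHP's bundled DOM extension parse the input and then walking the resulting tree. This is a different technique from the one described above. The ALLOWED_TAGS whitelist, the function names, and the choice to drop disallowed elements together with everything inside them are all assumptions made for illustration. On img it keeps only src and alt, and it accepts only http(s) src values.

```php
<?php
// Sketch only: the whitelist and function names are hypothetical.
// Requires PHP's bundled DOM extension.

// Allowed tags mapped to the attributes each one may keep.
const ALLOWED_TAGS = [
    'p'   => [],
    'b'   => [],
    'i'   => [],
    'img' => ['src', 'alt'],
];

function sanitizeNode(DOMNode $node): void
{
    // Copy the child list first, because the loop modifies the tree.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            continue; // plain text is kept; saveHTML() escapes it on output
        }
        if (!($child instanceof DOMElement)
            || !array_key_exists(strtolower($child->tagName), ALLOWED_TAGS)) {
            // Comments, processing instructions and non-whitelisted elements
            // are removed together with everything inside them.
            $node->removeChild($child);
            continue;
        }
        $tag = strtolower($child->tagName);
        foreach (iterator_to_array($child->attributes) as $attr) {
            $name = strtolower($attr->name);
            $keep = in_array($name, ALLOWED_TAGS[$tag], true);
            // Only accept http(s) image sources, which blocks javascript: URLs.
            if ($keep && $name === 'src') {
                $keep = preg_match('#^https?://#i', $attr->value) === 1;
            }
            if (!$keep) {
                $child->removeAttribute($attr->name);
            }
        }
        sanitizeNode($child);
    }
}

function sanitizeHtml(string $html): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // no warnings for sloppy user HTML
    // The xml encoding hint makes libxml read the input as UTF-8;
    // the wrapper div gives one root to sanitize and serialize.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->getElementsByTagName('div')->item(0);
    if ($root === null) {
        return '';
    }
    sanitizeNode($root);

    // Serialize only the wrapper's children, not the implied html/body.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo sanitizeHtml('<p style="color:red" onclick="x()">Hi <b>there</b></p><script>alert(1)</script><img src="javascript:alert(1)" alt="a">'), "\n";
```

Dropping a disallowed element's content is the conservative choice for things like script. If you would rather keep the text of harmless unknown tags such as span, you would move their children up before removing them.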
You didn't state what the forum was built with, but if it's PHP, check out:
Library Features: Whitelist, Removal, Well-formed, Nesting, Attributes, XSS safe, Standards safe
I'd suggest a slightly different approach:
- Don't filter incoming user data (beyond preventing SQL injection). User data should be kept as pure as possible.
- Filter all outgoing data from the database. This is where things like tag stripping should happen.
Keeping user data clean gives you more flexibility in how it's displayed. Filtering all outgoing data is also a good habit to get into (in the spirit of "never trust user data").
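Here is a small sketch of that split. It assumes the raw post has already been read back from the database into $storedPost, a hypothetical variable. The same stored text is filtered differently depending on where it is shown. htmlspecialchars(), nl2br() and strip_tags() are standard PHP functions. The two render function names are made up for illustration.

```php
<?php
// Store exactly what the user typed; filter only when rendering.
// $storedPost stands in for a value read from the database (hypothetical).
// The INSERT itself should use a prepared statement with bound parameters
// (e.g. PDO::prepare()), which is what handles SQL injection.

// Rendering into an HTML page: escape everything, then keep line breaks.
function renderAsHtml(string $raw): string
{
    return nl2br(htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'));
}

// Rendering into a plain-text context (e.g. an email digest): no markup.
function renderAsText(string $raw): string
{
    return strip_tags($raw);
}

$storedPost = "<b>Hello</b> & welcome\nto the forum";
echo renderAsHtml($storedPost), "\n";
echo renderAsText($storedPost), "\n";
```

Because the stored text is untouched, you can later change how it is rendered (allow more tags, switch to Markdown) without losing anything users already posted.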
Regardless of what you use, make sure you know which kinds of HTML content can be dangerous.
For example, a <script> tag is obvious, but a <style> tag is just as bad in IE, because it can invoke JScript commands.
In fact, any style="..." attribute can invoke script in IE (through CSS expression()).
<object> is one more tag to be wary of.
PHP comes with a simple function, strip_tags(), for stripping HTML tags. Its optional second argument lists tags that should not be stripped.
Example #1 strip_tags() example
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> <a href="#fragment">Other text</a>
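One thing the example doesn't show is that strip_tags() looks only at tag names. On tags you allow, every attribute survives, including the style attributes and event handlers warned about earlier. Here is a quick check with a made-up input:

```php
<?php
// strip_tags() removes disallowed tags but copies allowed tags verbatim,
// attributes included. The <script> tags go, but their text content stays.
$input = '<p style="color:red" onmouseover="alert(1)">Hi</p><script>alert(2)</script>';

// prints: <p style="color:red" onmouseover="alert(1)">Hi</p>alert(2)
echo strip_tags($input, '<p>'), "\n";
```

So if you use the second argument, you still need to deal with attributes separately.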
Personally, for a forum I would use BBCode or Markdown, because of the amount of support and features available for them, such as live preview.
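To show why BBCode sidesteps most of the HTML-filtering problem, here is a minimal converter. It is a sketch, not a full BBCode parser. The function name and the three supported tags are assumptions. It escapes all HTML first, so anything the user typed as raw HTML is displayed as text. Only the listed BBCode tags are turned back into markup.

```php
<?php
// Hypothetical minimal BBCode-to-HTML converter (no nesting checks).
function bbcodeToHtml(string $text): string
{
    // Escape first: raw HTML from the user can no longer become markup.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    $rules = [
        '/\[b\](.*?)\[\/b\]/is' => '<strong>$1</strong>',
        '/\[i\](.*?)\[\/i\]/is' => '<em>$1</em>',
        // Only http(s) links are converted, so javascript: URLs stay as plain text.
        '/\[url=(https?:\/\/[^\]\s"]+)\](.*?)\[\/url\]/is' => '<a href="$1" rel="nofollow">$2</a>',
    ];

    return preg_replace(array_keys($rules), array_values($rules), $html);
}

echo bbcodeToHtml('[b]Hi[/b] <script>x</script> [url=https://example.com]site[/url]'), "\n";
```

A whitelist is built in here. The converter can only produce the handful of tags you wrote rules for, so there is no list of dangerous tags to maintain.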