You must never insert user-provided data into an HTML document without encoding it first. Your goal is to ensure that the data cannot change the structure of the document and that it is always treated as data and never as HTML markup or JavaScript code. Attacks against this mechanism are commonly known as "cross-site scripting", or simply "XSS".
If you are inserting into an HTML attribute value, you must ensure that the string cannot cause the attribute value to end prematurely. You must also, of course, ensure that the tag itself cannot be ended. You can achieve this by HTML-encoding any characters that are not guaranteed to be safe.
If you write your HTML so that the attribute value appears inside a pair of double-quote or single-quote characters, then you only need to HTML-encode the quote character you chose to use. If you are not quoting your attributes as described above, then you need to worry about many more characters, including whitespace, symbols, punctuation and ASCII control characters. Although, to be honest, it's arguably safest to encode these non-alphanumeric characters anyway.
Remember that an HTML attribute value may appear in three different syntactic contexts:
Double-quoted attribute value
<input type="text" value="**insert-here**" />
You only need to encode the double quote character to a suitable HTML-safe value such as &quot;
Single-quoted attribute value
<input type='text' value='**insert-here**' />
You only need to encode the single quote character to a suitable HTML-safe value such as &#39;
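For the two quoted contexts above, here is a minimal sketch in Python, assuming the standard library's html.escape is an acceptable encoder. The helper name quoted_attr and the sample input are made up for illustration. Note that html.escape encodes &, < and > as well as both quote characters, which is more than the strict minimum described above but harmless.

```python
import html


def quoted_attr(value: str) -> str:
    """Encode a value for insertion inside a quoted HTML attribute.

    Hypothetical helper: html.escape with quote=True encodes
    & < > " and ', so the value cannot close either style of quote.
    """
    return html.escape(value, quote=True)


# An attacker-style input that tries to close the attribute and add a handler.
user_input = '" onmouseover="alert(1)'

print('<input type="text" value="' + quoted_attr(user_input) + '" />')
# prints: <input type="text" value="&quot; onmouseover=&quot;alert(1)" />
```

Because the double quotes in the input are emitted as &quot;, the browser sees a single attribute value. For a single quote, html.escape emits &#x27;, a numeric character reference for that character.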
Unquoted attribute value
<input type='text' value=**insert-here** />
You should never have an HTML attribute value without quotes, but sometimes this is out of your control. In this case, you really do need to worry about whitespace, punctuation and control characters, as any of these can break you out of the attribute value.
Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and | (and more). [para lifted from OWASP]
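The rule for unquoted attributes can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not OWASP's reference implementation: the function name encode_unquoted_attr is made up, named entities are not used, and characters with code points of 256 or above are passed through unchanged because the rule quoted above does not cover them.

```python
def encode_unquoted_attr(value: str) -> str:
    """Encode a value for insertion into an unquoted HTML attribute.

    Hypothetical helper: ASCII letters and digits pass through; every
    other character below code point 256 becomes &#xHH;.
    """
    out = []
    for ch in value:
        code = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)                 # safe: a-z, A-Z, 0-9
        elif code < 256:
            out.append(f"&#x{code:02X};")  # space, punctuation, controls
        else:
            out.append(ch)                 # outside the rule; left as-is
    return "".join(out)


# A space would normally end an unquoted value and start a new attribute.
user_input = "x onfocus=alert(1)"

print("<input type='text' value=" + encode_unquoted_attr(user_input) + " />")
# prints: <input type='text' value=x&#x20;onfocus&#x3D;alert&#x28;1&#x29; />
```

With the space and the equals sign encoded, the whole string stays inside the one attribute value instead of becoming a second attribute.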
Please remember that the rules above apply only to preventing injection when inserting into an HTML attribute value. Within other areas of the page, other rules apply.
Please see the XSS prevention cheat sheet at OWASP for more information.