ansaurus

Question

How to find if String contains html data?

Answer 1

+2 A:

You can use regular expressions to search for HTML tags.

Tom Gullen 2010-06-16 09:28:17

Ah, good old problem #2. Tom is correct: regex is the most direct way to get the job done, and there are usually lots of examples online to help you get going.

Alex Larzelere 2010-06-16 12:01:46

@Alex Larzelere: problem #2? Can you explain? Is this an xkcd reference ("now you've got two problems"), or something else?

CPerkins 2010-06-16 12:47:32

@CPerkins: that's it exactly. Ol' problem #2; problem #1, of course, is whatever you were trying to do originally.

Alex Larzelere 2010-06-16 13:48:44

Answer 2

+2 A:

In your backing bean, you can try to find HTML tags such as <b> or <i>. You can use regular expressions (slower) or just look for the '<' and '>' characters. It depends on how sure you want to be that the user entered HTML.

Keep in mind that the user could write <asdf>. If you want to be 100% sure that the HTML used is valid, you will need a full HTML parser from some library (TidyHTML maybe?).
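To make the cheap option concrete, here is a minimal sketch of the "just look for the angle brackets" check. The class and method names are made up for illustration, and the check is only a heuristic: it also fires on plain text such as "1 < 2 and 3 > 2".

```java
final class AngleBracketCheck {
    // Hypothetical helper: true if the text has a '<' that is followed,
    // somewhere later, by a '>'. It does not verify that a real tag sits between them.
    static boolean mightContainTag(String value) {
        if (value == null) {
            return false;
        }
        int open = value.indexOf('<');
        return open >= 0 && value.indexOf('>', open + 1) > open;
    }
}
```

This could be called from the backing bean's validation logic before deciding whether a stricter check is worth the cost.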

pakore 2010-06-16 09:29:12

Answer 3

A:

Regular expressions are the way to go here. They help you find potential HTML tags. You can then check whether the text inside the angle brackets matches any HTML keyword. If it does, show an alert telling the user not to use HTML, or simply strip the tag if you prefer.
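A rough sketch of that two-step idea, assuming a small hand-picked list of tag names (the list, the class name and the method name are illustrative, not taken from any library): a regex pulls out each candidate tag name, and the name is then looked up in the set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KnownTagCheck {
    // Illustrative subset only; a real list of HTML element names is much longer.
    private static final Set<String> KNOWN_TAGS = new HashSet<>(Arrays.asList(
            "a", "b", "i", "u", "p", "br", "div", "span", "img", "script", "table"));

    // Captures the element name of an opening or closing tag, e.g. "b" in <b> or </b>.
    private static final Pattern TAG =
            Pattern.compile("</?([A-Za-z][A-Za-z0-9]*)[^>]*>");

    static boolean containsKnownTag(String value) {
        if (value == null) {
            return false;
        }
        Matcher m = TAG.matcher(value);
        while (m.find()) {
            // HTML tag names are case-insensitive, so compare in lower case.
            if (KNOWN_TAGS.contains(m.group(1).toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

With this check, something like <asdf> is ignored while <B> or <img src='x'> is reported. It is still pattern matching rather than parsing, so it should not be relied on as a security filter.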

1s2a3n4j5e6e7v 2010-06-16 09:36:23

Answer 4

+1 A:

If you don't want the user to have HTML in their input, you can replace all '<' characters with their HTML entity equivalent, '&lt;', and all '>' with '&gt;'.
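A minimal sketch of that replacement in Java follows. The class and method names are hypothetical. Escaping '&' as well is an addition beyond what is described above; it is included on the assumption that the text will be written into HTML, where a literal "&lt;" typed by the user would otherwise be shown as "<".

```java
final class HtmlEscaper {
    // Hypothetical helper: escapes the characters that would start or end a tag.
    static String escape(String value) {
        if (value == null) {
            return "";
        }
        return value
                .replace("&", "&amp;") // must run first so the entities added below are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }
}
```

Note that this does not escape quotes, so it is only meant for text placed between tags, not inside attribute values.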

Tom Gullen 2010-06-16 09:39:46

Answer 5

+2 A:

What would you like to do with this information? Show a validation error to the user that s/he shouldn't enter HTML? For that regex may suffice. Basically you just need to check if the string contains the pattern <sometag...>

boolean containsHTML = value.matches(".*\\<[^>]+>.*");
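One caveat with the one-liner: in Java the `.` does not match line terminators by default, so `matches` returns false when the tag sits on a different line from the start of the input, as it easily can with textarea content. A possible variant, sketched here with made-up class and method names, precompiles the pattern and uses `find()`, which searches anywhere in the string:

```java
import java.util.regex.Pattern;

final class HtmlTagDetector {
    // Same core pattern as above: '<', one or more non-'>' characters, then '>'.
    private static final Pattern TAG_LIKE = Pattern.compile("<[^>]+>");

    // Hypothetical helper: true if anything tag-shaped occurs anywhere in the value,
    // including multi-line input.
    static boolean containsTagLike(String value) {
        return value != null && TAG_LIKE.matcher(value).find();
    }
}
```

Like the original, this is a heuristic: plain text such as "1 < 2 and 3 > 2" is also reported as containing a tag.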

Or do you want to remove all HTML? Then regex is unsuitable, since it can't reliably remove or replace real HTML. It is better to use an HTML parser like Jsoup. It's then as easy as:

String text = Jsoup.parse(value).text();

You can if necessary also compare afterwards to see if it contained HTML:

boolean containedHTML = !text.equals(value);

Alternatively, if your whole concern was XSS, then you can also just ignore this all and redisplay the text in an escaped form. E.g. the < will be escaped as &lt;, the > as &gt; and thus get displayed as-is in the final HTML. The JSF <h:outputText> component already does that by default:

<h:outputText value="#{bean.value}" />

BalusC 2010-06-16 11:38:52
