I'm trying to do some parsing in Java. I'm using the Cobra HTML Parser to get the HTML into a DOM, then XPath to get the nodes I want. When I get down to the desired level I call node.getTextContent(), but this gives me a string like

"\n\n\nValue\n-\nValue\n\n\n"

Is there a built-in way to get rid of the line breaks? I would like to run a regex like

(?:\s*([^-]+)\s*-\s*([^-]+)\s*)

on the inner text, and would really prefer not to have to deal with the different whitespace characters that may appear between the text.

Example Input:

Value
-
Value

Thanks

A: 

You can use String.replaceAll().

String trimmed = original_string.replaceAll("\n", "");

The first argument is a regular expression, so you could, for instance, remove every contiguous block of whitespace (spaces, tabs and line breaks) with replaceAll("\\s+", ""). Note that this also removes any spaces inside the values themselves.
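As a minimal sketch of the difference between the two calls, applied to the sample string from the question (the class and method names here are made up for illustration):

```java
class StripWhitespace {
    // Removes only line-feed characters; spaces, tabs and '\r' are left alone.
    static String removeNewlines(String s) {
        return s.replaceAll("\n", "");
    }

    // Removes every run of whitespace: spaces, tabs, '\r', '\n' and so on.
    static String removeAllWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // Sample string taken from the question.
        String original = "\n\n\nValue\n-\nValue\n\n\n";
        System.out.println(removeNewlines(original));      // Value-Value
        System.out.println(removeAllWhitespace(original)); // Value-Value
    }
}
```

For this particular input both calls give the same result; they differ only when the text also contains spaces, tabs or carriage returns.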

Jim Ferrans
That's odd, it works for me.
Jim Ferrans
A: 

I'm not totally sure I understood the question correctly, but the simplest way to remove all the whitespace would be:

String s = node.getTextContent().replaceAll("\\s","");

If you just want to get rid of the leading/trailing whitespace, use trim().
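If the values may themselves contain spaces, one possible approach is to keep the whitespace, apply the regex from the question, and trim() each captured group. This is only a rough sketch: the class name ValuePairParser and the method parse are hypothetical, and the input is assumed to contain exactly one hyphen separating the two values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValuePairParser {
    // Pattern from the question. [^-]+ also matches whitespace, so the
    // captured groups can carry stray line breaks and are trimmed afterwards.
    private static final Pattern PAIR =
            Pattern.compile("\\s*([^-]+)\\s*-\\s*([^-]+)\\s*");

    // Returns the two values, or null if the text does not have the form "a - b".
    static String[] parse(String text) {
        Matcher m = PAIR.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] pair = parse("\n\n\nFirst Value\n-\nSecond Value\n\n\n");
        System.out.println(pair[0]); // First Value
        System.out.println(pair[1]); // Second Value
    }
}
```

Trimming after the match keeps the inner spaces ("First Value") that replaceAll("\\s", "") would remove.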

mpobrien