Hi all, I'm trying to parse the title from the following webpage: http://kid37.blogger.de/stories/1670573/

When I use the org.apache.commons.lang StringEscapeUtils.escapeHtml method on the title element, I get the following:

Das hermetische Caf�: Rock & Wrestling 2010

However, when I display that in my webpage with UTF-8 encoding, the é just shows up as a question mark.

I'm using the following code:

String title = StringEscapeUtils.escapeHtml(myTitle);

If I run the title through this website: http://tools.devshed.com/?option=com_mechtools&tool=27 I get the following output, which seems correct:

TITLE:

<title>Das hermetische Café: Rock &amp; Wrestling 2010</title>

BECOMES (which is what I was expecting the escapeHtml method to produce):

<title>Das hermetische Caf&eacute;: Rock &amp; Wrestling 2010</title>

Any ideas? Thanks.

+2  A: 
erickson
You are correct. I adjusted to ISO-8859-1 and it processed correctly. Much appreciated.
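
For anyone hitting the same symptom, here is a minimal sketch of what was probably going on, assuming the page bytes were ISO-8859-1 but were being decoded as UTF-8 before escaping. The class name `CharsetDemo` and the helper `decode` are made up for illustration and are not part of commons-lang; only JDK classes are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode raw bytes with an explicitly chosen charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "Café" as an ISO-8859-1 page would send it: é is the single byte 0xE9.
        byte[] raw = {'C', 'a', 'f', (byte) 0xE9};

        // Assumed mistake: 0xE9 alone is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        String wrong = decode(raw, StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in keeps the é.
        String right = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println("decoded as UTF-8 contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("decoded as ISO-8859-1 equals Caf\\u00E9: " + right.equals("Caf\u00E9"));
    }
}
```

Once the string is decoded correctly, escapeHtml has a real é to work with. If the character is already U+FFFD at that point, no escaping step can recover it.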