I'm trying to download a web page in Java with the following code:
// Placeholder host; new URL(...) needs a protocol such as http://
URL url = new URL("http://www.jksfljasdlfas.com");
File to = new File("/home/test/test.html");
Reader in = new InputStreamReader(url.openStream(), "UTF-8");
Writer out = new OutputStreamWriter(new FileOutputStream(to), "UTF-8");
int c;
// Copy the response to the file one character at a time
while ((c = in.read()) != -1) {
    out.write(c);
}
in.close();
out.close();
The page downloads, but some characters are replaced by entities.
This:
<a href="http://www.generation276.org/film/?m=200812&paged=2" >Pagina successiva »</a>
becomes this:
<a href="http://www.generation276.org/film/?m=200812&#038;paged=2" >Pagina successiva »</a>
When I download the same page with Chrome, the & remains &.
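To check whether the read/write loop itself could be causing the substitution, here is a minimal sketch that runs the same character-by-character UTF-8 copy on an in-memory string instead of a network stream. The class name RoundTripCheck and the helper copy are hypothetical, introduced only for this test; the link text is the one quoted above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {
    // Hypothetical helper: same copy loop as in the question, but the
    // source and destination are in-memory byte arrays, not a URL and a file.
    public static String copy(String html) throws IOException {
        byte[] source = html.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(source), StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } // closing the writer flushes everything into sink
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // \u00BB is the » character, escaped so the source file encoding does not matter
        String link = "<a href=\"http://www.generation276.org/film/?m=200812&paged=2\" >Pagina successiva \u00BB</a>";
        // Prints true if the text survives the round trip unchanged
        System.out.println(copy(link).equals(link));
    }
}
```

This prints true, so decoding and re-encoding as UTF-8 leaves the & untouched. That would suggest the &#038; is already present in the bytes the server sends to the Java client, though this is an assumption that still needs checking against the raw response.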
I'm new to charsets and encodings; can anybody explain the problem?