I want to fetch the content of a website as UTF-8 text.

I have written the following code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

try {
    String webnames = "http://pathivu.com";
    URL url = new URL(webnames);
    URLConnection urlc = url.openConnection();

    // Decode the response bytes as UTF-8
    BufferedReader buffer = new BufferedReader(new InputStreamReader(urlc.getInputStream(), "UTF-8"));
    StringBuilder builder = new StringBuilder();

    // read() returns one character at a time, or -1 at end of stream
    int ch;
    while ((ch = buffer.read()) != -1)
        builder.append((char) ch);
    buffer.close();

    String text = builder.toString();
    System.out.println(text);
} catch (IOException e) {
    e.printStackTrace();
}

But the output is not displayed correctly.

Thanks in advance.

+1  A: 

Your code looks OK. The problem is probably that the server is not sending the data in UTF-8.
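
One way to check what the server declares is to look at the charset in its Content-Type header before choosing a decoder. The following is only a minimal sketch: the class name `CharsetSniffer` and the helper `charsetFrom` are hypothetical, and falling back to UTF-8 when nothing is declared is an assumption, not something the server guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSniffer {
    // Hypothetical helper: extracts the charset from a Content-Type header value.
    // Falls back to UTF-8 (an assumption) when no usable charset is declared.
    public static Charset charsetFrom(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // unknown or malformed charset name
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetFrom("text/html; charset=ISO-8859-1"));
        System.out.println(charsetFrom("text/html"));
    }
}
```

With the code from the question, the header value would come from `urlc.getContentType()`, and the resulting `Charset` could be passed to the `InputStreamReader` in place of the fixed "UTF-8" name.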

sreejith
Is there any solution for this problem?
zahir
Please check the server you are getting the content from. Can you provide more details?
sreejith
I can get the content without trouble when it is in English, but text in other languages is displayed as question marks or other symbols. What further details do you need?
zahir
+2  A: 

The problem might be that your console or System.out is not using UTF-8.

  • Try writing the output to a file instead
  • Set the console stream via System.setOut(..)

You may have to use -Dfile.encoding=utf-8 or an OutputStreamWriter.
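
Both suggestions above can be sketched roughly as follows. The class name `Utf8Output`, its two helper methods, and the file name `out.txt` are hypothetical and chosen only for illustration. Whether the console then shows the characters correctly still depends on the terminal's own encoding and font.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8Output {
    // Writes text to a file as UTF-8, independent of the platform default encoding.
    public static void writeUtf8(Path file, String text) throws IOException {
        Files.write(file, text.getBytes(StandardCharsets.UTF_8));
    }

    // Replaces System.out with a PrintStream that encodes characters as UTF-8.
    public static void useUtf8Console() throws UnsupportedEncodingException {
        System.setOut(new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        // Sample non-ASCII (Tamil) text, written as escapes so the
        // source file encoding does not matter.
        String text = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD";

        writeUtf8(Paths.get("out.txt"), text); // hypothetical output file
        useUtf8Console();
        System.out.println(text);
    }
}
```

If the file opens correctly in a UTF-8 aware editor while the console still shows question marks, the reading code is likely fine and only the console encoding needs changing.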

Bozho
Or when sitting inside an IDE, configure workspace encoding. In Eclipse it's *Window > Preferences > General > Workspace > Text File Encoding*. This one needs to be set to UTF-8 (+1).
BalusC