I'm using URL.openConnection() to download something from a server. The server sends the header
Content-Type: text/plain; charset=utf-8
but connection.getContentEncoding() returns null. What's going on?
URLConnection.getContentEncoding() returns the value of the Content-Encoding header, not the charset from the Content-Type header.
Here is the code of URLConnection.getContentEncoding():
/**
 * Returns the value of the <code>content-encoding</code> header field.
 *
 * @return the content encoding of the resource that the URL references,
 *         or <code>null</code> if not known.
 * @see java.net.URLConnection#getHeaderField(java.lang.String)
 */
public String getContentEncoding() {
    return getHeaderField("content-encoding");
}
Instead, call connection.getContentType() to retrieve the Content-Type and extract the charset from it. Here is some sample code showing how to do this:
String contentType = connection.getContentType(); // null if the header is missing
String charset = "";
if (contentType != null) {
    // First element is the media type; parameters follow, separated by ';'
    String[] values = contentType.split(";");
    for (String value : values) {
        value = value.trim();
        if (value.toLowerCase().startsWith("charset=")) {
            charset = value.substring("charset=".length());
        }
    }
}
if ("".equals(charset)) {
    charset = "UTF-8"; // Assumption: fall back to UTF-8 when the server gives no charset
}
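For what it's worth, the parsing above can be wrapped in a small helper that returns a java.nio.charset.Charset directly. This is only a rough sketch: the class name ContentTypeCharset and the method charsetOf are made up for illustration and are not part of java.net. It assumes the header can be split naively on ';' (so a quoted parameter value containing ';' would confuse it), and it falls back to a caller-supplied default when the charset is missing or unknown to the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper (not a JDK class): pulls the charset parameter
// out of a Content-Type header value.
final class ContentTypeCharset {

    private ContentTypeCharset() {}

    // Returns the charset named in contentType, or fallback if the header
    // is null, has no charset parameter, or names an unsupported charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // The value may be quoted, e.g. charset="utf-8".
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Charset.forName(value);
            } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
                return fallback;
            }
        }
        return fallback;
    }
}
```

You would then read the body with something like new InputStreamReader(connection.getInputStream(), ContentTypeCharset.charsetOf(connection.getContentType(), StandardCharsets.UTF_8)). Whether UTF-8 is the right fallback depends on the server, so treat that default as an assumption too.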
This is documented behaviour: getContentEncoding() is specified to return the contents of the Content-Encoding HTTP header, which is not set in your example. That header describes content codings such as gzip, not the character set. You could use getContentType() and parse the resulting String yourself, or go for a more advanced HTTP client library like the one from Apache, which might save you from having to deal with encoding issues at all.