Twitter entity-encodes some characters in the text it returns. Their documentation says this applies to at least < and >. It also says:

"When requesting XML, the response is UTF-8 encoded. Symbols and characters outside of the standard ASCII range may be translated to HTML entities."

However, it's unclear to me whether symbols outside the ASCII range are encoded only when you request XML, or also when you request JSON (as I do). I am seeing &lt; and &gt;, but other characters (e.g., …, a.k.a. hellip, an example here) are coming through not entity-encoded but, I believe, as plain UTF-8 characters.

So, should I

a) Just decode &lt; to < and &gt; to >

b) a, plus the other basic XML entities (amp, quot, apos), which is what Apache Commons Lang's StringEscapeUtils.unescapeXml would do

c) b, plus entities for symbols outside of the ASCII range

d) Hidden option d.

Also, what is the bulletproof Java code to do it?
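To make option b concrete, here is a minimal sketch of what I have in mind, using only the JDK (Java 9+ for Map.of). The class name EntitySketch and the method unescapeBasic are made up for illustration, and this is not claimed to match what Twitter actually sends: it decodes only the five basic XML entities and leaves everything else, including characters that are already UTF-8, untouched.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of any Twitter or Apache API.
final class EntitySketch {

    // The five predefined XML entities (option b). Numeric and HTML-only
    // entities such as &hellip; are deliberately NOT handled here.
    private static final Map<String, String> BASIC = Map.of(
            "&lt;", "<",
            "&gt;", ">",
            "&amp;", "&",
            "&quot;", "\"",
            "&apos;", "'");

    // Decodes in a single left-to-right pass, so already-decoded output is
    // never rescanned: "&amp;lt;" becomes "&lt;", not "<".
    static String unescapeBasic(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '&') {
                int semi = text.indexOf(';', i);
                // The longest supported entity (&quot; / &apos;) is 6 characters.
                if (semi != -1 && semi - i <= 5) {
                    String replacement = BASIC.get(text.substring(i, semi + 1));
                    if (replacement != null) {
                        out.append(replacement);
                        i = semi + 1;
                        continue;
                    }
                }
            }
            // Not a recognized entity: copy the character unchanged.
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```

Calling EntitySketch.unescapeBasic on the tweet text after JSON parsing would cover options a and b. Whether option c is needed still depends on whether the JSON responses ever contain entities for non-ASCII symbols, which is the part I can't tell from the documentation.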