Why is Twitter double-encoding XML entity references?
Here's an example tweet:
xml entity ref test < & '
The response from statuses/friends_timeline:
<status>
<created_at>Wed Jun 24 00:16:15 +0000 2009</created_at>
<id>2302770346</id>
<text>xml entity ref test &amp;lt; &amp; '</text>
<source>web</source>
<truncated>false</truncated>
Shouldn't it be:
&lt; &amp; '
I did some more tests. Here's what happens in the HTTP POST that sends the update:
sniff again < & '
POST data:
authenticity_token=secret_sauce_removed&status=sniff+again+%3C+%26+'&twttr=true&return_rendered_status=true
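To double-check what actually goes over the wire, the POST body above can be decoded with a few lines of Python. This is only a quick sketch using the standard library; the variable names are mine, and the body is the one captured above with the token already redacted.

```python
from urllib.parse import parse_qs

# POST body exactly as captured above (token already redacted).
post_data = (
    "authenticity_token=secret_sauce_removed"
    "&status=sniff+again+%3C+%26+'"
    "&twttr=true&return_rendered_status=true"
)

# parse_qs undoes the URL encoding: '+' becomes a space, %3C is '<', %26 is '&'.
fields = parse_qs(post_data)
status = fields["status"][0]

print(status)  # sniff again < & '
```

So the status leaves the client as plain characters that are only URL-encoded. No entity references are sent, which means any entity encoding is added on the server side.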
I've confirmed Justin's observation that only < and > are double-encoded. The first line is the XML response, the second is the JSON.
<text>" &amp; ' &amp;lt; &amp;gt;</text>
"text":"\" & ' &lt; &gt;"
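For anyone consuming the API, one possible client-side workaround is to parse the response normally and then undo one extra level of HTML encoding. The sketch below assumes the raw payloads look like the two lines above with < and > arriving as entity references inside the text; the payload literals and variable names are my own reconstruction, not copied from Twitter's documentation.

```python
import html
import json
import xml.etree.ElementTree as ET

# Assumed raw payloads: '<' and '>' arrive HTML-encoded inside the text.
xml_payload = "<text>\" &amp; ' &amp;lt; &amp;gt;</text>"
json_payload = '{"text":"\\" & \' &lt; &gt;"}'

# Step 1: the normal parser removes the transport-level encoding.
xml_text = ET.fromstring(xml_payload).text
json_text = json.loads(json_payload)["text"]

# Step 2: undo the extra HTML encoding of '<' and '>'.
xml_status = html.unescape(xml_text)
json_status = html.unescape(json_text)

print(xml_status)   # " & ' < >
print(json_status)  # " & ' < >
```

One caveat: if only < and > are encoded and & is left alone, a tweet in which the user literally typed &lt; cannot be told apart from an encoded <, so this decoding is a best effort.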
The Twitter documentation says "escaped and HTML encoded status body". I guess "escaped" means that < and > are XML-encoded.
But I still don't understand why they're doing it. No web pages are involved in the whole process: the tweet is sent through the REST API URL-encoded, and it is retrieved as XML or JSON.
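My guess at the mechanism can be sketched in Python. This is purely an assumption about what the server does, not something the documentation confirms: the status has < and > replaced with entity references first, and the XML serializer then escapes the resulting & once more. The function name html_encode_brackets is made up for illustration.

```python
from xml.sax.saxutils import escape


def html_encode_brackets(status: str) -> str:
    """Hypothetical pre-processing step: encode only '<' and '>'."""
    return status.replace("<", "&lt;").replace(">", "&gt;")


status = "\" & ' < >"

# Assumed first step: only the angle brackets are encoded.
stored = html_encode_brackets(status)

# The XML serializer escapes '&' again, giving the double encoding.
xml_text = escape(stored)

# JSON does not need to escape '&', so the stored form passes through.
json_text = stored

print(xml_text)   # " &amp; ' &amp;lt; &amp;gt;
print(json_text)  # " & ' &lt; &gt;
```

Under that assumption, the & the user typed is encoded once (by the XML serializer only), while < and > end up encoded twice in XML and once in JSON, which matches what was observed.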