Hi,

We have an Arabic website and we are trying to share a URL on Facebook. The URL looks like this:

http://www.website.com/ar/شاهدى-عروض-الأزياء-العالمية-بعيون-عربية/موضة/story/75

The problem is that Facebook does not pick up the thumbnails present at the above link. When we debugged this with Fiddler, we found that the URL Facebook is trying to access is not the same as the one given above. Instead it looks like this:

www.website.com/ar/%c3%98%c2%b4%c3%98%c2%a7%c3%99%e2%80%a1%c3%98%c2%af%c3%99%e2%80%b0-%c3%98%c2%b9%c3%98%c2%b1%c3%99%cb%86%c3%98%c2%b6-%c3%98%c2%a7%c3%99%e2%80%9e%c3%98%c2%a3%c3%98%c2%b2%c3%99%c5%a0%c3%98%c2%a7%c3%98%c2%a1-%c3%98%c2%a7%c3%99%e2%80%9e%c3%98%c2%b9%c3%98%c2%a7%c3%99%e2%80%9e%c3%99%e2%80%a6%c3%99%c5%a0%c3%98%c2%a9-%c3%98%c2%a8%c3%98%c2%b9%c3%99%c5%a0%c3%99%cb%86%c3%99%e2%80%a0-%c3%98%c2%b9%c3%98%c2%b1%c3%98%c2%a8%c3%99%c5%a0%c3%98%c2%a9/%c3%99%e2%80%a6%c3%99%cb%86%c3%98%c2%b6%c3%98%c2%a9/story/75

I need to know what Facebook did to the URL to make it look like this. One more thing I know is that this URL is not UTF-8 encoded. If the given Arabic URL is converted to UTF-8, it looks like the following, not like the above:

www.website.com/ar/%D8%B4%D8%A7%D9%87%D8%AF%D9%89-%D8%B9%D8%B1%D9%88%D8%B6-%D8%A7%D9%84%D8%A3%D8%B2%D9%8A%D8%A7%D8%A1-%D8%A7%D9%84%D8%B9%D8%A7%D9%84%D9%85%D9%8A%D8%A9-%D8%A8%D8%B9%D9%8A%D9%88%D9%86-%D8%B9%D8%B1%D8%A8%D9%8A%D8%A9/%D9%85%D9%88%D8%B6%D8%A9/story/75

So I need to know which encoding Facebook is using, or what Facebook is doing to access the following URL when we share it:

www.website.com/ar/شاهدى-عروض-الأزياء-العالمية-بعيون-عربية/موضة/story/75

+2  A: 
http://www.website.com/ar/شاهدى-عروض-الأزياء-العالمية-بعيون-عربية/موضة/story/75

That's not a URI (or URL). It's an IRI. Unfortunately a lot of software doesn't support IRI directly (including SO, as you can see from the way it has linked only the first part of the address!).

So if you want the link to work everywhere, you'll have to write it as a plain URI with UTF-8-URL-encoded path segments, as in the last example (%D8%B4...). Browsers will usually present the encoded link in the address bar as a nice IRI, even though the link in the HTML document is a plain URI.
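As a rough illustration of producing that plain URI form, here is a minimal Python sketch using only the standard library. The helper name iri_to_uri is made up for this example, and it assumes the hostname is already ASCII (a non-ASCII hostname would need IDNA handling, which is not shown).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII parts of an IRI as UTF-8 (hypothetical helper).

    Assumes the host is plain ASCII; IDNA conversion is out of scope here.
    """
    parts = urlsplit(iri)
    # quote() encodes the text as UTF-8 by default, then percent-escapes it.
    # "/" is kept as the segment separator, "%" so existing escapes survive.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# The last two path segments of the URL from the question, shortened.
print(iri_to_uri("http://www.website.com/ar/موضة/story/75"))
# prints http://www.website.com/ar/%D9%85%D9%88%D8%B6%D8%A9/story/75
```

The escaped segment matches the %D9%85%D9%88%D8%B6%D8%A9 part of the correctly encoded URL in the question.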

%c3%98%c2%b4... is what you get when you take bytes that are UTF-8 encoded, treat them as if they were ISO-8859-1-encoded, and then UTF-8-URL-encode the result again, giving a broken “double UTF-8”. (Strictly speaking, sequences such as %e2%80%a1 point to Windows-1252, which is how ISO-8859-1 is commonly handled in practice.) How are you getting the IRI into Facebook? Either you're sending UTF-8 to an interface that expects ISO-8859-1, or it's just a plain old bug on Facebook's part. Either way, you'll have to use the URI version for now.
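The double encoding can be reproduced in a few lines of Python. This is only a sketch of the likely mechanism, not a statement of what Facebook's code actually does. The function names are invented for the example, and the single-byte charset is assumed to be Windows-1252 (Python's "cp1252"), because escapes like %e2%80%a6 in the broken URL cannot come out of strict ISO-8859-1.

```python
from urllib.parse import quote, unquote


def double_encode(segment: str, wrong_charset: str = "cp1252") -> str:
    """Reproduce the broken escape sequence for one path segment (illustrative)."""
    raw = segment.encode("utf-8")        # the correct UTF-8 bytes
    misread = raw.decode(wrong_charset)  # each byte wrongly read as one character
    # Encoding the misread text as UTF-8 again yields the "double UTF-8".
    # Lowercased only to match the form seen in Fiddler.
    return quote(misread, safe="").lower()


def repair(broken: str, wrong_charset: str = "cp1252") -> str:
    """Undo the damage: unescape, map back to the original bytes, decode as UTF-8."""
    misread = unquote(broken)  # unquote() decodes the escapes as UTF-8 by default
    return misread.encode(wrong_charset).decode("utf-8")


segment = "موضة"  # second-to-last Arabic segment of the URL in the question
broken = double_encode(segment)
print(broken)          # %c3%99%e2%80%a6%c3%99%cb%86%c3%98%c2%b6%c3%98%c2%a9
print(repair(broken))  # موضة
```

The printed escape sequence is the same one that appears before /story/75 in the URL captured with Fiddler. Note that a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in Python's cp1252 codec, so double_encode raises UnicodeDecodeError for some other Arabic letters; the sketch does not try to handle that case.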

bobince
