My problem is that the .NET HTTP/URI libraries cannot decode (unescape) this character sequence: "Hi%E1". Neither Uri.UnescapeDataString nor HttpUtility.UrlDecode handles it.
Although I have a solution to get around this problem ( http://stackoverflow.com/questions/1221849/url-decoding-confusion ) I would like to understand why it is failing.
The first test below throws an exception! The second just fails its assertion.
Assert.That(Uri.UnescapeDataString("Hi%E1"), Is.EqualTo("Hiá"));
HttpUtility.UrlDecode("Hi%E1").ShouldBe("Hiá");
There is nothing in the docs to indicate that UnescapeDataString or UrlDecode are restricted to particular character sets, or any other reason why these tests would fail. However, from testing, it appears that HttpUtility assumes UTF-8 (or some other encoding) rather than windows-1252.
The Java equivalent works! Probably because it lets the caller specify the encoding.
URLDecoder.decode("Hi%E1","windows-1252"); // this works btw, ie passes tests
That looks like a very sensible design, considering the .NET workaround (see the URL above).
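To make the difference concrete, here is a minimal Java sketch that decodes the same "Hi%E1" under two charsets. The class name PercentDecodeComparison and the decode helper are made up for illustration; only URLDecoder.decode is the real API. As far as I can tell, the byte 0xE1 is "á" in windows-1252 but only the start of a multi-byte sequence in UTF-8, so a UTF-8 decoder has nothing valid to produce and Java substitutes the replacement character.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: decodes one escaped string under different charsets.
class PercentDecodeComparison {

    // Thin wrapper around URLDecoder.decode that turns the checked
    // exception into an unchecked one to keep call sites short.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // 0xE1 maps directly to U+00E1 (a with acute) in windows-1252.
        String asCp1252 = decode("Hi%E1", "windows-1252");
        // In UTF-8 a lone 0xE1 is an incomplete sequence, so the
        // decoder emits U+FFFD instead of the intended character.
        String asUtf8 = decode("Hi%E1", "UTF-8");

        System.out.println("windows-1252 gives a-acute: " + asCp1252.equals("Hi\u00e1"));
        System.out.println("UTF-8 gives a-acute:        " + asUtf8.equals("Hi\u00e1"));
    }
}
```

If that reading is right, the .NET methods are not failing at random: they are being handed bytes that are not valid in the encoding they assume.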
Are the .NET implementations of these methods just crap, so that .NET devs have to write their own, or am I missing something?
BTW everything I know of in IIS is set to UTF-8, and Chinese/Japanese characters show fine, so I don't yet know how this URI could end up containing windows-1252 encoded characters. If I could fix the URI so that it is UTF-8 encoded, that would be a better way of fixing this.
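For reference, this is roughly what I would expect the URI to contain if whatever produces it escaped the text as UTF-8. It is a sketch in Java only because that is where I have an encoder that takes a charset; PercentEncodeComparison and its encode helper are hypothetical names, and URLEncoder.encode is the only real API used.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical demo class: escapes the same text under different charsets.
class PercentEncodeComparison {

    // Thin wrapper around URLEncoder.encode that turns the checked
    // exception into an unchecked one.
    static String encode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "Hi\u00e1"; // "Hi" followed by a with acute

        // windows-1252 stores the accented letter as the single byte E1.
        System.out.println(encode(text, "windows-1252"));
        // UTF-8 stores it as the two bytes C3 A1.
        System.out.println(encode(text, "UTF-8"));
    }
}
```

So a UTF-8 escaped URI would carry "%C3%A1" where mine carries "%E1", which suggests something upstream is escaping with a single-byte code page.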