Hi

I need some functionality to convert a string into a URL-friendly format: "knæ som gør" should become "kna-som-gor".

That is, replacing culture-specific characters with characters that can be used in URLs.

I'm using .NET and C#.

Please help me :)

/Andreas

A: 

Don't complicate things. :)

Either use a regexp, or simply use String.Replace.

Marcus L
That is a bad option, since there are numerous characters that need to be fixed. It would take a lot of time to handle every character that way :P
Andreas
Then I still suggest you look into regular expressions (regexp). I'm sure you can easily find examples or ready-made expressions that you can use.
Marcus L
A: 

You can find a solution that removes diacritics here: http://stackoverflow.com/questions/249087. This solution does not help you with æ or ø, though.

Maybe that removes enough of your special characters that the rest can be translated using simple replacing?
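As a rough sketch of that combination (not the code from the linked question): decompose the string, drop the combining marks, then handle the leftovers with a small lookup table. The class name `SlugSketch`, the method `ToSlug`, and the contents of the replacement table are assumptions made for illustration; which ASCII letter should stand in for æ or ø is a judgment call, and here it simply follows the "kna-som-gor" example from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class SlugSketch
{
    // Hypothetical table for letters that have no Unicode decomposition.
    // Extend it with whatever other characters your input contains.
    private static readonly Dictionary<char, string> Replacements =
        new Dictionary<char, string>
        {
            { '\u00E6', "a" }, // æ
            { '\u00C6', "A" }, // Æ
            { '\u00F8', "o" }, // ø
            { '\u00D8', "O" }  // Ø
        };

    public static string ToSlug(string input)
    {
        // FormD splits e.g. "å" into "a" plus a combining ring.
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Skip the combining marks left over from the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            string replacement;
            if (Replacements.TryGetValue(c, out replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        // Lower-case, then collapse every run of other characters into one hyphen.
        string lowered = sb.ToString().ToLowerInvariant();
        string hyphenated = Regex.Replace(lowered, "[^a-z0-9]+", "-");
        return hyphenated.Trim('-');
    }
}
```

With this table, `SlugSketch.ToSlug("knæ som gør")` gives "kna-som-gor". Characters that are neither decomposable nor in the table end up as hyphens, so the table needs to cover the languages you expect.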

If "url-friendly" does not mean pretty, you could also use HttpUtility.UrlEncode, which produces "kn%c3%a6+som+g%c3%b8r".

Jens
A: 

Edit: Added possible solution (end of post).

I had a very similar problem, albeit for file names rather than URLs. The main problem seems to be that there is no standard way to ask for the "best ASCII replacement for ø", so even if you can locate all the unwanted characters it is hard to automate which replacement to insert.

I posted quite a bit of code that might be helpful. See this StackOverflow question for details.

Edit: I think the solution to this problem lies with StringInfo, which allows you to iterate through the text elements of a string, where a surrogate pair or a base character with its combining characters counts as one element. This should make it possible to detect and convert something like å, which can be encoded in Unicode either as a single precomposed character or as a plain "a" followed by a combining ring: filter out the decorator and keep the part that is a normal character.
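A minimal sketch of that idea, under the assumption that "decorators" means Unicode non-spacing marks. The names `TextElementSketch` and `StripDecorators` are made up for this example. Each text element is decomposed so that both encodings of å look the same, and then only the non-mark characters are kept.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class TextElementSketch
{
    public static string StripDecorators(string input)
    {
        var sb = new StringBuilder(input.Length);

        // Walk the string one text element at a time, so a base character
        // and its combining marks (or a surrogate pair) stay together.
        TextElementEnumerator elements = StringInfo.GetTextElementEnumerator(input);
        while (elements.MoveNext())
        {
            string element = elements.GetTextElement();

            // Decompose, so a precomposed "å" also becomes "a" + ring.
            string decomposed = element.Normalize(NormalizationForm.FormD);

            foreach (char c in decomposed)
            {
                // Keep everything except the combining marks.
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                    sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Both `StripDecorators("\u00E5")` and `StripDecorators("a\u030A")` come out as "a". Letters such as æ and ø pass through unchanged, because they have no decomposition, so they still need a replacement table of their own.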

Morten Mertner