I have a string that looks like this: aeroport aimé

I know it is French, and I want to convert this string back to a readable format. Any suggestions?

A: 

Heh. It's a simple cryptanalysis task. Collect statistics on letter usage in your string: single letters, two-letter groups, or better, three-letter groups (trigrams). Then collect the same statistics on a large amount of text on the same subject. Next, rank the trigrams of French and of your garbled text by frequency, and decode your cryptogram by matching them up. Of course it will be wrong at first, but then you can use a dictionary to measure the failure ratio and apply some kind of genetic algorithm to find the best match.

By the way, if the text was originally UTF-8 but was 'forced' into a single-byte code page, you should operate on bytes, not on characters.
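
As a rough sketch of that byte-level counting, the Python below tallies overlapping byte n-grams. The function name `ngram_counts` and the sample string are assumptions made for illustration, and a real analysis would need far more text than one short phrase.

```python
from collections import Counter


def ngram_counts(data: bytes, n: int = 3) -> Counter:
    """Count every overlapping run of n bytes in data."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


if __name__ == "__main__":
    # Hypothetical sample: work on the UTF-8 bytes, not on the characters.
    sample = "aéroport aimé".encode("utf-8")
    for gram, count in ngram_counts(sample, n=2).most_common(3):
        print(gram, count)
```

The counts from the garbled text and from a reference corpus can then be ranked and compared, as the answer describes.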

Artem Tikhomirov
A: 

It seems there is no simple way to do that. Anyway, thanks.

Gracepig
It's a standard task, as I said. There should be some third-party library for it. By the way, this is called a substitution cipher [http://en.wikipedia.org/wiki/Substitution_cipher]
Artem Tikhomirov
+5  A: 

That is not French; the French word for "airport" is "aéroport".

If you want to convert the string to a readable format, you have to know what encoding the original string was in, not what language. "aeroport aimé" is a valid UTF-8 string.

Where are you seeing this string? On a Windows command prompt? That shows funny characters like "├⌐" for high-ASCII characters. The command prompt uses CP437, not UTF-8; if you have the UTF-8 string "aimé", it will display as "aim├⌐" in CP437.

If that is your situation, try writing the string to a file and opening the file in Notepad. If it looks right there, your string is correct and the application displaying it is wrong.
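
The CP437 display described above can be reproduced with a small Python sketch. The helper name `show_as_cp437` is made up for illustration; it only imitates what a console using that code page would do with UTF-8 bytes.

```python
def show_as_cp437(text: str) -> str:
    """Encode text as UTF-8, then read those bytes back as CP437."""
    return text.encode("utf-8").decode("cp437")


if __name__ == "__main__":
    word = "aimé"
    # The single character "é" becomes the two bytes C3 A9 in UTF-8.
    print(word.encode("utf-8"))
    # Each of those bytes is then drawn as its own CP437 symbol.
    print(show_as_cp437(word))
```

The string itself is unchanged throughout; only the interpretation of its bytes differs, which is why viewing it in a UTF-8-aware program is a useful check.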

Dour High Arch
A: 

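This helped me in a similar case:

    string ok_string = System.Text.Encoding.UTF8.GetString(System.Text.Encoding.Default.GetBytes(bad_string));

This assumes the string was wrongly decoded with the system default code page, which is what Encoding.Default returns on .NET Framework.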
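
For readers outside .NET, the same round trip can be sketched in Python. The function name `repair_mojibake` and the default guess of `cp1252` are assumptions; the code page that actually mangled the text has to be known or guessed.

```python
def repair_mojibake(bad: str, wrong_encoding: str = "cp1252") -> str:
    """Undo UTF-8 text that was mistakenly decoded with a single-byte code page.

    wrong_encoding is only a guess at the code page used for the bad decode.
    A wrong guess raises UnicodeError or returns different garbage.
    """
    raw = bad.encode(wrong_encoding)  # recover the original bytes
    return raw.decode("utf-8")        # decode them the way they were meant


if __name__ == "__main__":
    print(repair_mojibake("aimÃ©"))           # UTF-8 bytes misread as cp1252
    print(repair_mojibake("aim├⌐", "cp437"))  # UTF-8 bytes misread as CP437
```

The repair only works when the bad decode lost no bytes; if the wrong code page had no character for some byte, the original text may not be recoverable this way.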

BreffaH