I have a SQL dump file containing incorrectly stored Russian Cyrillic (Windows-1251) text. For example, Èðàíñêèå should be displayed as Иранские.

I have converted this SQL file successfully in the past, but I can't remember which conversions I ran or in what order.

Logically, since the text is stored as LATIN1, I would expect to convert from LATIN1 to WINDOWS-1251 and then from WINDOWS-1251 to UTF-8//TRANSLIT, or something along those lines.
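The suspected damage can be reproduced on a known string. This is a minimal sketch, assuming a glibc or GNU libiconv `iconv` that accepts the charset names `CP1251` and `LATIN1`; `make_mojibake` is a hypothetical helper name. It encodes the correct text as Windows-1251, then reads those bytes as if they were Latin-1 and writes them out as UTF-8:

```shell
#!/bin/sh
# Hypothetical helper: reproduce the mojibake.
# Correct UTF-8 text -> Windows-1251 bytes -> bytes misread as Latin-1 -> UTF-8.
make_mojibake() {
  iconv -f UTF-8 -t CP1251 | iconv -f LATIN1 -t UTF-8
}

printf '%s' 'Иранские' | make_mojibake   # prints Èðàíñêèå
echo
```

Each Cyrillic letter maps to a single CP1251 byte (И is 0xC8, р is 0xF0, and so on). The same byte value in Latin-1 is a Western European letter (0xC8 is È), which is why the damaged text looks like accented Latin characters.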

So far I've tried:

1. `iconv -f WINDOWS-1251 -t UTF-8//TRANSLIT -o new.sql snippet.sql`

   Output: Èðàíñêèå (not what I want)

2. `iconv -f LATIN1 -t UTF-8//TRANSLIT -o new.sql snippet.sql`

   Output: Ã<88>ðàíñêèå (not what I want either)

Notes

  • It's possible that I ran two conversions in sequence to get the result I wanted, but I'm fairly sure the last step was from WINDOWS-1251 to UTF-8//TRANSLIT, since that is what I wrote down in my notes.

  • I see Èðàíñêèå in the SQL file when vim's encoding is utf8 (my vim's native setting). If I run `:set enc=latin1` in vim, I see ~Hð| íñêèå instead, which only makes it more confusing.
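A hex dump shows what is actually stored in the file. The sketch below assumes POSIX `od` and dumps the damaged word. Every character turns out to be a two-byte UTF-8 sequence starting with `c3`, so the file is valid UTF-8 whose characters happen to be Latin-1 look-alikes of the original Windows-1251 bytes. With `enc=latin1`, vim shows those bytes one at a time; 0x88 is a C1 control character in Latin-1, which vim typically renders as ~H.

```shell
#!/bin/sh
# Dump the raw bytes of the damaged word as hex.
# Each look-alike character is stored as a 2-byte UTF-8 sequence (c3 xx).
printf '%s' 'Èðàíñêèå' | od -An -tx1
# prints: c3 88 c3 b0 c3 a0 c3 ad c3 b1 c3 aa c3 a8 c3 a5

# On the real dump, something like this would show the same pattern:
# head -c 200 snippet.sql | od -An -tx1
```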

A:
iconv -f utf-8 -t latin1 < in.sql | iconv -f cp1251 -t utf-8 > out.sql
Ignacio Vazquez-Abrams
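The answer's pipeline can be checked on the example word. This is a minimal sketch, assuming a glibc or GNU libiconv `iconv`; `fix_mojibake` is a hypothetical wrapper around the answer's two commands. The first step undoes the bogus Latin-1-to-UTF-8 encoding and recovers the original Windows-1251 bytes. The second step decodes those bytes as CP1251 and re-encodes them as UTF-8:

```shell
#!/bin/sh
# Hypothetical wrapper around the answer's pipeline.
#  1. UTF-8 -> Latin-1 turns each look-alike character back into one byte,
#     recovering the original Windows-1251 bytes.
#  2. Those bytes are decoded as CP1251 and re-encoded as UTF-8.
fix_mojibake() {
  iconv -f UTF-8 -t LATIN1 | iconv -f CP1251 -t UTF-8
}

printf '%s' 'Èðàíñêèå' | fix_mojibake   # prints Иранские
echo

# On the real dump (file names as in the answer):
# fix_mojibake < in.sql > out.sql
```

This is the exact reverse of how the damage arose. That is why neither single conversion in the question works: the text needs one step to get back to bytes and a second step to decode them correctly.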
Awesome, thank you. It worked, although I had to replace about 40-50 UTF-8 characters with a temporary string indicating their Unicode code point, because iconv could not process the file with those characters in it.
meder
Actually, it seems I forgot to specify `//TRANSLIT` in the first iconv call; I should have done that instead.
meder
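Characters with no Latin-1 equivalent stop the first step of the pipeline. This is likely what forced the placeholder workaround in the comments. The sketch below assumes glibc or GNU libiconv `iconv` and uses '€' (U+20AC) as a stand-in for such a character. A strict conversion exits with an error, while `-c` silently drops the character. Dropping characters loses data, so the placeholder approach or `//TRANSLIT` (whose substitutions depend on the iconv implementation and the locale) may still be preferable.

```shell
#!/bin/sh
# '€' has no Latin-1 code, so a strict conversion reports
# "cannot convert" and exits with a non-zero status.
if ! printf '%s' 'Èðà€' | iconv -f UTF-8 -t LATIN1 >/dev/null 2>&1; then
  echo 'input contains characters outside Latin-1'
fi

# -c tells iconv to omit unconvertible characters instead of aborting;
# the '€' is lost, and the remaining text is repaired as usual.
printf '%s' 'Èðà€' | iconv -c -f UTF-8 -t LATIN1 | iconv -f CP1251 -t UTF-8   # prints Ира
echo
```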