
I have a binary data file, in a format used by a relatively ancient program, which I am trying to convert into something sane. With the help of a hex editor I have worked out most of the file format, except that it contains Hebrew characters in an odd encoding.

Each character is a single byte. The "standard" 27 consonants (the 22 letters plus the 5 "final" forms) go from hex 80 to 9A. Then there are vowels that seem to start at around hex 9B (I'm guessing right after the standard consonants end). Then there are "dotted" consonants that seem to start at hex E0.
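One quick way to test the DOS-encoding guess for the consonant range is to decode those 27 bytes with Python's built-in `cp862` codec (the DOS Hebrew code page) and look at the resulting code points. This is only a sanity check of the 80-9A block. In standard CP862, byte 9B is not Hebrew at all (it decodes to the cent sign), so the vowel and dotted-consonant blocks would not be covered by this codec.

```python
# Decode the 27 consonant bytes 0x80-0x9A with Python's built-in cp862 codec
# and report how many characters came out plus the first and last code points.
consonant_bytes = bytes(range(0x80, 0x9B))
letters = consonant_bytes.decode("cp862")
print(len(letters), f"U+{ord(letters[0]):04X}", f"U+{ord(letters[-1]):04X}")
# prints: 27 U+05D0 U+05EA

# Byte 0x9B is outside CP862's Hebrew range (it is the cent sign there).
print(f"U+{ord(bytes([0x9B]).decode('cp862')):04X}")
# prints: U+00A2
```

If the first line shows alef (U+05D0) through tav (U+05EA), the consonant block lines up with CP862 letter for letter.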

If I remember correctly, I think this is some sort of DOS encoding. What encoding is this and what encoding should I translate it to so that a user in Israel will be able to most easily open it in, say, Microsoft Word? Are there any tools that I could use to do the translation?

A:

Hex 80 to 9A seem to match the code points in CP862 (the DOS Hebrew code page), but I could not find any match for the vowel code points. I think you should just make a custom mapping to Unicode and write the output as a UTF-8 or UTF-16LE plain-text file. If you add a BOM (byte order mark), Notepad and Word should be able to read it without issues. I would probably write a small Python script, but it shouldn't be hard in any other language.
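A minimal sketch of such a script follows, under several assumptions. The 80-9A range is mapped to alef..tav as in CP862. The `VOWELS` table is a hypothetical placeholder, since the real byte-to-vowel assignments have to be read off the file with the hex editor. The dotted block at E0 is assumed (again hypothetically) to mirror the order of the plain consonants, written as the base letter plus a combining dagesh (U+05BC). Bytes below 80 are assumed to be plain ASCII. The names `decode_legacy` and `convert` are illustrative only. Output uses Python's `utf-8-sig` codec, which writes the UTF-8 BOM.

```python
import sys

# 0x80-0x9A: same as CP862, alef (U+05D0) through tav (U+05EA), finals included.
CONSONANTS = {0x80 + i: chr(0x05D0 + i) for i in range(27)}

# HYPOTHETICAL placeholder values: fill in the real bytes from the hex editor.
VOWELS = {
    0x9B: "\u05B0",  # assumed: sheva
    0x9C: "\u05B7",  # assumed: patah
    0x9D: "\u05B8",  # assumed: qamats
}

# HYPOTHETICAL: assumes 0xE0.. repeats the 0x80.. letter order, each letter
# followed by a combining dagesh (U+05BC).
DOTTED = {0xE0 + i: chr(0x05D0 + i) + "\u05BC" for i in range(27)}

MAPPING = {**CONSONANTS, **VOWELS, **DOTTED}


def decode_legacy(data: bytes) -> str:
    """Map each byte of a legacy Hebrew string field to Unicode text."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # assumption: bytes below 0x80 are ASCII
        else:
            out.append(MAPPING.get(b, "\uFFFD"))  # unmapped byte -> U+FFFD
    return "".join(out)


def convert(src: str, dst: str) -> None:
    """Decode a file of legacy text and save it as UTF-8 with a BOM."""
    with open(src, "rb") as f:
        text = decode_legacy(f.read())
    # "utf-8-sig" prepends the BOM so Notepad and Word detect UTF-8.
    with open(dst, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


if __name__ == "__main__" and len(sys.argv) == 3:
    convert(sys.argv[1], sys.argv[2])
```

Because the file is mostly binary data, in practice `decode_legacy` would be applied to the Hebrew string fields you have already located, not to the whole file. Unicode places vowel points after the consonant they belong to. If the old program stores them in a different order, the characters need reordering before they are written out. U+FFFD characters in the output show which bytes still have no mapping.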

Igor Skochinsky