I have a text file containing what I am told are Unicode characters, for example:

\320\222\320\21015-25'ish per main or \320\222\320\21020-40'ish per starter

Which should read:

£15-25'ish per main or £20-40'ish per starter

However, when viewing this text in Firefox, the output is mangled with various unwanted characters.

So, are these really Unicode characters? And if so, how can I convert them to a form that displays correctly?

+3  A: 

You need to:

  • know the encoding of the text file
  • read the data without losing information (either by reading it as binary or by reading it as text with the right encoding)
  • write the data with the right encoding (either by writing it out in binary and specifying the original encoding, or writing it out as text in an encoding which you also specify in the headers)
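
The three steps above can be sketched in Python roughly as follows. This is only an illustration: the `transcode` helper and its parameter names are made up for the example, and it assumes you already know the source encoding.

```python
from pathlib import Path

def transcode(src, dst, src_encoding, dst_encoding="utf-8"):
    """Hypothetical helper: re-encode a text file from one known encoding to another."""
    raw = Path(src).read_bytes()        # read as binary, so nothing is lost
    text = raw.decode(src_encoding)     # raises UnicodeDecodeError if the bytes don't fit
    Path(dst).write_bytes(text.encode(dst_encoding))  # write in the encoding you will declare
    return text
```

Whatever `dst_encoding` you choose also has to be what the page or HTTP headers declare, otherwise the browser will still guess wrong.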

Try to separate out the problem into "reading" and/or "writing". Do you know the encoding of the file? What do you have to do with the file? When you've written it with backslashes, is that actually what's in the file (i.e. an escaped form) or is it actually just a "normal" text encoding such as UTF-8?
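
If the backslashes really are in the file as octal escapes, one way to investigate is to turn them back into bytes and look at what they decode to. The sketch below is in Python and assumes the rest of the text is plain ASCII; `unescape_octal` is a made-up helper name. For the sample in the question, the bytes decode as UTF-8 to the Cyrillic letters ВЈ, which is what the UTF-8 bytes of £ look like when misread as Windows-1251. That suggests, but doesn't prove, that the text was encoded twice somewhere along the way.

```python
import re

def unescape_octal(escaped: str) -> bytes:
    """Hypothetical helper: turn literal \\ooo octal escapes into the bytes they stand for."""
    # Assumption: everything outside the escapes is plain ASCII.
    return re.sub(
        rb"\\([0-3][0-7]{2})",
        lambda m: bytes([int(m.group(1), 8)]),
        escaped.encode("ascii"),
    )

sample = r"\320\222\320\21015-25'ish per main"
raw = unescape_octal(sample)       # b"\xd0\x92\xd0\x8815-25'ish per main"
text = raw.decode("utf-8")         # "ВЈ15-25'ish per main"

# Assumption: the original UTF-8 bytes were misread as Windows-1251 and then
# saved as UTF-8 again. Reversing that step recovers the pound sign.
repaired = text.encode("cp1251").decode("utf-8")
print(repaired)                    # £15-25'ish per main
```

If the round trip through `cp1251` fails or produces nonsense on the real file, the double-encoding guess is wrong and you need to find out the actual encoding from whoever produced the file.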

Jon Skeet