I have a text file that displays differently when I open it in FreeBSD vs. Windows.

On FreeBSD: An·lisis e InvestigaciÛn

On Windows: Análisis e Investigación

The Windows representation is obviously right. Any ideas on how to get that result in FreeBSD?

A: 

How is the file encoded? I would try re-encoding the file as UTF-16.

Andrew Hare
+1  A: 

This is not pure ASCII; it's UTF-8. Try a FreeBSD editor with UTF-8 support, or change your locale.

Jacek Ławrynowicz
+4  A: 

The problem is that it's not ASCII, but UTF-8. You have to use another editor that detects the encoding correctly, or convert the file to something your editor on FreeBSD understands.

Georg
"It's probably ISO-8859-1". 'á' is displayed as '·', so it must be a multibyte encoding (UTF-8 or UTF-16).
Jacek Ławrynowicz
It's definitely UTF-8. An easy way to tell is that those funny-looking accented A's will show up when encoding characters just outside the first 128 in Unicode.
Ant P.
Right, sorry. Didn't see that.
Georg
+1  A: 

From the way the characters are being displayed, I would say that the file is UTF-8 encoded Unicode. Windows is recognising this and displaying the 'á' and 'ó' characters correctly, while FreeBSD is assuming it's ISO-8859-1, which results in each of these characters being displayed as 2 separate characters (because UTF-8 encodes them in 2 bytes). You'll have to tell FreeBSD that it is a UTF-8 file, somehow.
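To illustrate the 2-byte effect described above, here is a minimal Python sketch. It only shows what UTF-8 bytes look like when a viewer assumes ISO-8859-1; the sample word is taken from the question, and nothing here inspects the asker's actual file.

```python
# Encode the sample word as UTF-8, then decode the bytes as ISO-8859-1
# to imitate a viewer that assumes the wrong encoding.
text = "Análisis"
utf8_bytes = text.encode("utf-8")

# 'á' (U+00E1) occupies two bytes in UTF-8: 0xC3 0xA1.
a_acute_bytes = "á".encode("utf-8")

# Each of those two bytes becomes its own character under ISO-8859-1.
misread = utf8_bytes.decode("iso-8859-1")

print(len(a_acute_bytes))  # 2
print(misread)             # AnÃ¡lisis
```

If the garbled text in a viewer has one extra character per accented letter, as in this output, that pattern is consistent with UTF-8 being read as a single-byte encoding.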

Simon Callan
A: 

After a bit more digging, I found that it works if I 1) open the CSV file in Excel on the Mac and export it as a CSV file, and 2) then open it in TextMate, copy the text, and save it again.

The result of: file file.csv is

UTF-8 Unicode English text, with very long lines

The original is:

Non-ISO extended-ASCII English text, with very long lines

This workaround isn't really suitable, as this process is supposed to be automated. Thanks for the help so far.
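An automated replacement for the manual round trip might look like the following Python sketch. It tries UTF-8 first and otherwise falls back to a single assumed legacy encoding. The function name `convert_to_utf8` and the `cp1252` fallback are assumptions for illustration; the real source encoding of the file is not known from the `file` output alone.

```python
from pathlib import Path


def convert_to_utf8(src, dst, fallback="cp1252"):
    """Write the contents of src to dst as UTF-8.

    Returns the name of the encoding used to decode src.
    fallback is an assumed legacy encoding; adjust it to the real source.
    """
    raw = Path(src).read_bytes()
    try:
        # If the bytes are already valid UTF-8, keep them as they are.
        text = raw.decode("utf-8")
        used = "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: decode with the assumed single-byte encoding.
        # This raises UnicodeDecodeError if the guess cannot decode the data.
        text = raw.decode(fallback)
        used = fallback
    # Write bytes directly so line endings are not translated.
    Path(dst).write_bytes(text.encode("utf-8"))
    return used
```

Calling `convert_to_utf8("file.csv", "file.utf8.csv")` (the output name is hypothetical) and then running `file` on the result should report UTF-8 if the fallback guess was right. A wrong guess can still decode without error and produce wrong characters, so it is worth checking one known accented word in the output.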

A: 

It doesn't matter which operating system you're using when you open the file. What matters is the application you use to open it. On Windows you're probably using Notepad, which automatically identifies the encoding as UTF-8.

The app you're using on FreeBSD obviously isn't doing that. Maybe it just can't read UTF-8 and you need to use a different app. Or maybe you just have to tell it which encoding to use. Automatic detection of character encodings is far from universal (and much farther from perfect).
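Since detection is unreliable, one way to decide which encoding to tell the application about is to decode the same bytes under a few candidates and compare the results with what you see on screen. The sketch below does that in Python. The helper name `show_decodings`, the candidate list, and the sample bytes are all assumptions for illustration, not data from the asker's file.

```python
def show_decodings(raw, candidates=("utf-8", "iso-8859-1", "cp1252", "mac_roman")):
    """Return {encoding: decoded text}, with None where raw is not valid."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # The byte sequence is not valid in this candidate encoding.
            results[name] = None
    return results


# Hypothetical sample: "Análisis" stored with one byte per character.
sample = b"An\xe1lisis"
for name, text in show_decodings(sample).items():
    print(name, "->", text)
```

Whichever candidate reproduces the correct text is the encoding to name explicitly when opening or converting the file; whichever reproduces the garbled text hints at what the misbehaving application is assuming.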

Alan Moore