Unicode / Non-Unicode / UTF-8 Problems

+2 Q:

An application I am working on stores data in an INI file. The application creates the INI file, which is then read by another application we also wrote. The INI file may also be edited by hand.

Sooner or later the INI file is likely to contain text in several languages, so we made sure that all data written to it was Unicode.

After first creating the INI file, we examined it in Notepad and noticed that the letter spacing was wrong. After a bit of research we discovered the Unicode byte order mark (BOM), FF FE, and started writing it at the start of the file. All seemed well: the file was created correctly and could be hand-edited in Notepad.
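For reference, here is a minimal Python sketch of what writing such a file amounts to. Python is used only for illustration, since the application itself is C++. The file begins with the bytes FF FE, which are the UTF-16 little-endian BOM, followed by the text encoded as UTF-16LE. Each ASCII character becomes two bytes, the second of which is 00. That is likely why the text looked spaced out when an editor read it as 8-bit text without a BOM. The function name encode_ini_utf16le is hypothetical.

```python
import codecs

def encode_ini_utf16le(lines):
    """Hypothetical helper: encode INI lines as UTF-16LE prefixed with the FF FE BOM."""
    # CRLF line endings, as Windows Notepad expects.
    text = "\r\n".join(lines) + "\r\n"
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

data = encode_ini_utf16le(["[General]", "greeting=Grüß Gott"])
print(data[:6])  # b'\xff\xfe[\x00G\x00'
```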

Now the problem: we went looking for an existing INI file parser instead of writing our own. Boost Property Tree seemed ideal, but the BOM does not seem to be filtered out by the underlying wifstream, and Property Tree eventually throws an exception because of it.
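If a BOM survives decoding, it appears as the character U+FEFF at the start of the text. The first line then reads "\ufeff[Section]", which an INI parser will not recognize as a section header.

The sketch below reproduces this with Python's standard configparser. This is a stand-in for Boost Property Tree, not a claim about how Boost behaves internally. It also shows the usual workaround: strip a leading U+FEFF before parsing. The helper name parse_ini is hypothetical. In C++, one analogous approach would be to skip the first character read from the stream if it is 0xFEFF.

```python
import configparser

# Bytes of a UTF-8 INI file that starts with a BOM (EF BB BF).
raw = b"\xef\xbb\xbf[General]\nname=value\n"

# Plain "utf-8" decoding keeps the BOM as the character U+FEFF.
text = raw.decode("utf-8")

try:
    configparser.ConfigParser().read_string(text)
except configparser.MissingSectionHeaderError:
    # The first line is "\ufeff[General]", so no section header is recognized.
    print("BOM broke the first section header")

def parse_ini(text):
    """Hypothetical helper: parse INI text after dropping a leading BOM character."""
    parser = configparser.ConfigParser()
    parser.read_string(text.lstrip("\ufeff"))
    return parser

cfg = parse_ini(text)
print(cfg["General"]["name"])  # value

# Alternatively, the "utf-8-sig" codec drops a leading BOM while decoding.
cfg_sig = parse_ini(raw.decode("utf-8-sig"))
```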

Next we tried SimpleIni, but SimpleIni (CSimpleIniW) does not seem to work unless the UTF-8 BOM is at the start of the file.

So far, two seemingly well-developed INI file processors will not work with our simple INI file, so we started to think we are taking the wrong approach. Apart from the obvious "should have used XML", what real-world advice can you offer on this problem?

UPDATE:

I have this working now. The BOM wasn't the problem; the data was not stored in UTF-8. Thanks.
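The fix turned out to be storing the data as UTF-8. Here is a hedged Python sketch of that conversion. It decodes UTF-16 bytes, relying on the BOM to pick the byte order. It then re-encodes the text as UTF-8, optionally with the EF BB BF BOM that, per the question, CSimpleIniW was looking for. The name utf16_to_utf8 is hypothetical. Whether a given parser wants the UTF-8 BOM is worth checking against its own documentation.

```python
import codecs

def utf16_to_utf8(raw, with_bom=True):
    """Hypothetical helper: re-encode BOM-marked UTF-16 bytes as UTF-8."""
    if not raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        raise ValueError("expected a UTF-16 byte order mark")
    # The "utf-16" codec reads the BOM, picks the byte order, and drops the BOM.
    text = raw.decode("utf-16")
    # "utf-8-sig" writes the EF BB BF BOM; plain "utf-8" does not.
    return text.encode("utf-8-sig" if with_bom else "utf-8")

utf16 = codecs.BOM_UTF16_LE + "[Général]\r\nclé=valeur\r\n".encode("utf-16-le")
utf8 = utf16_to_utf8(utf16)
print(utf8[:4])  # b'\xef\xbb\xbf['
```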

+1 A:

Use a text editor that can save files without the BOM, such as Notepad++.
There's no problem in removing the BOM, and this is a common solution in web development.

Dor 2009-12-13 21:00:06

We have no control over which text editor end users use for the INI file.

Canacourse 2009-12-13 21:03:24

Then use a script that removes it, if possible, and advise end users which text editor to use.

Dor 2009-12-13 21:08:12
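A minimal sketch of such a script in Python follows. It assumes the files are UTF-8 and that only the three-byte UTF-8 BOM (EF BB BF) should be removed. It deliberately leaves UTF-16 files alone: dropping FF FE or FE FF from a UTF-16 file would leave data whose byte order a reader can only guess. The function name strip_utf8_bom and the script name in the usage comment are hypothetical.

```python
import sys
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"

def strip_utf8_bom(path):
    """Hypothetical helper: remove a leading UTF-8 BOM in place; return True if one was removed."""
    p = Path(path)
    data = p.read_bytes()
    if not data.startswith(UTF8_BOM):
        # No UTF-8 BOM (this includes UTF-16 files): leave the file untouched.
        return False
    p.write_bytes(data[len(UTF8_BOM):])
    return True

if __name__ == "__main__":
    # Usage: python strip_bom.py file1.ini file2.ini ...
    for name in sys.argv[1:]:
        if strip_utf8_bom(name):
            print(f"removed BOM: {name}")
```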

+1 A:

Is there any reason you're not using the native Windows APIs for reading and writing the profiles? Using the native APIs should ensure that both applications pick up the data consistently, since they would be using exactly the same APIs.

Rick Strahl 2009-12-13 21:00:29

Yes, but we have no control over which text editor end users use for the INI file.

Canacourse 2009-12-13 21:04:16

+2 A:

If you intend to use Unicode in an INI file, a BOM is required. Without a BOM, the reader doesn't know which encoding the file is in: it could be UTF-16 (big- or little-endian) or UTF-8. This is a big drawback of INI files. XML has a visible declaration in which you can specify the encoding, which makes it much easier to deal with.
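The point about the BOM telling the reader which encoding to use can be sketched in a few lines of Python. The code looks at the leading bytes and picks UTF-8, UTF-16LE or UTF-16BE accordingly. It then skips the BOM before decoding. When there is no BOM, the reader can only guess. The cp1252 fallback below is an assumption (a common Western Windows ANSI code page), not something INI files define. UTF-32 BOMs are not handled. The names sniff_encoding and decode_ini are hypothetical.

```python
import codecs

# Known BOMs and the codec for the bytes that follow them.
# UTF-32 is not handled: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM, so supporting it would require checking it first.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]

def sniff_encoding(raw, default=None):
    """Hypothetical helper: return (encoding, bom_length), or (default, 0) if there is no BOM."""
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return default, 0

def decode_ini(raw, default="cp1252"):
    """Hypothetical helper: decode INI bytes, skipping the BOM; without one, use a guessed code page."""
    encoding, skip = sniff_encoding(raw, default)
    return raw[skip:].decode(encoding)

print(sniff_encoding(b"\xff\xfe[\x00"))  # ('utf-16-le', 2)
```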

We use GetPrivateProfileStringW to read INI files in UTF-8 and haven't found any issues as long as the BOM is there.

If this is a Windows app, you really should switch to the registry. Otherwise, XML is the way to go.

ZZ Coder 2009-12-13 21:35:32

Thanks. We can't use the registry. The INI file is a configuration file that is created on an administrator's PC and processed on end users' PCs.

Canacourse 2009-12-13 21:51:02

There are various ways to deploy registry changes. You can do it in an application's installation package, or, on an intranet, IT can deploy registry changes to PCs.

Craig McQueen 2009-12-13 23:34:01
