I have a text file that was created by a Microsoft reporting tool. The file begins with the byte-order mark 0xFF 0xFE (little-endian) and then contains ASCII character output with null bytes between characters (e.g., "F.i.e.l.d.1."). I can use iconv to convert this to UTF-8, with UCS-2LE as the input encoding and UTF-8 as the output encoding, and it works great.
My problem is that I want to read lines from the UCS-2LE file into strings, parse out the field values, and then write them out to an ASCII text file (e.g., "Field1 Field2"). I have tried both string and wstring versions of getline, and while they do read a line from the file, functions like substr(start, length) interpret the string as 8-bit values, so the start and length offsets come out wrong.
How do I read the UCS-2LE data into a C++ string and extract the field values? I have looked at Boost and ICU, as well as numerous Google searches, but have not found anything that works. What am I missing here? Please help!
My example code looks like this:
#include <fstream>
#include <string>
using namespace std;

wifstream srcFile;
srcFile.open(argv[1], ios_base::in | ios_base::binary);
..
..
wstring srcBuf;
..
..
while( getline(srcFile, srcBuf) )
{
wstring field1;
field1 = srcBuf.substr(12,12);
...
...
}
So, if for example srcBuf contains "W.e. t.h.i.n.k. i.n. g.e.n.e.r.a.l.i.t.i.e.s.,", then the substr call above returns ".k. i.n. g.e" instead of "g.e.n.e.r.a.l.i.t.i.e.s.". What I want is to read in the string and process it without having to worry about the multi-byte representation. Does anybody have an example of using Boost (or something else) to read these strings from the file and convert them to a fixed-width representation for internal use? BTW, I am on a Mac using Eclipse and gcc. Is it possible my STL does not understand wide-character strings?
Thanks