views:

397

answers:

5

Hi,

I am trying to write a simple UTF-8 decoder for my assignment. I'm fairly new to C++, so bear with me here...

I have to determine whether the encoding is valid or not, and output the value of the UTF-8 character in hexadecimal in either case. Say that I have read the first byte and used it to determine the number of bytes in this UTF-8 character. The problem is that after I read the first byte, I'm having trouble setting the ifstream position back one byte so that I can read the whole UTF-8 character. I've tried seekg() and putback(), but I always get a bus error or some weird output that isn't my test data. Please help, thanks.

I can use peek() for the first byte, but I still have to read the following bytes to determine whether the encoding is valid or not, so the problem of setting the stream position back is still there.

+2  A: 

I would suggest you use peek() to read the first byte instead. seekg() should work to rewind, but a bus error is usually caused by your code violating alignment requirements (a misaligned memory access), which points to you doing something else evil in your code.

McBeth
+1  A: 

Why do you have to seek back? Can't you simply read the rest of the UTF-8 sequence, after knowing how many more octets you're expecting?

Ates Goral
I have to output the whole character as a hex value.
cplusplusNewbie
OK, so you already got the first byte. Read the rest, and output all. I don't understand why you need to seek back.
Ates Goral
+1  A: 

I would read the next byte directly and add it to what I got, as Ates Goral said. It is cleaner IMHO.

Anyway, you could move the stream pointer using seekg():

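char byte = 0;
unsigned char character[4] = {0}; // a UTF-8 character is at most 4 bytes
ifstream file("test.txt", ios::binary);

file.get(byte);
......
file.seekg(-1, ios::cur); // cur == current position
// read() copies raw bytes: it ignores newlines and appends no '\0'
file.read(reinterpret_cast<char*>(character), numberOfBytes);

cout << hex;
for (streamsize i = 0; i < file.gcount(); ++i)
    cout << static_cast<int>(character[i]); // bytes in file order

I use read() rather than get(char*, n) for the second read because get() is meant for text: it stops at a newline and writes '\0' after the data, so it needs room for the number of bytes plus the null terminator (two bytes ==> n = 3). A 4-byte character would then need 5 bytes, which overflows a typical 4-byte unsigned int, and printing the int shows the bytes reversed on a little-endian machine. With read(), numberOfBytes is just the length of the character, lead byte included.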

AraK
A: 

I don't know why you need to put the character back but istream::unget() or istream::putback() should do what you want. Look them up in your compiler's documentation.

jmucchiello
A: 

Please look up:

ifstream::seekg()
ifstream::tellg()
Maciek