ansaurus

Question

Answer 1

+5 A:

Bison yes, flex no. The one time I needed a Bison parser to work with UTF-8 encoded files, I ended up writing my own yylex function.

Edit: To help, I used a lot of the Unicode operations available in GLib (there's a gunichar type and some file/string manipulation functions that I found useful).

eduffy 2009-06-01 14:50:01

Well, my lexer handles the UTF-8 chars just fine, but the Bison parser stops parsing as soon as it sees a negative value. Please advise.

Martin Cote 2009-06-01 14:52:16
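
One plausible cause of this symptom, assuming the hand-written lexer passes raw bytes to Bison through a plain `char`: where `char` is signed, every byte of a multi-byte UTF-8 sequence (0x80 to 0xFF) turns negative when widened to `int`, and Bison treats a zero or negative return value from `yylex` as end of input. A minimal sketch of the difference follows; `naive_token` and `safe_token` are hypothetical names used only for illustration.

```c
/* Sketch only: naive_token and safe_token are hypothetical helpers,
 * not part of Bison or flex. */

/* Widening a signed char sign-extends, so bytes >= 0x80 become negative. */
int naive_token(signed char c)
{
    return c;
}

/* Converting through unsigned char keeps the value in the range 0..255. */
int safe_token(char c)
{
    return (unsigned char)c;
}
```

If the lexer does return individual bytes as tokens, routing them through something like `safe_token` keeps them positive. Whether the grammar should see bytes or whole characters is a separate design decision.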

Are you reading your file 1 byte at a time, or 1 UTF-8 encoded character at a time?

eduffy 2009-06-01 14:53:41

1 byte at a time.

Martin Cote 2009-06-01 14:59:29

Then that's the problem. The high bit that makes a signed 'char' negative is the same bit that marks a byte as part of a multi-byte UTF-8 character (IIRC). You need to read a whole UTF-8 character at a time instead of a single byte with fgetc.

eduffy 2009-06-01 15:15:43
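
To illustrate reading one UTF-8 character at a time, here is a minimal sketch of a decoder that works on an `unsigned char` buffer. The names `utf8_seq_len` and `utf8_decode` are made up for this example, and the bytes are assumed to have been read already (for instance with `fread`, or with `fgetc`, which returns each byte as a non-negative `int` that only turns negative once stored in a signed `char`). It is not a full validator: it does not reject every overlong form or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the UTF-8 sequence introduced by lead byte b, or 0 if b
 * cannot start a sequence (continuation byte or invalid lead byte). */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;                /* 0xxxxxxx: plain ASCII */
    if (b >= 0xC2 && b <= 0xDF) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;      /* 1110xxxx */
    if (b >= 0xF0 && b <= 0xF4) return 4;  /* 11110xxx */
    return 0;
}

/* Decode one code point from s (n bytes available) into *out.
 * Returns the number of bytes consumed, or 0 if the input is
 * truncated or malformed. */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *out)
{
    /* Payload bits carried by the lead byte, indexed by sequence length. */
    static const unsigned char mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    size_t len, i;
    uint32_t cp;

    if (n == 0) return 0;
    len = utf8_seq_len(s[0]);
    if (len == 0 || len > n) return 0;

    cp = s[0] & mask[len];
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;  /* must be 10xxxxxx */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    *out = cp;
    return len;
}
```

A custom `yylex` could call something like `utf8_decode` on its input buffer and advance by the returned length, so the grammar never sees the individual high-bit bytes.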

Answer 2

+2 A:

flex being the issue here, you might want to take a look at zlex.

chaos 2009-06-01 15:00:49

That's an interesting project, but it wouldn't exactly solve the problem addressed in this question. 16-bit characters are different from UTF-8 encoded characters (for one thing, a UTF-8 encoded character can be up to 4 bytes long).

eduffy 2009-06-01 15:21:50

Can Bison parse UTF-8 characters?