---- PLEASE CLOSE ----

------ Edit ---------

I found where the problem is. I'm going to start a new question for the real problem ....

----------------------

 


Hi,

My Situation:

Linux (Ubuntu 10.04)
gcc

But it has to be platform independent

I have a text file (UTF-8) with special characters like ¥ © ® Ỳ È Ð. I have a std::map whose key type must be able to hold these special characters. Currently I'm using wchar_t.

Then I have to use strings, which can contain these chars. Now I'm using std::wstring.

I have to read that UTF-8 file, so I thought of using a wifstream. For line processing, I use a wstringstream.

I think what I've done so far isn't bad... If it is, what would be better?

What is going wrong:

Of course, I have to read that file. But reading the lines stops at the first line with a special char. In short this is what I did:

map<wchar_t, Glyph*> glyphs;

//...

wifstream in(txtFile.c_str());
if (!in.is_open())
{
    throw runtime_error("Cannot open font text file!!");
}
wstring line;
while (getline(in, line)) // edit
{
    printf("Loading glyph\n");
    if (line.length() == 0)
    {
        continue;
    }
    wchar_t keyChar = line.at(0);
    /* First, put the four floats into the wstringstream */
    wstringstream ss(line.substr(2));
    /* Now, read them out */
    Glyph *g = new Glyph();
    ss >> g->x;
    ss >> g->y;
    ss >> g->w;
    ss >> g->h;
    glyphs[keyChar] = g;
    printf("Glyph `%c` (%d): %f, %f, %f, %f\n", keyChar, keyChar, g->x, g->y, g->w, g->h);
}

So, the question is: how do I read a file containing these special characters with a wifstream?

Thanks in advance!

How the file looks:

  0.000000 0.000000 0.010909 0.200000
A 0.023636 0.000000 0.014545 0.200000
B 0.050909 0.000000 0.014545 0.200000
C 0.078182 0.000000 0.014545 0.200000
D 0.105455 0.000000 0.014545 0.200000
E 0.132727 0.000000 0.014545 0.200000

....

È 0.661818 0.400000 0.014545 0.200000
É 0.689091 0.400000 0.014545 0.200000
Ê 0.716364 0.400000 0.014545 0.200000
Ë 0.743636 0.400000 0.014545 0.200000
Ì 0.770909 0.400000 0.012727 0.200000
Í 0.796364 0.400000 0.012727 0.200000
Î 0.821818 0.400000 0.012727 0.200000
Ï 0.847273 0.400000 0.012727 0.200000
Ð 0.872727 0.400000 0.014545 0.200000
Ñ 0.900000 0.400000 0.014545 0.200000
A: 

Unfortunately C++ is a bit lacking here - the w in wifstream refers to the types in use, rather than the ability to handle files with wide characters. You'll have to do some coding on your own, but you can find recipes at:

  1. Reading UTF-8 with C++ streams
  2. Upgrading an STL-based application to use Unicode
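One possible way to do that "coding on your own" is to give the wide stream a UTF-8 conversion facet before opening the file. The sketch below is not taken from the linked recipes. It assumes a C++11 compiler and uses std::codecvt_utf8 from <codecvt>, which is deprecated since C++17 but still shipped by common standard libraries. The Glyph struct and the loadGlyphs name are hypothetical stand-ins for the question's types, and on Windows the 16-bit wchar_t limits this to characters inside the Basic Multilingual Plane.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the question's Glyph type (its definition is not shown).
struct Glyph { float x, y, w, h; };

// Reads "<char> <x> <y> <w> <h>" lines from a UTF-8 encoded file.
std::map<wchar_t, Glyph> loadGlyphs(const std::string& path)
{
    std::wifstream in;
    // Install the UTF-8 facet before opening, so every byte read is decoded with it.
    in.imbue(std::locale(in.getloc(), new std::codecvt_utf8<wchar_t>));
    in.open(path.c_str());
    if (!in.is_open())
        throw std::runtime_error("Cannot open font text file");

    std::map<wchar_t, Glyph> glyphs;
    std::wstring line;
    while (std::getline(in, line)) {
        if (line.size() < 3)
            continue;                       // skip empty or too-short lines
        Glyph g = {};
        std::wistringstream ss(line.substr(2));
        if (ss >> g.x >> g.y >> g.w >> g.h) // keep only fully parsed lines
            glyphs[line[0]] = g;
    }
    return glyphs;
}
```

Calling loadGlyphs(txtFile) would then return one entry per line, including lines whose first character is outside ASCII.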
On Freund
+1  A: 
  1. Use while (getline(in, line)) instead of the eof variant; it's better, see this question

  2. I'm assuming you're using Windows (as Linux and Mac normally have native UTF-8 platform encoding, which allows you to ignore most of this stuff).

What I would do is read the whole file as chars and convert it to wchar_t's using the handy functions in this question by me :).
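To illustrate the "read as chars, then convert" idea, here is a minimal hand-written decoder. It is not the code from the question mentioned above, and the name decodeUtf8 is made up for this example. It turns a UTF-8 byte string into Unicode code points, which can then be stored in wchar_t (on Linux) or any other sufficiently wide type. As a sketch it only checks the byte structure; it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into a list of Unicode code points.
std::vector<unsigned long> decodeUtf8(const std::string& bytes)
{
    std::vector<unsigned long> out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;              // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1)
            throw std::runtime_error("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F); // append the 6 payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

For a line read with a plain std::ifstream and std::getline, the first element of decodeUtf8(line) would be the glyph's code point.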

Remember: on Linux (and probably Mac OS X too) you can just output a UTF-8 stream to a terminal and get the right characters; on Windows, that's a whole different kind of story.

rubenvb
@rubenvb: No, Linux.
Martijn Courteaux
@Martijn: you should process the file using `std::string` and work with UTF-8 `char` sequences. `wchar_t` is neither handy nor cross-platform. You can use the space as a delimiter, and store the UTF-8 character (1-4 bytes wide) in a `std::string` or, if you really want, a `char*`. `std::wifstream` is for reading `wchar_t`s (which roughly translates to UTF-32 on Linux and UTF-16 on Windows), not UTF-8.
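A rough sketch of that suggestion, assuming the file layout shown in the question (one character, one space, four numbers per line). The Glyph struct and the parseGlyphLine name are hypothetical. The key is kept as the raw UTF-8 bytes in a std::string, so no wide-character types are involved at all.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the question's Glyph type.
struct Glyph { float x, y, w, h; };

// Splits one line into its UTF-8 key (1-4 bytes) and the four numbers.
// Returns false if the line does not match the expected layout.
bool parseGlyphLine(const std::string& line, std::string& key, Glyph& g)
{
    if (line.empty())
        return false;
    // Search from index 1 so that a line describing the space glyph itself
    // (which starts with two spaces) still gets " " as its key.
    const std::size_t sp = line.find(' ', 1);
    if (sp == std::string::npos)
        return false;
    key = line.substr(0, sp);
    std::istringstream ss(line.substr(sp + 1));
    return static_cast<bool>(ss >> g.x >> g.y >> g.w >> g.h);
}
```

The parsed entries could then go into a std::map<std::string, Glyph>, filled from a plain std::ifstream with std::getline.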
rubenvb
A: 

If you do not have to use the STL containers, I would suggest using the Qt framework. Qt uses Unicode by default. Also, the classes are very well designed and feel really good to use.

You can create a QTextStream which will do the things that you want.

http://doc.trolltech.com/latest/qtextstream.htm

I think the following code should do.

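QHash<QChar, Glyph*> glyph_map;

QFile data("input.txt");
if (!data.open(QFile::ReadOnly)) {
  // handle error and return...
}

QTextStream in(&data);
// Make sure the stream decodes UTF-8; on older Qt versions
// this can be set explicitly with in.setCodec("UTF-8").

while (!in.atEnd()) {
  QChar c;
  in >> c;  // reads one character from the stream
  // do stuff with your Glyph, e.g. create it and read its values
  Glyph *glyph = new Glyph();
  glyph_map[c] = glyph;
}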
Haplo