For some reason my buffer is getting filled with gibberish, and I'm not sure why. I even checked my file with a hex editor to verify that my characters are saved in a 2-byte Unicode format. I'm not sure what's wrong.

[on file open]

fseek(_file_pointer, 0, SEEK_END);
this->_length = ftell(this->_file_pointer) / sizeof(chr);

[Main]

//there is a reason for this, I just 
//didn't include the code that tells why
typedef wchar_t chr;
chr *buffer = (chr*)malloc(f->_length*sizeof(chr));
if(buffer == NULL)return;
memset(buffer,0,f->_length*sizeof(chr));
f->Read_Whole_File(buffer);
f->Close();
free(buffer);

[Read_Whole_File]

void Read_Whole_File(chr *buffer)
{
    if(buffer == NULL)
    {
        this->_IsError = true;
        return;
    }
    fseek(this->_file_pointer, 0, SEEK_SET);
    int a = sizeof(buffer[0]);//for debugging purposes 
    fread(buffer, a, _length, this->_file_pointer); 
}
+1  A: 

The signature of fread is:

size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

Where size is the size of each element, and nmemb is the number of elements. In your case, size is sizeof(chr) and nmemb is the length of the buffer in characters.
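To make the size/nmemb distinction concrete, here is a minimal sketch. The helper name read_elements is hypothetical and not part of the asker's code. It shows that fread returns the number of complete elements read, not the number of bytes, so the return value should be compared against nmemb.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (not from the original post): reads up to `count`
// elements of `elem_size` bytes each into `dst` and returns how many
// complete elements were actually read.
std::size_t read_elements(std::FILE *fp, void *dst,
                          std::size_t elem_size, std::size_t count)
{
    if (fp == nullptr || dst == nullptr || elem_size == 0) {
        return 0;
    }
    // fread counts whole elements, not bytes.
    const std::size_t got = std::fread(dst, elem_size, count, fp);
    if (got < count && std::ferror(fp)) {
        std::perror("fread"); // a short read caused by an I/O error
    }
    return got;
}
```

A return value smaller than count without an error simply means the end of the file was reached first.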

Conrad Meyer
A: 

If you are in C++, why not use a std::fstream?

Apart from that, since you are using Unicode, note that C and C++ are seriously lacking in standard Unicode support. The answers here might help you read these Unicode files.

But I must stress again: if you are using C++, use the STL. Also, check the excellent answer to this question: std::wstring VS std::string.

jilles de wit
not what I asked.
kelton52
This better? By the way, perhaps you can add the actual question you asked in the duplicate of this question to this question.
jilles de wit
The first link was great, because it informed me that the size of wchar_t is determined at compile time. I still don't see the point in telling me to use a different function for reading. I do have a specific reason for using fread, and as I said before, it has nothing to do with my question. I don't mean to be ungrateful, but I've noticed that people on here tend to write a lot of junk unrelated to the question, or criticize the methods, and when they do this they fill up the post with unnecessary garbage that fools people into thinking that the question has been resolved when in fact it has not.
kelton52
This is a very frustrating thing when you are working on time-sensitive material and your question gets overlooked. I appreciate the useful input though, very much so.
kelton52
Well, if you need an answer now, perhaps a site where people answer questions for free at their leisure is not the place to ask it? About my suggestion to use std::fstream: maybe it was a bit verbose, but my suggestion was sincere. If you are going to use C++, try to use the good aspects of C++. And in the process of converting, you might find that your Unicode problems are also more easily resolved there.
jilles de wit
I wasn't trying to be rude, and in fact I don't think I was. This site has been great for me; most of the time when I ask a question it's only a matter of minutes before someone responds. It's just that weeding through the garbage is a hassle, and you're right also, I should expect it from a free site. I do appreciate the fact that you've taken the time to try and help me resolve my issue.
kelton52
+1  A: 

Assuming your error handling (that you said you've omitted here) is sound, I see two reasons that may be the cause of the problem:

  1. First of all, wchar_t is not necessarily 2 bytes; its size is implementation-defined. For example, on Linux it's most likely 4 bytes.

  2. It may be that the file is UTF-16BE (big-endian), and you are running on a little-endian platform, so the wchar_t values in your buffer have their byte order swapped.

Or, it may be both. Please update your question with some details about your platform and a few bytes from the sample file in hex (if possible).

In any case, you should not make any assumptions about sizes of standard C or C++ types when dealing with Unicode files.

For example, if you want to read UTF-16BE, use the C99 uint16_t type (or an equivalent type that's guaranteed to be 16 bits wide), and swap the byte order of your input depending on your platform's endianness and the file's endianness. You can detect the file's endianness using a byte order mark, if one is present in the file.

Alternatively, use a third-party Unicode library, like ICU. It takes care of all platform-specific details and will save you a lot of time debugging in a sizable project.
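As a rough illustration of the fixed-width approach, the sketch below assembles 16-bit code units from raw bytes with shifts, so the result does not depend on the host's byte order. The function name decode_utf16_units and its parameters are hypothetical, and it assumes the file contents have already been read into a byte buffer (for example with fread using an element size of 1).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original post): turns raw file bytes
// into 16-bit code units. `big_endian` describes the FILE's byte order;
// the host's byte order never matters because each value is built with
// shifts rather than by reinterpreting memory.
std::vector<std::uint16_t> decode_utf16_units(const unsigned char *bytes,
                                              std::size_t byte_count,
                                              bool big_endian)
{
    std::vector<std::uint16_t> units;
    units.reserve(byte_count / 2);
    // A trailing odd byte, if any, is ignored.
    for (std::size_t i = 0; i + 1 < byte_count; i += 2) {
        const unsigned first = bytes[i];
        const unsigned second = bytes[i + 1];
        const unsigned value = big_endian ? ((first << 8) | second)
                                          : ((second << 8) | first);
        units.push_back(static_cast<std::uint16_t>(value));
    }
    return units;
}
```

This only produces UTF-16 code units; surrogate pairs would still need to be combined to get full code points.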

Alex B
good point, I'll check
kelton52
BTW, I have macros; that's how chr becomes wchar_t, but as noted above, that part is omitted.
kelton52
It turns out the file was in big-endian format, but I am curious: how would one go about determining the file format without user input?
kelton52
http://en.wikipedia.org/wiki/Byte_order_mark (I have updated the answer for completeness).
Alex B
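For completeness, a byte order mark check for UTF-16 might look like the sketch below. The enum Utf16Order and the function detect_utf16_bom are hypothetical names, and the code assumes the first bytes of the file are already in memory. The character U+FEFF is stored as FE FF in UTF-16BE and as FF FE in UTF-16LE.

```cpp
#include <cstddef>

// Hypothetical names, introduced here for illustration only.
enum class Utf16Order { Unknown, LittleEndian, BigEndian };

// Inspects the first two bytes of the file contents for a UTF-16 BOM.
Utf16Order detect_utf16_bom(const unsigned char *bytes, std::size_t byte_count)
{
    if (bytes != nullptr && byte_count >= 2) {
        if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            return Utf16Order::BigEndian;
        }
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            return Utf16Order::LittleEndian;
        }
    }
    // No BOM present: the caller has to fall back to a default or a heuristic.
    return Utf16Order::Unknown;
}
```

Note that a BOM is optional, and that the bytes FF FE 00 00 are also the UTF-32LE BOM, so a file that may be UTF-32 needs a longer check first.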