I'm trying to find out if two strings I have are the same, for the purpose of unit testing. The first is a predefined string, hard-coded into the program. The second is read in from a text file with an ifstream using std::getline(), and then taken as a substring. Both values are stored as C++ strings.

When I output both of the strings to the console using cout for testing, they both appear to be identical:

ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile

However, std::string::compare() reports that they are not equal. When written to a text file, the two strings appear as follows:

ThisIsATestStringOutputtedToAFile
T^@h^@i^@s^@I^@s^@A^@T^@e^@s^@t^@S^@t^@r^@i^@n^@g^@O^@u^@t^@p^@u^@t^@t^@e^@d^@T^@o^@A^@F^@i^@l^@e

I'm guessing this is some kind of encoding problem, and if I were working in my native language (good old C#), I wouldn't have too many problems. As it is, I'm using C/C++ and Vi, and frankly don't know where to go from here! I've tried looking at converting to/from ANSI/Unicode, and also at removing the odd characters, but I'm not even sure whether they really exist or not.

Thanks in advance for any suggestions.

EDIT: Apologies, this is my first time posting here. The code below shows what I'm doing:

ifstream myInput;
ofstream myOutput;

myInput.open(fileLocation.c_str()); 
myOutput.open("test.txt");

TEST_ASSERT(myInput.is_open() == 1);

string compare1 = "ThisIsATestStringOutputtedToAFile";
string fileBuffer;

std::getline(myInput, fileBuffer);
string compare2 = fileBuffer.substr(400,100);

cout << compare1 + "\n";
cout << compare2 + "\n";
myOutput << compare1 + "\n";
myOutput << compare2 + "\n";
cin.get();

myInput.close();
myOutput.close();

TEST_ASSERT(compare1.compare(compare2) == 0);
A: 

The following works for me and writes the text pasted below into the file. Note the '\0' character embedded into the string.

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

int main()
{
    std::istringstream myInput("0123456789ThisIsATestStringOutputtedToAFile\x0 12ou 9 21 3r8f8 reohb jfbhv jshdbv coerbgf vibdfjchbv jdfhbv jdfhbvg jhbdfejh vbfjdsb vjdfvb jfvfdhjs jfhbsd jkefhsv gjhvbdfsjh jdsfhb vjhdfbs vjhdsfg kbhjsadlj bckslASB VBAK VKLFB VLHBFDSL VHBDFSLHVGFDJSHBVG LFS1BDV LH1BJDFLV HBDSH VBLDFSHB VGLDFKHB KAPBLKFBSV LFHBV YBlkjb dflkvb sfvbsljbv sldb fvlfs1hbd vljkh1ykcvb skdfbv nkldsbf vsgdb lkjhbsgd lkdcfb vlkbsdc xlkvbxkclbklxcbv");
    std::ofstream myOutput("test.txt");
    //std::ostringstream myOutput;

    std::string str1 = "ThisIsATestStringOutputtedToAFile";
    std::string fileBuffer;

    std::getline(myInput, fileBuffer);
    std::string str2 = fileBuffer.substr(10,100);

    std::cout << str1 + "\n";
    std::cout << str2 + "\n";
    myOutput << str1 + "\n";
    myOutput << str2 + "\n";

    std::cout << str1.compare(str2) << '\n';

    //std::cout << myOutput.str() << '\n';
    return 0;
}

Output:

ThisIsATestStringOutputtedToAFile
ThisIsATestStringOutputtedToAFile
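
One caveat about the snippet above is worth sketching. A std::string built from a plain string literal goes through the const char* constructor, which stops at the first '\0', so the bytes after the embedded '\0' probably never reach the stream and the comparison succeeds for that reason. To keep embedded NULs, the length has to be passed explicitly. The helper names below are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: builds a std::string that keeps every byte of a
// literal, including embedded '\0' characters. N counts the literal's
// terminating '\0', so N - 1 bytes are copied.
template <std::size_t N>
std::string with_embedded_nuls(const char (&literal)[N])
{
    return std::string(literal, N - 1);
}

// Hypothetical helper: builds the string the ordinary way. The
// const char* constructor stops at the first '\0'.
inline std::string truncated_at_nul(const char* literal)
{
    return std::string(literal);
}
```

With these, truncated_at_nul("abc\0def") holds only "abc", while with_embedded_nuls("abc\0def") holds all seven bytes, which is closer to what std::getline() hands back when the file itself contains NUL bytes.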
sbi
A: 

How did you create the contents of myInput? I would guess that the file was saved in a two-byte encoding. You can use a hex dump to verify this theory, or use a different editor to create the file.

The simplest way would be to launch cmd.exe and type

echo "ThisIsATestStringOutputtedToAFile" > test.txt
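
If no hex-dump tool is at hand, the two-byte theory can also be checked from inside the program. The following is a minimal sketch; to_hex is a hypothetical helper name, not something from the question's code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the bytes of `data` as space-separated
// two-digit hex values, so that non-printing bytes become visible.
std::string to_hex(const std::string& data)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < data.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(data[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[byte >> 4];    // high nibble
        out += digits[byte & 0x0F];  // low nibble
    }
    return out;
}
```

Printing to_hex(compare2) next to to_hex(compare1) should make the difference obvious: in a two-byte encoding, ASCII text shows up with a 00 byte beside every character (for example 54 00 68 00 for "Th"), and the file often begins with the byte-order mark ff fe or fe ff.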

UPDATE:

If you cannot change the encoding of the myInput file, you can try to use wide characters in your program. That is, use wstring instead of string, wifstream instead of ifstream, and likewise wofstream, wcout, etc.

Miroslav Bajtoš
The contents of myInput are an XML file with a custom file extension, although opening it with vi shows it to be recognized as a binary file. Reading and printing the file line by line to the console displays it fine, so I'm guessing I need to convert it from a binary stream to an ASCII-type stream?
Smallgods
When you print the file to the console, characters with an ASCII code below 32 are treated as control codes (e.g. TAB, CR, LF, etc.). The character ^@ (ASCII 0x00) does nothing; it is just skipped. Vi recognizes the file as binary because of these ^@ characters.
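
To make the point concrete, here is a rough sketch of why the two strings look the same on the console but compare as different: the NUL bytes are really in the std::string and count towards its size. strip_nul_bytes is a hypothetical helper, and dropping NULs only works as a quick check when the text is pure ASCII; it is not a real decoder.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes every '\0' byte from `raw`. For two-byte
// text that contains only ASCII characters this happens to leave the ASCII
// bytes behind. It is NOT a general decoder and will mangle anything
// outside that range.
std::string strip_nul_bytes(std::string raw)
{
    raw.erase(std::remove(raw.begin(), raw.end(), '\0'), raw.end());
    return raw;
}
```

A string holding the bytes T, 0x00, e, 0x00, s, 0x00, t, 0x00 has size 8 and is not equal to "Test", yet strip_nul_bytes() of it is. A leading byte-order mark, if present, would still be left in place by this helper.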
Miroslav Bajtoš
The myInput file was in binary format, it seems. Some more searching led to this article, which has set me well on my way. Cheers for the help! http://stackoverflow.com/questions/181634/simplest-efficient-ways-to-read-binary-and-ascii-files-to-string-or-similar-in-v
Smallgods
A: 

It turns out that the problem was that the file encoding of myInput was UTF-16, whereas the comparison string was UTF-8. Given the OS limitations I had for this project (Linux, C/C++ code), the way to convert between them was to use the iconv() functions. To stay compatible with the C++ strings I'd been using, I ended up saving the string to a new text file, then running iconv through the system() command.

system("iconv -f UTF-16 -t UTF-8 subStr.txt -o convertedSubStr.txt");

Reading the outputted string back in then gave me the string in the format I needed for the comparison to work properly.

NOTE: I'm aware that this is not the most efficient way to do this. If I'd had the luxury of a Windows environment and the windows.h libraries, things would have been a lot easier. In this case though, the code was in some rarely used unit tests and didn't need to be highly optimized, so the creation, destruction and I/O operations of a few text files weren't an issue.
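
For anyone who wants to avoid the temporary files, the conversion can also be done in memory. The sketch below is a hand-rolled decoder in standard C++, not the iconv route described above. It assumes the input is little-endian UTF-16 (which is not confirmed for the original file), and the function names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of code point `cp` to `out`.
void append_utf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Reads one little-endian 16-bit code unit starting at byte `pos`.
unsigned long read_unit_le(const std::string& bytes, std::size_t pos)
{
    const unsigned long lo = static_cast<unsigned char>(bytes[pos]);
    const unsigned long hi = static_cast<unsigned char>(bytes[pos + 1]);
    return lo | (hi << 8);
}

// Converts little-endian UTF-16 bytes to UTF-8, skipping a leading BOM.
// Throws std::runtime_error on an odd byte count or a broken surrogate pair.
std::string utf16le_to_utf8(const std::string& bytes)
{
    if (bytes.size() % 2 != 0) {
        throw std::runtime_error("odd number of bytes in UTF-16 input");
    }
    std::size_t i = 0;
    if (bytes.size() >= 2 && read_unit_le(bytes, 0) == 0xFEFF) {
        i = 2;  // skip the byte-order mark
    }
    std::string out;
    while (i < bytes.size()) {
        unsigned long cp = read_unit_le(bytes, i);
        i += 2;
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: a low surrogate must follow.
            if (i >= bytes.size()) {
                throw std::runtime_error("truncated surrogate pair");
            }
            const unsigned long low = read_unit_le(bytes, i);
            if (low < 0xDC00 || low > 0xDFFF) {
                throw std::runtime_error("invalid low surrogate");
            }
            i += 2;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unexpected low surrogate");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

In the unit test, something like utf16le_to_utf8(compare2) could then be compared directly with compare1. Note that a substring taken at an odd byte offset would cut a code unit in half, so the offsets passed to substr() need to be even for this to work.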

Smallgods