I'm trying to write a simple C++ program that opens a torrent file (passed through argv[1]), reads all of it, and then prints the entire file's contents verbatim with no alterations; it has to print a carbon copy of the original torrent. The issue is that some of the torrents may contain Japanese, Russian, etc. (filenames, descriptions, etc.), and of course the standard torrent data with the hashes and whatnot.

What's the best way to go about doing this? What I have so far only outputs a portion of the contents, and it doesn't seem to read or print the data correctly; it's garbled or something:

#include "stdafx.h" 
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

#if defined(UNICODE)
    #define _tcout wcout
#else
    #define _tcout cout
#endif

int _tmain(int argc, TCHAR* argv[])
{
    wifstream File(argv[1]);
    wstring Line;

    while(!File.eof() )
    {
        getline(File, Line);

        _tcout << Line << endl;
    }
    File.close();
    return 0;
}
A:

You have a classic mistake in how you read the file:

while(!File.eof() )
{
    getline(File, Line); // If this read fails (i.e. you hit EOF),
                         // you still print out Line, which holds no freshly read data.

    _tcout << Line << endl;
}

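Remember that eof() only becomes true after a read has tried to go past the end of the file. The second-to-last getline() reads up to (but not including) EOF; the last call then hits EOF and fails. This mistake usually means the loop above processes one extra, bogus line (for example, printing the last line twice or printing an empty line, depending on what Line holds after the failed read).

The real solution is to put the getline() call into the while condition: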

while(getline(File, Line))
{
    _tcout << Line << endl;
}

If getline() hits EOF, it sets a flag on File. getline() returns a reference to the stream object (File), and when a stream is used in a boolean context it converts to true if everything is OK, or false if the read failed (for example, because there was nothing left to read before EOF). This means the loop body is not entered once you reach EOF.
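To see the difference, here is a minimal sketch that counts how many times each loop body runs over the same in-memory text. The function names `countEofLoop` and `countGetlineLoop` are hypothetical, made up for illustration, and an `std::istringstream` stands in for the file. With text that ends in a newline, the `eof()`-first loop runs one extra time.

```cpp
#include <sstream>
#include <string>

// Buggy pattern (hypothetical helper): check eof() first, then read.
// eof() only becomes true after a read has already hit the end, so the
// body runs one extra time when the text ends with a newline.
int countEofLoop(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    int iterations = 0;
    while (!in.eof())
    {
        std::getline(in, line); // may fail here, but the body still counts it
        ++iterations;
    }
    return iterations;
}

// Idiomatic pattern (hypothetical helper): the read itself is the loop
// condition, so the body only runs when getline() actually read a line.
int countGetlineLoop(const std::string& text)
{
    std::istringstream in(text);
    std::string line;
    int iterations = 0;
    while (std::getline(in, line))
        ++iterations;
    return iterations;
}
```

With input that does not end in a newline, both loops happen to agree, which is one reason this bug often goes unnoticed.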

The other thing you need to watch is the properties of the terminal (the character encoding it uses) and the format the file is in. If they don't match, the characters displayed will not look like the ones in the file.

Martin York
Still doesn't work, and it looks like you're right: the characters don't match up (comparing the output with what Notepad and EditPad Pro show), but I'm not quite sure what you mean by checking the properties of the terminal.
Jon
A:

By using wifstream you are treating the file as UTF-16, which is wrong. The torrent specification clearly says it works with strings of bytes, not Unicode characters. I get the impression that BT doesn't care about character set (code page), either, leaving that up to the interpretation of the client program. Filenames are just strings of bytes, with no meaning attached.

A torrent file isn't a text file, since it contains binary hash values, so trying to read and write it as a text file isn't a good idea. It would be better to implement a bencoding parser so that you can convert the hash values to hex before outputting them.
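A minimal sketch of that idea, assuming the whole file has already been read into a `std::string` in binary mode. All names (`prettyPrintBencode`, `printValue`, `looksLikeText`, `toHex`, `readNumber`, `readString`) are hypothetical. The rule for deciding what counts as "text" is also an assumption: a byte string is printed as-is if it passes a loose UTF-8 check with no control bytes (other than tab, CR, LF), and as hex otherwise, which is what happens to binary data such as the SHA-1 piece hashes. Malformed input throws an exception; this is not a strict or complete bencode validator.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Loose check: UTF-8 lead/continuation structure and no control bytes other
// than tab/CR/LF. Overlong encodings etc. are not rejected (sketch only).
bool looksLikeText(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c == 0x7F)
            return false;                          // control byte: treat as binary
        std::size_t extra = 0;
        if (c < 0x80) extra = 0;                   // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;    // 2-byte sequence
        else if ((c & 0xF0) == 0xE0) extra = 2;    // 3-byte sequence
        else if ((c & 0xF8) == 0xF0) extra = 3;    // 4-byte sequence
        else return false;                         // not a UTF-8 lead byte
        if (s.size() - i < extra + 1)
            return false;                          // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k)
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80)
                return false;                      // bad continuation byte
        i += extra + 1;
    }
    return true;
}

std::string toHex(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s)
    {
        out += digits[c >> 4];
        out += digits[c & 0x0F];
    }
    return out;
}

// Parses the decimal number starting at pos and ending at 'end'; skips 'end'.
long long readNumber(const std::string& d, std::size_t& pos, char end)
{
    const std::size_t stop = d.find(end, pos);
    if (stop == std::string::npos) throw std::runtime_error("unterminated number");
    const long long n = std::stoll(d.substr(pos, stop - pos));
    pos = stop + 1;
    return n;
}

// Byte string: <length>:<bytes>
std::string readString(const std::string& d, std::size_t& pos)
{
    const long long len = readNumber(d, pos, ':');
    if (len < 0 || static_cast<std::size_t>(len) > d.size() - pos)
        throw std::runtime_error("bad string length");
    std::string s = d.substr(pos, static_cast<std::size_t>(len));
    pos += static_cast<std::size_t>(len);
    return s;
}

void printValue(const std::string& d, std::size_t& pos, int depth, std::string& out)
{
    if (pos >= d.size()) throw std::runtime_error("unexpected end of data");
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    const char c = d[pos];
    if (c == 'i')                                  // integer: i<number>e
    {
        ++pos;
        out += indent + std::to_string(readNumber(d, pos, 'e')) + "\n";
    }
    else if (c == 'l' || c == 'd')                 // list l...e, dict d<key><value>...e
    {
        const bool isDict = (c == 'd');
        out += indent + (isDict ? "dict" : "list") + "\n";
        ++pos;
        while (pos < d.size() && d[pos] != 'e')
        {
            if (isDict)
            {
                out += indent + "  " + readString(d, pos) + ":\n";
                printValue(d, pos, depth + 2, out);
            }
            else
            {
                printValue(d, pos, depth + 1, out);
            }
        }
        if (pos >= d.size()) throw std::runtime_error("unterminated list/dict");
        ++pos;                                     // skip the closing 'e'
    }
    else                                           // byte string
    {
        const std::string s = readString(d, pos);
        out += indent + (looksLikeText(s) ? s : "hex:" + toHex(s)) + "\n";
    }
}

std::string prettyPrintBencode(const std::string& data)
{
    std::string out;
    std::size_t pos = 0;
    printValue(data, pos, 0, out);
    if (pos != data.size()) throw std::runtime_error("trailing data");
    return out;
}
```

To use it, read the file with `std::ifstream file(path, std::ios::binary);` and `std::string data((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());`, then write `prettyPrintBencode(data)` to stdout. Whether UTF-8 filenames then display correctly still depends on the encoding the terminal uses.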

Neil Mayhew
To quote the BitTorrent spec: *"All strings in a .torrent file that contains text must be UTF-8 encoded."* Some *"strings"* don't contain text, though, and are really just byte arrays.
Alexandre Jasmin
Also, `wifstream` being UTF-16 is an implementation detail of Windows; it is not guaranteed by the language, which leaves it up to the implementation.
Thanatos
A: 

As Neil Mayhew mentioned in his answer, treating the whole .torrent file as text doesn't make much sense, since it contains binary data.

You should reconsider the following points:

  • Don't use wide-char streams, as the file size may not be a multiple of sizeof(wchar_t).
  • read() is preferable to getline() in this case, because .torrent files don't use a line-based text format.
  • Use the ios::binary flag when opening the file, or you'll get unwanted end-of-line conversion (this happens on Windows).
  • You should also switch cout to binary mode for the same reason.
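Putting those points together, a minimal sketch might look like this. `readAllBytes`, `setStdoutBinary` and `dumpFileVerbatim` are hypothetical names. `_setmode`, `_fileno` and `_O_BINARY` are the Microsoft CRT calls for switching a C stream to binary mode and are only compiled on Windows (`_WIN32`). That `std::cout` picks up the mode change assumes it writes through the C `stdout` stream, which is the default while `sync_with_stdio(true)` is in effect; treat that as something to verify on your toolchain.

```cpp
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>

#ifdef _WIN32
#include <fcntl.h>  // _O_BINARY
#include <io.h>     // _setmode
#endif

// Switch stdout to binary mode so '\n' bytes are not expanded to "\r\n".
// Only matters on Windows; elsewhere text and binary mode behave the same.
void setStdoutBinary()
{
#ifdef _WIN32
    _setmode(_fileno(stdout), _O_BINARY);
#endif
}

// Read every byte with read() in fixed-size chunks: no wide characters,
// no getline(), no assumptions about lines or encodings.
std::string readAllBytes(std::istream& in)
{
    std::string bytes;
    char buffer[4096];
    // The final partial chunk makes read() fail, but gcount() still reports it.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0)
        bytes.append(buffer, static_cast<std::size_t>(in.gcount()));
    return bytes;
}

// Copy a file byte-for-byte to stdout. Returns false if the file cannot be
// opened or writing to stdout fails.
bool dumpFileVerbatim(const char* path)
{
    std::ifstream file(path, std::ios::binary); // ios::binary: no CRLF translation
    if (!file)
        return false;
    const std::string bytes = readAllBytes(file);
    setStdoutBinary();
    std::cout.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    std::cout.flush();
    return static_cast<bool>(std::cout);
}
```

A `main` would call `dumpFileVerbatim(argv[1])` with a narrow `char*` path and return non-zero when it reports failure. A shorter alternative for the copy itself is `std::cout << file.rdbuf();`, which copies the whole stream buffer in one statement.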
Alexandre Jasmin