I'm trying to use the BEncoding ObjC class to decode a .torrent file.

NSData *rawdata = [NSData dataWithContentsOfFile:@"/path/to/the.torrent"];
NSData *torrent = [BEncoding objectFromEncodedData:rawdata];

When I NSLog torrent I get the following:

{
    announce = <68747470 3a2f2f74 6f727265 6e742e75 62756e74 752e636f 6d3a3639 36392f61 6e6e6f75 6e6365>;
    comment = <5562756e 74752043 44207265 6c656173 65732e75 62756e74 752e636f 6d>;
    "creation date" = 1225365524;
    info =     {
        length = 732766208;
        name = <7562756e 74752d38 2e31302d 6465736b 746f702d 69333836 2e69736f>;
        "piece length" = 524288;
....

How do I convert the name into a NSString? I have tried..

NSData *info = [torrent valueForKey:@"info"];
NSData *name = [info valueForKey:@"name"];
unsigned char aBuffer[[name length]];
[name getBytes:aBuffer length:[name length]];
NSLog(@"File name: %s", aBuffer);

..which retrieves the data, but seems to have additional unicode rubbish after it:

File name: ubuntu-8.10-desktop-i386.iso)

I have also tried (from here)..

NSString *secondtry = [NSString stringWithCharacters:[name bytes] length:[name length] / sizeof(unichar)];

..but this seems to return random Chinese characters:

扵湵畴㠭ㄮⴰ敤歳潴⵰㍩㘸椮潳

The fact the first way (as mentioned in the Apple documentation) returns most of the data correctly, with some additional bytes makes me think it might be an error in the BEncoding library.. but my lack of knowledge about ObjC is more likely to be at fault..

+2  A: 

Aha, the NSString method stringWithCString works correctly:

With the BEncoding.h/.m files added to your project, here is the complete .m file:

#import <Foundation/Foundation.h>
#import "BEncoding.h"

int main (int argc, const char * argv[]) {
    NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

    // Read the raw file and de-bencode it
    NSData *rawdata = [NSData dataWithContentsOfFile:@"/path/to/a.torrent"];
    NSDictionary *torrent = [BEncoding objectFromEncodedData:rawdata];

    // Get the file name (stored as raw bytes under "name" in the "info" dictionary)
    NSDictionary *infoData = [torrent valueForKey:@"info"];
    NSData *nameData = [infoData valueForKey:@"name"];
    // Note: stringWithCString:encoding: expects a null-terminated buffer
    NSString *filename = [NSString stringWithCString:[nameData bytes] encoding:NSUTF8StringEncoding];
    NSLog(@"%@", filename);

    [pool drain];
    return 0;
}

..and the output:

ubuntu-8.10-desktop-i386.iso
dbr
+2  A: 

In cases where I don't control the data being turned into a string, such as data read from the network, I prefer NSString's -initWithBytes:length:encoding:, so that I don't depend on the data being null-terminated to get defined results. Apple's documentation notes that the results are undefined if cString is not null-terminated.

Jay O'Conor
+5  A: 
NSData *torrent = [BEncoding objectFromEncodedData:rawdata];

When I NSLog torrent I get the following:

{
    ⋮
}

That would be an NSDictionary, then, not an NSData.

unsigned char aBuffer[[name length]];
[name getBytes:aBuffer length:[name length]];
NSLog(@"File name: %s", aBuffer);

..which retrieves the data, but seems to have additional unicode rubbish after it:

File name: ubuntu-8.10-desktop-i386.iso)

No, it retrieved the filename just fine; you simply printed it incorrectly. %s takes a C string, which is null-terminated; the bytes of a data object are not null-terminated (they are just bytes, not necessarily characters in any encoding, and 0—which is null as a character—is a perfectly valid byte). You would have to allocate one more character, and set the last one in the array to 0:

size_t length = [name length];
unsigned char aBuffer[length + 1];      // one extra byte for the terminator
[name getBytes:aBuffer length:length];
aBuffer[length] = 0;                    // terminate after the last data byte
NSLog(@"File name: %s", aBuffer);

But null-terminating the data in an NSData object is wrong (except when you really do need a C string). I'll get to the right way in a moment.
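
For readers who do need a C string, here is a minimal sketch of the "one more byte" rule in plain C (which also compiles inside an Objective-C file). The helper name copy_as_c_string is hypothetical and not part of Foundation or the BEncoding library; it only illustrates where the terminator has to go.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `length` raw bytes into a fresh buffer that is
   one byte longer than the data, and put the terminator in the extra slot. */
char *copy_as_c_string(const void *bytes, size_t length) {
    char *buffer = malloc(length + 1);   /* room for the data plus '\0' */
    if (buffer == NULL) {
        return NULL;
    }
    memcpy(buffer, bytes, length);
    buffer[length] = '\0';               /* index `length`, not `length - 1` */
    return buffer;                       /* the caller must free() this */
}
```

Writing the zero at index length - 1 instead would overwrite the last byte of the data rather than append a terminator.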

I have also tried […]..

NSString *secondtry = [NSString stringWithCharacters:[name bytes] length:[name length] / sizeof(unichar)];

..but this seems to return random Chinese characters:

扵湵畴㠭ㄮⴰ敤歳潴⵰㍩㘸椮潳

That's because your bytes are UTF-8, which encodes one character in (usually) one byte.

unichar is, and stringWithCharacters:length: accepts, UTF-16. In that encoding, one character is (usually) two bytes. (Hence the division by sizeof(unichar): it divides the number of bytes by 2 to get the number of characters.)

So you said “here's some UTF-16 data”, and it went and made characters from every two bytes; each pair of bytes was supposed to be two characters, not one, so you got garbage (which turned out to be mostly CJK ideographs).
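
To make the pairing concrete, here is a small sketch in plain C of what reading those bytes as 16-bit units does. It assumes a little-endian host (an assumption on my part, though it is consistent with the output in the question), and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read two consecutive bytes as one little-endian
   16-bit code unit, as a UTF-16 reader on a little-endian machine would. */
uint16_t utf16le_unit(const unsigned char *bytes) {
    return (uint16_t)(bytes[0] | (bytes[1] << 8));
}

/* Number of 16-bit units obtained from `byte_count` bytes: this is the
   division by sizeof(unichar) from the question. */
size_t utf16_unit_count(size_t byte_count) {
    return byte_count / sizeof(uint16_t);
}
```

The first two bytes of the name are 'u' (0x75) and 'b' (0x62), so utf16le_unit yields 0x6275, which falls in the CJK Unified Ideographs range (U+4E00 to U+9FFF). The 28 bytes of the file name become 14 such units, matching the 14 characters of garbage.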


You answered your own question pretty well, except that stringWithUTF8String: is simpler than stringWithCString:encoding: for UTF-8-encoded strings.

However, when you have the length (as you do when you have an NSData), it is even easier, and more proper, to use initWithBytes:length:encoding:. It's easier because it does not require null-terminated data; it simply uses the length you already have.

Peter Hosey
+19  A: 

That's an important point that should be re-emphasized I think. It turns out that,

NSString *content = [NSString stringWithUTF8String:[responseData bytes]];

is not the same as,

NSString *content = [[NSString alloc]  initWithBytes:[responseData bytes]
              length:[responseData length] encoding: NSUTF8StringEncoding];

The first expects a null-terminated byte string; the second doesn't. If the bytes aren't correctly terminated, the first call can leave content nil or holding garbage, because it keeps reading past the end of the data looking for a terminator.

Alasdair Allan
+1 I was getting killed on an error I simply could not figure out. Thank you for the insightful comment. You saved me from hours of frustration.
hyuan
Excellent point. I've edited my own answer to cover it; it is a critical detail, so I apologize for omitting it.
Peter Hosey
+3  A: 

A nice quick and dirty approach is to let NSString's stringWithFormat: method help you out. One of the less-often used features of string formatting is the ability to specify a maximum string length when outputting a string. Using this handy feature allows you to convert NSData into a string pretty easily:

NSData *myData = [self getDataFromSomewhere];
// The * precision consumes an int, so cast the NSUInteger length
NSString *string = [NSString stringWithFormat:@"%.*s", (int)[myData length], [myData bytes]];

If you want to output it to the log, it can be even easier:

NSLog(@"my Data: %.*s", (int)[myData length], [myData bytes]);
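
The same precision trick works in plain C, which may make it clearer what the format string is doing. This is only a sketch: format_bytes is a hypothetical helper name, and the example assumes the bytes are text that %s can print.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format `length` bytes that are NOT null-terminated
   into `out`, using a precision-limited %s. Returns snprintf's result. */
int format_bytes(char *out, size_t out_size, const void *bytes, size_t length) {
    /* The '*' precision consumes an int argument, so the length is cast. */
    return snprintf(out, out_size, "%.*s", (int)length, (const char *)bytes);
}
```

The precision is an upper bound, not an exact count: output still stops early at the first zero byte, so this suits text but not arbitrary binary data.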
Ethan