I have a binary file I've loaded using an NSData object. Is there a way to locate a sequence of characters, 'abcd' for example, within that binary data and return the offset without converting the entire file to a string? Seems like it should be a simple answer, but I'm not sure how to do it. Any ideas?

Thanks.

UPDATE: I'm doing this on the iPhone so I don't have -rangeOfData:options:range: available.

UPDATE 2: I'm going to award this one to Sixten Otto for suggesting strstr. I went and found the source code for the C function strstr and rewrote it to work on a fixed length Byte array--which incidentally is different from a char array as it is not null terminated. Here is the code I ended up with:

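- (Byte*)offsetOfBytes:(Byte*)bytes inBuffer:(const Byte*)buffer ofLength:(int)len
{
    // bytes is the null-terminated pattern to find; buffer holds len bytes
    // and may contain zeros anywhere, so it is bounded by len, not by a null
    const Byte *cp = buffer;
    const Byte *end = buffer + len;
    const Byte *s1, *s2;

    // an empty pattern matches at the start of the buffer
    if ( !*bytes )
        return (Byte*)buffer;

    int i = 0;
    for (i=0; i < len; ++i)
    {
        s1 = cp;
        s2 = bytes;

        // advance while still inside the buffer and the pattern keeps matching
        while ( s1 < end && *s2 && *s1 == *s2 )
            s1++, s2++;

        // reached the pattern's terminator: every pattern byte matched
        if (!*s2)
            return (Byte*)cp;

        cp++;
    }

    return NULL;
}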

This returns a pointer to the first occurrence of bytes (the pattern I'm looking for) in buffer (the byte array being searched), or NULL if there is no match.

I call it like this:

// data is the NSData object
const Byte *bytes = [data bytes];
Byte* index = [self offsetOfBytes:tag inBuffer:bytes ofLength:[data length]];
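
Since the question asked for an offset rather than a pointer, the returned pointer can be turned into one by subtracting the start of the buffer. A minimal sketch in plain C: offset_from_match is a hypothetical helper name, not part of any API, and it assumes the match pointer is either NULL or points into the same buffer as base.

```c
#include <stddef.h>

/* Hypothetical helper: converts a match pointer into a byte offset from the
   start of the buffer. Returns -1 when match is NULL (pattern not found).
   Assumes match, when non-NULL, points into the buffer that starts at base. */
static ptrdiff_t offset_from_match(const unsigned char *base,
                                   const unsigned char *match)
{
    if (match == NULL)
        return -1;
    return match - base;
}
```

With the call above, that would be offset_from_match(bytes, index).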
+2  A: 

Did you look into the bytes method?

Liz Albin
+4  A: 

Convert your substring to an NSData object, and search for those bytes in the larger NSData using rangeOfData:options:range:. Make sure that the string encodings match!

On iPhone, where that isn't available, you may have to do this yourself. The C function strstr() will give you a pointer to the first occurrence of a pattern within the buffer (as long as neither contains nulls!), but not the index. Here's a function that should do the job (but no promises, since I haven't tried actually running it...):

- (NSUInteger)indexOfData:(NSData*)needle inData:(NSData*)haystack
{
    // void* can't be indexed, so view both buffers as raw bytes
    const unsigned char* needleBytes = [needle bytes];
    const unsigned char* haystackBytes = [haystack bytes];
    NSUInteger needleLength = [needle length];
    NSUInteger haystackLength = [haystack length];

    // the lengths are unsigned, so bail out before the subtraction below
    // could wrap around when needle is longer than haystack
    if (needleLength > haystackLength)
    {
        return NSNotFound;
    }

    // walk the length of the buffer, looking for a byte that matches the start
    // of the pattern; we can skip (|needle|-1) bytes at the end, since we can't
    // have a match that's shorter than needle itself
    for (NSUInteger i=0; i < haystackLength-needleLength+1; i++)
    {
        // walk needle's bytes while they still match the bytes of haystack
        // starting at i; if we walk off the end of needle, we found a match
        NSUInteger j=0;
        while (j < needleLength && needleBytes[j] == haystackBytes[i+j])
        {
            j++;
        }
        if (j == needleLength)
        {
            return i;
        }
    }
    return NSNotFound;
}

This runs in something like O(nm), where n is the buffer length, and m is the size of the substring. It's written to work with NSData for two reasons: 1) that's what you seem to have in hand, and 2) those objects already encapsulate both the actual bytes, and the length of the buffer.
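
To make the caveat about nulls concrete: strstr() treats the first zero byte as the end of the buffer, so it never sees a match that lies beyond one. The plain C sketch below mirrors the loop above with explicit lengths and puts it next to a strstr() call for comparison. The names find_bytes, strstr_finds and NOT_FOUND are illustrative only, not library functions.

```c
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)   /* illustrative stand-in for NSNotFound */

/* Naive O(n*m) search driven by explicit lengths, so zero bytes in either
   buffer are treated like any other byte. */
size_t find_bytes(const unsigned char *haystack, size_t haystack_len,
                  const unsigned char *needle, size_t needle_len)
{
    /* only try start positions where the whole needle still fits */
    for (size_t i = 0; i + needle_len <= haystack_len; i++) {
        size_t j = 0;
        while (j < needle_len && needle[j] == haystack[i + j])
            j++;
        if (j == needle_len)
            return i;            /* walked off the end of needle: match */
    }
    return NOT_FOUND;
}

/* Returns 1 if strstr() finds needle, 0 otherwise. The haystack must end
   with a zero byte; strstr() stops looking at the first zero it meets. */
int strstr_finds(const unsigned char *haystack, const char *needle)
{
    return strstr((const char *)haystack, needle) != NULL;
}
```

For a seven-byte buffer such as { 'x', 0, 'a', 'b', 'c', 'd', 0 }, strstr_finds reports no match for "abcd" because the scan ends at the zero in position 1, while find_bytes locates the same four bytes at offset 2.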

Sixten Otto
I should have mentioned that I'm doing this on the iPhone, which doesn't have the rangeOfData:options:range: method. Would be a perfect answer if it did though.
Matt Long
Cool. I will try your code out and see how it goes. Thanks again for your help.
Matt Long
+1  A: 

If you're using Snow Leopard, a convenient way is the new -rangeOfData:options:range: method in NSData that returns the range of the first occurrence of a piece of data. Otherwise, you can access the NSData's contents yourself using its -bytes method to perform your own search.
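
One way to do that search by hand is with the standard C functions memchr and memcmp, both of which take explicit lengths and therefore cope with zero bytes. This is only a sketch: find_pattern is a made-up name, and buf and len stand for whatever [data bytes] and [data length] return.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the first occurrence of pat (pat_len bytes) inside
   buf (len bytes), or NULL if there is none. */
const unsigned char *find_pattern(const unsigned char *buf, size_t len,
                                  const unsigned char *pat, size_t pat_len)
{
    if (pat_len == 0)
        return buf;                  /* empty pattern matches at the start */
    if (pat_len > len)
        return NULL;                 /* pattern cannot fit in the buffer */

    const unsigned char *p = buf;
    /* one past the last position where a full pattern still fits */
    const unsigned char *last = buf + (len - pat_len) + 1;

    while (p < last) {
        /* jump to the next byte equal to the pattern's first byte */
        p = memchr(p, pat[0], (size_t)(last - p));
        if (p == NULL)
            return NULL;
        /* candidate found: compare the whole pattern */
        if (memcmp(p, pat, pat_len) == 0)
            return p;
        p++;
    }
    return NULL;
}
```

Subtracting buf from a non-NULL result gives the offset of the match.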

Preston
Good point. I hadn't noticed that -rangeOfData:options:range: was only added in 10.6.
Sixten Otto
So I don't have that method available since I'm doing this on the iPhone. What C functions would you use to compare the character substring I'm looking for to the buffer I get from the -bytes method? Any ideas?
Matt Long
+1  A: 

I had the same problem. I solved it the other way round from the other suggestions.

First, I convert the data to a string (assuming your NSData is stored in a variable named rawFile) with:

NSString *ascii = [[NSString alloc] initWithData:rawFile encoding:NSASCIIStringEncoding];

Now you can easily search for strings like 'abcd', or whatever you want, by passing the ascii string to an NSScanner. Maybe this is not really efficient, but it works until the -rangeOfData:options:range: method becomes available on the iPhone too.

Andy
Thanks for your response. One of my criteria stated in the question was "without converting the entire file to a string" so this is not a viable solution for me. Check my original question now to see the solution I came up with. It works well without having to copy any data at all. I just iterate over the bytes from the NSData object looking for the character sequence I need and then return a pointer to that position in the array upon finding the first occurrence.
Matt Long
Yes, I see. The real point would be to understand the cost of such a conversion; I don't really have a clue about this. It could be useful to ask Apple about this... I'll have to start looking in their forums too. :-)
Andy