What is the best way to search a large binary file for a certain substring in C#?

To provide some specifics, I'm trying to extract the DWARF information from an executable, so I only care about certain parts of the binary file (namely the sections starting with the strings .debug_info, .debug_abbrev, etc.)

I don't see anything obvious in Stream, FileStream, or BinaryReader, so it looks like I'll have to read chunks in and search through the data for the strings myself.

Is there a better way?

A: 

I think you'll have to do it yourself; BinaryReader was not designed for searching for text in a binary file. However, be mindful of the text encoding you use when searching: the search string has to be converted to the same bytes the file actually contains.
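To illustrate the encoding point, here is a minimal sketch. It assumes the section names are stored as single-byte ASCII, which is the usual case for ELF section names; the class and method names are made up for the example.

```csharp
using System;
using System.Text;

static class PatternBytes
{
    // Convert the search text to the bytes we expect to find in the file.
    // Assumption: the file stores the name as ASCII, one byte per character.
    public static byte[] ForAscii(string text) => Encoding.ASCII.GetBytes(text);

    static void Main()
    {
        byte[] ascii = ForAscii(".debug_info");
        // The same text in UTF-16 is a different (and longer) byte sequence,
        // so searching with the wrong encoding would never match.
        byte[] utf16 = Encoding.Unicode.GetBytes(".debug_info");

        Console.WriteLine(ascii.Length);  // 11
        Console.WriteLine(utf16.Length);  // 22
    }
}
```

Searching on the raw bytes, rather than decoding the file into a string, also avoids corrupting the non-text parts of the binary.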

Igor Brejc
+1  A: 

There must be a DWARF library written in C that you could compile and call through interop. I did some searching and found this. If a library from there could be compiled into a DLL on Windows (I assume you're using Windows), then you could use System.Runtime.InteropServices to interact with the DLL and extract your information from there.

Perhaps?

yodaj007
Yes, it's better to properly parse the binary file format.
Craig McQueen
+4  A: 

There's nothing built into .NET that will do the search for you, so you're going to need to read in the file chunk by chunk and scan for what you want to find.

You can speed up the search in two ways.

Firstly, use buffered IO and transfer large chunks at a time. Don't read byte by byte; read 64 KB, 256 KB or 1 MB chunks.
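A rough sketch of the chunked approach follows. The one subtlety is that a match can straddle two chunks, so the last `pattern.Length - 1` bytes of each chunk are carried over to the front of the next one. The names `ChunkedSearch` and `IndexOf` are hypothetical, and the inner scan is deliberately naive to keep the chunk handling easy to follow.

```csharp
using System;
using System.IO;
using System.Text;

static class ChunkedSearch
{
    // Returns the offset (relative to the stream position at the time of the call)
    // of the first occurrence of pattern, or -1 if it is not found.
    public static long IndexOf(Stream stream, byte[] pattern, int chunkSize = 64 * 1024)
    {
        if (pattern.Length == 0) return 0;

        int overlap = pattern.Length - 1;
        byte[] buffer = new byte[chunkSize + overlap];
        int carried = 0;        // bytes kept from the end of the previous chunk
        long bufferStart = 0;   // stream offset corresponding to buffer[0]

        while (true)
        {
            int read = stream.Read(buffer, carried, buffer.Length - carried);
            if (read == 0) return -1;   // end of stream, nothing found
            int valid = carried + read;

            // Naive scan of every position where a full match still fits.
            for (int i = 0; i + pattern.Length <= valid; i++)
            {
                int j = 0;
                while (j < pattern.Length && buffer[i + j] == pattern[j]) j++;
                if (j == pattern.Length) return bufferStart + i;
            }

            // Keep the tail so a match spanning two chunks is not missed.
            int keep = Math.Min(overlap, valid);
            Array.Copy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
    }

    static void Main(string[] args)
    {
        // args[0] is the path of the binary to search.
        using (FileStream fs = File.OpenRead(args[0]))
        {
            byte[] pattern = Encoding.ASCII.GetBytes(".debug_info");
            Console.WriteLine(IndexOf(fs, pattern));
        }
    }
}
```

To find every occurrence rather than the first, the same loop could collect offsets instead of returning early.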

Secondly, don't do a naive scan that compares the pattern at every offset. Check out the Boyer-Moore string search algorithm, which can skip ahead by several bytes after a mismatch; you can apply it to searching for the DWARF section names you want.
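As an illustration, here is a sketch of the Boyer-Moore-Horspool variant, a common simplification of Boyer-Moore that uses only the bad-character shift table. It is not the full algorithm, and the class and method names are invented for the example.

```csharp
using System;
using System.Text;

static class HorspoolSearch
{
    // Returns the index of the first occurrence of pattern in data[0..count), or -1.
    public static int IndexOf(byte[] data, int count, byte[] pattern)
    {
        int m = pattern.Length;
        if (m == 0) return 0;
        if (m > count) return -1;

        // For each byte value, how far the window may move when that byte is
        // aligned with the last pattern position. Bytes not in the pattern
        // allow a jump of the full pattern length.
        int[] shift = new int[256];
        for (int i = 0; i < 256; i++) shift[i] = m;
        for (int i = 0; i < m - 1; i++) shift[pattern[i]] = m - 1 - i;

        int pos = 0;
        while (pos <= count - m)
        {
            // Compare right to left within the current window.
            int j = m - 1;
            while (j >= 0 && data[pos + j] == pattern[j]) j--;
            if (j < 0) return pos;

            // Skip ahead based on the byte under the window's last position.
            pos += shift[data[pos + m - 1]];
        }
        return -1;
    }

    static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("\0\0.text\0.debug_abbrev\0.debug_info\0");
        byte[] pattern = Encoding.ASCII.GetBytes(".debug_info");
        Console.WriteLine(IndexOf(data, data.Length, pattern));  // prints 22
    }
}
```

The `count` parameter lets the search run over a partially filled read buffer, so it could replace the naive inner loop of a chunked reader.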

Bevan