I am developing an application and I would like to be able to search the whole drive for a regular expression. I would prefer to do this in C#, but I can call other languages. Is there an easy way to just seek through all the binary data on a drive from beginning to end?
AFAIK there is no simple way to do this on the raw binary data; you would need direct access to the disk.
If searching on a per-file basis is enough, then enumerating all files, opening them for shared binary reading (catching the exceptions for the ones that are system-protected), and looking for the data should be straightforward. However, this will be quite slow, since enumerating and opening every file takes time.
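A minimal sketch of that enumerate-and-open approach, assuming that skipping anything unreadable is acceptable. The names `FileWalker`, `EnumerateFilesSafe` and `TryOpenShared` are made up for illustration, and the walk does not guard against directory links that point back up the tree.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileWalker
{
    // Walks the tree under 'root' and yields every file path it can list.
    // Directories that cannot be listed (access denied, removed mid-walk) are skipped.
    public static IEnumerable<string> EnumerateFilesSafe(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                files = Directory.GetFiles(dir);
                subdirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; }
            catch (IOException) { continue; } // also covers missing or over-long paths

            foreach (string d in subdirs) pending.Push(d);
            foreach (string f in files) yield return f;
        }
    }

    // Opens a file for shared binary reading.
    // Returns null when the file is protected, locked or gone.
    public static FileStream TryOpenShared(string path)
    {
        try
        {
            return new FileStream(path, FileMode.Open, FileAccess.Read,
                                  FileShare.ReadWrite | FileShare.Delete);
        }
        catch (UnauthorizedAccessException) { return null; }
        catch (IOException) { return null; }
    }
}
```

Passing `FileShare.ReadWrite | FileShare.Delete` lets the open succeed for many files that other processes already have open, but files opened exclusively by the OS will still come back as null.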
I don't think C# can read every file on the drive the OS is installed on, since the OS keeps some files locked.
You could use the System.IO namespace to enumerate all files and then scan them individually, byte by byte. This would obviously take a long time.
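One way to scan a file without loading it whole is to read it in chunks and run the regex over each chunk decoded as ISO-8859-1, so that string indexes equal byte offsets. This is only a sketch: `ChunkedScanner`, `FindMatches` and the default sizes are illustrative names and values, matches longer than `overlap` bytes can be missed at a chunk boundary, and unusual variable-length patterns may still behave oddly right at the seam.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class ChunkedScanner
{
    // ISO-8859-1 maps each byte 0..255 to the char with the same code,
    // so an index in the decoded string is also a byte offset.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Yields the absolute byte offset of each match found in 'stream'.
    public static IEnumerable<long> FindMatches(Stream stream, Regex pattern,
                                                int chunkSize = 1 << 20, int overlap = 4096)
    {
        byte[] buffer = new byte[overlap + chunkSize];
        int carried = 0;        // bytes kept from the end of the previous window
        long windowStart = 0;   // absolute offset of buffer[0]
        long lastReported = -1; // start offset of the last match returned

        while (true)
        {
            int read = stream.Read(buffer, carried, chunkSize);
            if (read <= 0) yield break;
            int length = carried + read;

            string text = Latin1.GetString(buffer, 0, length);
            foreach (Match m in pattern.Matches(text))
            {
                long offset = windowStart + m.Index;
                // Skip matches lying entirely in the carried bytes, or already reported.
                if (m.Index + m.Length <= carried || offset <= lastReported) continue;
                lastReported = offset;
                yield return offset;
            }

            // Keep the tail so a match straddling two chunks is seen whole next time.
            int keep = Math.Min(overlap, length);
            Array.Copy(buffer, length - keep, buffer, 0, keep);
            windowStart += length - keep;
            carried = keep;
        }
    }
}
```

Something like `ChunkedScanner.FindMatches(File.OpenRead(path), new Regex("needle"))` would then give the offsets for one file. The pattern has to be written in terms of raw bytes; text stored as UTF-16, for example, will not match an ASCII pattern.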
Here's an implementation of grep in C#:
http://dotnet.jku.at/applications/Grep/Src.aspx
It works off an array of filenames, so you would have to modify it to follow subdirectories.
Do you really want to do this? How are you going to search:
- .doc
- .xls
- .html
etc.? Each file type will represent the string you're searching for in different ways.
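To make the encoding point concrete, here is a small sketch that shows the same search string as bytes in three common encodings. `EncodedNeedles` is an illustrative name, and the three encodings are just examples; real document formats may also compress or split the text, which a list of encodings does not help with.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class EncodedNeedles
{
    // Returns the byte sequence for 'text' in a few common encodings.
    public static Dictionary<string, byte[]> Encode(string text)
    {
        return new Dictionary<string, byte[]>
        {
            { "UTF-8", Encoding.UTF8.GetBytes(text) },
            { "UTF-16LE", Encoding.Unicode.GetBytes(text) },
            { "UTF-16BE", Encoding.BigEndianUnicode.GetBytes(text) },
        };
    }
}
```

For "drive", the UTF-8 bytes are 64 72 69 76 65, while UTF-16LE gives 64 00 72 00 69 00 76 00 65 00, so a byte-level search for one form will not find the other.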
This article shows how to read data directly from the disk. Everything they do in C++ could be done from C# using P/Invoke.
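As a rough idea of what the P/Invoke route could look like (this is not the article's code), the sketch below opens a Windows device path with `CreateFile` and reads from it with `ReadFile`. The `RawDisk` class and its method names are made up for illustration. It assumes Windows, an elevated process, a device path such as `\\.\PhysicalDrive0`, and a 512-byte sector size, which should really be queried rather than guessed.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

public static class RawDisk
{
    private const uint GENERIC_READ = 0x80000000;
    private const uint FILE_SHARE_READ = 0x1;
    private const uint FILE_SHARE_WRITE = 0x2;
    private const uint OPEN_EXISTING = 3;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Unicode)]
    private static extern SafeFileHandle CreateFile(
        string lpFileName, uint dwDesiredAccess, uint dwShareMode,
        IntPtr lpSecurityAttributes, uint dwCreationDisposition,
        uint dwFlagsAndAttributes, IntPtr hTemplateFile);

    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool ReadFile(
        SafeFileHandle hFile, byte[] lpBuffer, uint nNumberOfBytesToRead,
        out uint lpNumberOfBytesRead, IntPtr lpOverlapped);

    // Raw device reads must cover whole sectors, so round sizes up.
    public static int AlignUp(int size, int sectorSize)
    {
        return (size + sectorSize - 1) / sectorSize * sectorSize;
    }

    // Opens a device path for reading; throws with the Win32 error on failure.
    public static SafeFileHandle Open(string devicePath)
    {
        SafeFileHandle handle = CreateFile(devicePath, GENERIC_READ,
            FILE_SHARE_READ | FILE_SHARE_WRITE, IntPtr.Zero,
            OPEN_EXISTING, 0, IntPtr.Zero);
        if (handle.IsInvalid)
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return handle;
    }

    // Reads the next buffer.Length bytes; returns 0 at the end of the device.
    public static int Read(SafeFileHandle handle, byte[] buffer)
    {
        uint read;
        if (!ReadFile(handle, buffer, (uint)buffer.Length, out read, IntPtr.Zero))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return (int)read;
    }
}

// Hypothetical usage (assumed path and sector size):
//   using (var disk = RawDisk.Open(@"\\.\PhysicalDrive0"))
//   {
//       var buffer = new byte[RawDisk.AlignUp(1 << 20, 512)];
//       int n = RawDisk.Read(disk, buffer);
//   }
```

Reading the device this way returns sectors in on-disk order, so file contents may be fragmented, compressed or encrypted, and a match does not tell you which file it belongs to.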