Registry access is quite complex: you are basically reading a large binary tree. The class design should follow the stored data structures; only once you understand those can you choose an appropriate design. To stay flexible you should model the primitives such as REG_SZ, REG_EXPAND_SZ, DWORD, SubKey, and so on. Don Syme has a nice section in his book Expert F# about binary parsing with binary combinators. The basic idea is that your objects know by themselves how to deserialize from a binary representation.
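A minimal sketch of how those primitives might be modeled (the type and member names here are my own invention, not the actual on-disk hive format):

public enum RegValueKind : byte
{
    RegSz,        // REG_SZ: null-terminated string
    RegExpandSz,  // REG_EXPAND_SZ: string with unexpanded environment variables
    RegDword,     // DWORD: 32-bit integer
    SubKey        // nested key
}

public abstract class RegEntry
{
    public string Name { get; set; }
}

public class RegSz : RegEntry
{
    public string Value { get; set; }
}

public class RegDword : RegEntry
{
    public uint Value { get; set; }
}

Suppose you have a stream of bytes which is structured like this: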
<Header>
<Node1/>
<Node2>
<Directory1>
</Node2>
</Header>
You start with a BinaryReader that reads the binary objects byte by byte. Since you know that the first thing must be the header, you can pass the reader to the Header object:
using System;
using System.IO;

public class Header
{
    public static Header Deserialize(BinaryReader reader)
    {
        Header header = new Header();
        int magic = reader.ReadByte();
        if (magic == 0xf4)      // we have a node entry
            header.Insert(Node.Read(reader));
        else if (magic == 0xf3) // directory entry
            header.Insert(DirectoryEntry.Read(reader));
        else
            throw new NotSupportedException("Invalid data");
        return header;
    }
}
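Calling it could look like this (the file name is a placeholder):

using (FileStream stream = File.OpenRead(@"C:\temp\hive.dat"))
using (BinaryReader reader = new BinaryReader(stream))
{
    Header header = Header.Deserialize(reader);
}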
To stay performant you can, for example, delay parsing the data until specific properties of an instance are actually accessed.
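One way to sketch that in C# is with Lazy<T>; NodeData and ParseData are placeholders, not part of any real API:

public class Node
{
    private readonly Lazy<NodeData> _data;

    public Node(byte[] rawBytes)
    {
        // The expensive parse runs only on first access to Data and is cached afterwards.
        _data = new Lazy<NodeData>(() => ParseData(rawBytes));
    }

    public NodeData Data
    {
        get { return _data.Value; }
    }

    private static NodeData ParseData(byte[] rawBytes)
    {
        // ... deserialize the node payload from rawBytes here ...
        return new NodeData();
    }
}

public class NodeData
{
    // parsed node fields go here
}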
Since the registry in Windows can get quite big, it is not possible to read it completely into memory at once; you will need to chunk it. One approach Windows itself uses is to allocate the whole file in paged pool memory, which can span several gigabytes, while only the parts that are actually accessed are paged in from disk. That allows Windows to deal with a very large registry file in an efficient manner. You will need something similar for your reader as well. Lazy parsing is one aspect, and the ability to jump around in the file without reading the data in between is crucial to stay performant.
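With a plain stream that means seeking directly to a known offset instead of reading everything before it; a minimal sketch, reusing the reader from above (the offset value is illustrative):

// Jump straight to a record at a known offset without touching the bytes in between.
long nodeOffset = 0x4000; // offset taken from a parent structure
reader.BaseStream.Seek(nodeOffset, SeekOrigin.Begin);
Node node = Node.Read(reader);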
More info about the paged pool and the registry can be found here:
http://blogs.technet.com/b/markrussinovich/archive/2009/03/26/3211216.aspx
Your API design will depend on how you read the data to stay efficient (e.g. use a memory mapped file and read from different mapped regions). With .NET 4 a quite good memory mapped file implementation has arrived, but wrappers around the OS APIs exist as well.
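A minimal sketch with System.IO.MemoryMappedFiles (file name, offset and size are placeholders):

using System.IO.MemoryMappedFiles;

using (MemoryMappedFile file = MemoryMappedFile.CreateFromFile(@"C:\temp\hive.dat"))
{
    // Map only the region that is needed instead of the whole file.
    using (MemoryMappedViewAccessor view = file.CreateViewAccessor(0x4000, 0x1000))
    {
        byte magic = view.ReadByte(0); // positions are relative to the start of the view
    }
}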
Yours,
Alois Kraus
To support delayed loading from a memory mapped file it would make sense not to read the byte array into the object and parse it later, but to go one step further and store only the offset and length of the chunk within the memory mapped file. Later, when the object is actually accessed, you can read and deserialize the data. This way you can traverse the whole file and build a tree of objects that contain only the offsets and a reference to the memory mapped file. That should save huge amounts of memory.
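A rough sketch of that idea (all names are made up for illustration; NodeData is the placeholder from above):

using System.IO;
using System.IO.MemoryMappedFiles;

public class LazyNode
{
    private readonly MemoryMappedFile _file;
    private readonly long _offset;  // where this node's bytes start in the file
    private readonly int _length;   // how many bytes belong to this node
    private NodeData _data;         // parsed on first access, then cached

    public LazyNode(MemoryMappedFile file, long offset, int length)
    {
        _file = file;
        _offset = offset;
        _length = length;
    }

    public NodeData Data
    {
        get
        {
            if (_data == null)
            {
                // Map and parse only this node's bytes on first access.
                using (MemoryMappedViewStream stream = _file.CreateViewStream(_offset, _length))
                using (BinaryReader reader = new BinaryReader(stream))
                {
                    _data = ParseData(reader);
                }
            }
            return _data;
        }
    }

    private static NodeData ParseData(BinaryReader reader)
    {
        // ... deserialize the node payload here ...
        return new NodeData();
    }
}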