Is it possible to cache a binary file in .NET and do normal file operations on the cached file?

+1  A: 

Any modern OS has a caching system built in, so in fact whenever you interact with a file, you are interacting with an in-memory cache of the file.

Before applying custom caching, you need to ask an important question: what happens when the underlying file changes, so my cached copy becomes invalid?

You can complicate matters further if the cached copy is allowed to change, and the changes need to be saved back to the underlying file.

If the file is small, it's simpler just to use MemoryStream as suggested in another answer.

If you need to save changes back to the file, you could write a wrapper class that forwards everything on to MemoryStream, but additionally has an IsDirty property that it sets to true whenever a write operation is performed. Then you can have some management code that kicks in whenever you choose (at the end of some larger transaction?), checks for (IsDirty == true) and saves the new version to disk. This is called "lazy write" caching, as the modifications are made in memory and are not actually saved until sometime later.
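Here's a minimal sketch of that idea. To keep it short it derives from MemoryStream rather than wrapping one, and the class name DirtyTrackingStream and the SaveTo method are invented for illustration:

using System.IO;

// Records whether any write has happened since the last save.
public class DirtyTrackingStream : MemoryStream
{
    public bool IsDirty { get; private set; }

    public override void Write(byte[] buffer, int offset, int count)
    {
        base.Write(buffer, offset, count);
        IsDirty = true;
    }

    public override void WriteByte(byte value)
    {
        base.WriteByte(value);
        IsDirty = true;
    }

    // The management code calls this at the end of some larger transaction.
    public void SaveTo(string fileName)
    {
        if (!IsDirty) return;
        File.WriteAllBytes(fileName, ToArray());
        IsDirty = false;
    }
}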

If you really want to complicate matters, or you have a very large file, you could implement your own paging, where you pick a buffer size (maybe 1 MB?) and hold a small number of byte[] pages of that fixed size. This time you'd have a dirty flag for each page. You'd implement the Stream methods so they hide the details from the caller, and pull in (or discard) page buffers whenever necessary.
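A rough sketch of the paging idea, assuming 1 MB pages. PagedFile and all its members are invented names, and error handling, partial last pages and page eviction are omitted:

using System;
using System.Collections.Generic;
using System.IO;

public class PagedFile : IDisposable
{
    private const int PageSize = 1 << 20; // 1 MB per page
    private readonly FileStream file;
    private readonly Dictionary<long, byte[]> pages = new Dictionary<long, byte[]>();
    private readonly HashSet<long> dirtyPages = new HashSet<long>();

    public PagedFile(string fileName)
    {
        file = new FileStream(fileName, FileMode.Open, FileAccess.ReadWrite);
    }

    private byte[] GetPage(long pageIndex)
    {
        byte[] page;
        if (!pages.TryGetValue(pageIndex, out page))
        {
            // Pull the page in from disk on first use.
            page = new byte[PageSize];
            file.Seek(pageIndex * PageSize, SeekOrigin.Begin);
            file.Read(page, 0, PageSize);
            pages[pageIndex] = page;
        }
        return page;
    }

    public byte ReadByte(long position)
    {
        return GetPage(position / PageSize)[(int)(position % PageSize)];
    }

    public void WriteByte(long position, byte value)
    {
        long pageIndex = position / PageSize;
        GetPage(pageIndex)[(int)(position % PageSize)] = value;
        dirtyPages.Add(pageIndex); // lazy write: flushed later, not now
    }

    // Write back only the pages that actually changed.
    public void Flush()
    {
        foreach (long pageIndex in dirtyPages)
        {
            file.Seek(pageIndex * PageSize, SeekOrigin.Begin);
            file.Write(pages[pageIndex], 0, PageSize);
        }
        dirtyPages.Clear();
    }

    public void Dispose()
    {
        Flush();
        file.Dispose();
    }
}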

Finally, if you want an easier life, try:

http://www.microsoft.com/Sqlserver/2005/en/us/compact.aspx

It lets you use the same SQL engine as SQL Server but on a file, with everything happening inside your process instead of via an external RDBMS server. This will probably give you a much simpler way of querying and updating your file, and avoid the need for a lot of hand-written persistence code.
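For illustration only, a query against a SQL Server Compact file might look something like this (it assumes a reference to System.Data.SqlServerCe; the file, table and column names are made up):

using System.Data.SqlServerCe;

// Everything runs inside your process against a single .sdf file.
using (var connection = new SqlCeConnection("Data Source=records.sdf"))
{
    connection.Open();
    using (var command = new SqlCeCommand(
        "SELECT CountryCode FROM IpRanges WHERE @ip BETWEEN RangeStart AND RangeEnd",
        connection))
    {
        command.Parameters.AddWithValue("@ip", 1234567890L);
        object countryCode = command.ExecuteScalar();
    }
}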

Daniel Earwicker
Isn't that what a memory-mapped file (http://en.wikipedia.org/wiki/Memory-mapped_file) is? Even so, I think the OP wants to close the file handle as soon as possible.
Noldorin
Memory-mapping a file is where the OS uses a file (of your choice) to provide the virtual memory backing store for a region of the process's address space. (The page file serves this purpose for normally allocated memory.) I'm talking about the fact that the OS has disk caching that operates regardless of how you access the file. Try using grep or similar to search a few hundred MB of text files. The second time you do it, it will happen a lot faster and your hard drive won't make a sound, because it's all in memory.
Daniel Earwicker
@Earwicker: Yeah, I'm sure you're right. Nonetheless, copying the contents into a MemoryStream does seem to be the best solution here, because a) it doesn't maintain a lock on the file, and b) I suspect it will still offer performance gains.
Noldorin
+3  A: 

Well, you can of course read the file into a byte[] array and start working on it. And if you want to use a stream, you can copy your FileStream into a MemoryStream and work with that - like:

public static void CopyStream(Stream input, Stream output)
{
    var buffer = new byte[32768]; // copy in 32 KB chunks
    int readBytes;

    // Read until the input is exhausted, forwarding each chunk to the output.
    while ((readBytes = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        output.Write(buffer, 0, readBytes);
    }
}
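
For example, to cache a file this way and release the file handle immediately (the path is just a placeholder):

var cache = new MemoryStream();
using (var file = File.OpenRead(@"C:\data\records.bin"))
{
    CopyStream(file, cache);
} // the FileStream is closed here; only the in-memory copy remains

cache.Position = 0; // rewind before reading from the cached copy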

If you are concerned about performance - well, normally the built-in mechanisms of the different file access methods should be enough.

tanascius
+5  A: 

The way to do this is to read the entire contents from the FileStream into a MemoryStream object, and then use this object for I/O later on. Both types inherit from Stream, so the usage will be effectively identical.

Here's an example:

private MemoryStream cachedStream;

public void CacheFile(string fileName)
{
    // Read the whole file into memory; the file handle is released
    // as soon as ReadAllBytes returns.
    cachedStream = new MemoryStream(File.ReadAllBytes(fileName));
}

So just call the CacheFile method once when you want to cache the given file, and then anywhere else in code use cachedStream for reading. (The actual file will be closed as soon as its contents have been cached.) The only thing to remember is to dispose of cachedStream when you're finished with it.
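Usage might look like this (the path is a placeholder):

CacheFile(@"C:\data\records.bin");

// Read from the in-memory copy just as you would from the FileStream.
cachedStream.Position = 0;
var header = new byte[16];
cachedStream.Read(header, 0, header.Length);

// ... and when you're completely done with the cache:
cachedStream.Dispose();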

Noldorin
+1: I think this might actually be what the asker wants.
Binary Worrier
It will probably be fine - the only issue would be if we're talking about a file that has a size of a GB or two.
Daniel Earwicker
Yeah, this method does of course cease to be useful when the file size approaches that of the RAM. By that point, however, you should be using a database server, so I assume this won't be an issue here.
Noldorin
A: 

I don't know exactly what you're doing, but I offer this suggestion (which may or may not be viable depending on your situation):

Instead of only caching the raw contents of the file, why don't you parse the contents into a nice strongly typed collection of items and cache that? It'll probably make searching for items a bit easier, and faster, since there's no parsing involved at lookup time.
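A hypothetical sketch of that idea (FileRecord and its fields are invented; the real parsing depends entirely on the file's format):

using System.Collections.Generic;
using System.IO;

public class FileRecord
{
    public uint Key;
    public string Value;
}

private List<FileRecord> cachedRecords;

public void CacheRecords(string fileName)
{
    cachedRecords = new List<FileRecord>();
    using (var reader = new BinaryReader(File.OpenRead(fileName)))
    {
        // Parse each fixed-size record once; later lookups need no parsing.
        while (reader.BaseStream.Position < reader.BaseStream.Length)
        {
            cachedRecords.Add(new FileRecord
            {
                Key = reader.ReadUInt32(),
                Value = new string(reader.ReadChars(2))
            });
        }
    }
}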

Giovanni Galbo
The file contains a lot of records; it is actually the MaxMind country database binary file.
From that, can we assume that the real problem is that you are not getting the performance you would like from your queries?
Sam Holder
A: 

There is a very elegant caching system in Lucene that caches bytes from the disk into memory and intelligently updates the store, etc. You might want to have a look at that code to get an idea of how they do it. You might also want to read up on the Microsoft SQL Server data storage layer, as the MSSQL team is pretty forthcoming about some of the more crucial implementation details.

Jonathan C Dickinson