Hi,
I'm working on a Java web application that uses thousands of small files to build artifacts in response to requests. I think our system could see a real performance improvement if we could hold these files in memory instead of hitting the disk to find them on every request.
I have heard of mmap on Linux, and my basic understanding is that it maps a file's contents into a process's address space, so after the first read the data is effectively cached in memory for quicker subsequent access. What I have in mind is similar, except that I'd like to read the whole set of files into memory while the web app is initializing, so that request-time latency is minimal.
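To make the idea concrete, here is roughly what I think the Java side of mmap looks like via NIO's FileChannel.map (just a sketch; the path is a placeholder):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class MmapExample {
        public static void main(String[] args) throws IOException {
            // Map the file into this process's address space, read-only.
            // The OS pages the contents in and keeps them in the page cache.
            try (FileChannel ch = FileChannel.open(
                    Paths.get("/data/source/part-0001.xml"),   // placeholder path
                    StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                buf.load();   // ask the OS to fault the whole mapping in up front
                System.out.println("Mapped " + buf.capacity() + " bytes");
            }
        }
    }

That's per-file, though, which is part of why I'm wondering about a single archive instead.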
One part of my thinking here is that we'd probably get the files into JVM memory faster if they were all tarred up and somehow mounted in the JVM as a virtual file system. As it stands, it can take several minutes for our current implementation to walk the set of source files just to figure out what is on disk, because we're essentially doing a file stat on upwards of 300,000 files.
I have found the Apache Commons VFS project, which can read entries from a tar file, but I can't tell from the documentation whether you can also ask it to read the entire tar into memory and hold it there.
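In case it helps frame the question: even if VFS can't hold the archive in memory, I picture something along these lines, sketched here with Apache Commons Compress instead of VFS (class name and path are just placeholders):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
    import org.apache.commons.compress.utils.IOUtils;

    public class TarCache {
        // entry name -> file bytes, populated once at startup
        private final Map<String, byte[]> files = new ConcurrentHashMap<String, byte[]>();

        public void load(String tarPath) throws IOException {
            try (TarArchiveInputStream tar = new TarArchiveInputStream(
                    new BufferedInputStream(new FileInputStream(tarPath)))) {
                TarArchiveEntry entry;
                while ((entry = tar.getNextTarEntry()) != null) {
                    if (!entry.isFile()) {
                        continue;   // skip directories, links, etc.
                    }
                    byte[] data = new byte[(int) entry.getSize()];
                    IOUtils.readFully(tar, data);   // read exactly this entry's bytes
                    files.put(entry.getName(), data);
                }
            }
        }

        public byte[] get(String name) {
            return files.get(name);   // null if the entry isn't in the archive
        }
    }

One sequential pass over the archive would replace 300,000 individual opens and stats, which is really what I'm after.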
We're talking about a multithreaded environment here, serving artifacts that typically piece together about 100 different files out of a complete set of 300,000+ source files to make one response. So whatever the virtual file system solution is, it needs to be thread-safe and performant. We're only reading files here; there are no writes.
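For what it's worth, here is roughly how I imagine the read side once everything is loaded: read-only and shared across request threads, with the map frozen before the first request (names here are placeholders too):

    import java.nio.ByteBuffer;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    public class ArtifactSource {
        private final Map<String, byte[]> files;

        // 'loaded' is whatever the startup code populated, e.g. from the tar above.
        public ArtifactSource(Map<String, byte[]> loaded) {
            this.files = Collections.unmodifiableMap(new HashMap<String, byte[]>(loaded));
        }

        // Called concurrently by request threads. Wrapping avoids copying the bytes,
        // and the read-only view keeps callers from mutating the shared arrays.
        public ByteBuffer open(String name) {
            byte[] data = files.get(name);
            return data == null ? null : ByteBuffer.wrap(data).asReadOnlyBuffer();
        }
    }

Since nothing mutates the map after construction, concurrent lookups shouldn't need any locking.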
Also, we're running a 64-bit OS with 32 GB of RAM, and our 300,000 files take up about 1.5 to 2.5 GB of space. Surely we can read one 2.5 GB file into memory much more quickly than 300,000 small, several-kilobyte files.
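(If we go the on-heap route, I assume we'd just raise the heap ceiling, e.g. something like java -Xmx6g, to leave comfortable headroom above the ~2.5 GB of file bytes.)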
Thanks for any input!
- Jason