I'm glad to hear you are doing this in C++. It seems that no one sees C++ as "necessary" anymore. It's all C# this, and ASP.NET that... Even I work in an all-C# house, when I swore I would never switch, as C++ does everything I'd ever need to do and then some. I'm adult enough to clean up my own memory, ya know? Anyways, back to the issue at hand...
DefineDosDevice is the function you use to assign drive letters and port names (LPT1, COM1, etc.). You pass it a name, some flags and a "path" that handles this device. But don't let that fool you. With the DDD_RAW_TARGET_PATH flag it's not a file system path, it's an NT object path. I'm sure you've seen them as "\Device\HardDisk0", etc. You can use WinObj.exe from Sysinternals to see what I mean. Anyways, you can create a device driver, then point an MS-DOS symlink to it, and you are off and running. But granted, that seems like a lot of work for what the original problem is.
How many of these megabyte-to-gigabyte files are in a typical directory? You might be best off just sticking all the files inside of one giant file, and storing an index file right next to it (or a header in front of each file) that points to the next "file" inside your "virtual file system" file.
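To make the "one giant file plus an index" idea concrete, here is a minimal in-memory sketch. The names `Blob`, `BlobEntry`, `pack_files` and `read_file` are made up for illustration, and a `std::string` stands in for the big file on disk; a real version would seek inside the file instead of calling `substr`.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index entry: where one inner file lives inside the big blob.
struct BlobEntry {
    std::uint64_t offset; // byte offset of the file data inside the blob
    std::uint64_t size;   // length of the file data in bytes
};

// Hypothetical container: all file data back to back, plus a name -> location index.
struct Blob {
    std::string data;                       // stands in for the giant file on disk
    std::map<std::string, BlobEntry> index; // stands in for the index file
};

// Append every (name, contents) pair to the blob and record where it landed.
Blob pack_files(const std::vector<std::pair<std::string, std::string>>& files) {
    Blob blob;
    for (const auto& f : files) {
        blob.index[f.first] = BlobEntry{blob.data.size(), f.second.size()};
        blob.data += f.second;
    }
    return blob;
}

// Jump straight to one inner file via the index; the other files are never touched.
std::string read_file(const Blob& blob, const std::string& name) {
    auto it = blob.index.find(name);
    if (it == blob.index.end())
        throw std::out_of_range("no such file in blob: " + name);
    return blob.data.substr(it->second.offset, it->second.size);
}
```

The point of the index is that a lookup costs one map search and one seek, no matter how many gigabytes sit in front of the file you want.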
A good example might be to look at the Microsoft MSN Archive format. I reversed this archive format when I was working for an AV company, and it's actually pretty creative, yet VERY simple. It can be done all in one file, and if you want to get fancy, you COULD store the data across 3 files in a RAID 5 type configuration, so if any one of the 3 files gets hosed, you COULD rebuild it from the other two. Plus, the users would just see 3 VERY large files in a directory, and would not be able to access the individual (inner) files.
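The three-file idea boils down to XOR parity, which can be sketched like this. It is a simplification: the parity lives in one fixed file here, whereas real RAID 5 rotates parity across the members. `Bytes`, `xor_bytes`, `make_parity` and `rebuild` are hypothetical names, and the original size of each file is assumed to be stored somewhere (in your index, for example).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// XOR two byte buffers; the shorter one is treated as if padded with zeros.
Bytes xor_bytes(const Bytes& a, const Bytes& b) {
    const Bytes& longer = a.size() >= b.size() ? a : b;
    const Bytes& shorter = a.size() >= b.size() ? b : a;
    Bytes out(longer);
    for (std::size_t i = 0; i < shorter.size(); ++i)
        out[i] ^= shorter[i];
    return out;
}

// The third (parity) file is simply data file A XOR data file B.
Bytes make_parity(const Bytes& a, const Bytes& b) {
    return xor_bytes(a, b);
}

// Rebuild a lost data file from the surviving data file and the parity file.
// original_size is the length the lost file had before it was lost.
Bytes rebuild(const Bytes& survivor, const Bytes& parity, std::size_t original_size) {
    Bytes out = xor_bytes(survivor, parity);
    out.resize(original_size); // drop the zero padding
    return out;
}
```

If the parity file is the one that gets hosed, you just call `make_parity` again on the two data files.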
I have provided you with code that unpacks one of these MSN Archive formats. I don't have code that CREATES one, but from the extract source, you'd be able to construct/write one with no problems. If files are deleted and/or renamed often, that might pose a problem with dead space in the file that would have to be trimmed from time to time.
This format even supports CRC fields, so you can test if you got the file out OK. I was never able to fully reverse the algorithm that Microsoft used to CRC the data, but I have a pretty good idea.
You wouldn't be able to keep your current I/O routines, meaning CreateFile() would not just be able to open up any file in the archive. However, with the uber-coolness of C++, you could override the CreateFile call to implement your archive format.
If you want some help with this, and it's a big enough issue, perhaps we could talk off-line and find a solution for you.
I'm not opposed to writing you a FileSystemDriver, but for that, we'd have to start talking about compensation. I would be more than happy to give you direction and ideas for free, just as I'm doing now.
I'm not sure it's kosher for me to give you my email address on here, I'm not sure how SO's policies are on this, since we could be talking about potential work/solicitation, but that's not my sole intention. I'd rather help you find your own solutions first.
Before you look into a device driver, download the WinDDK. It has driver samples all over it.
If you wonder why I care so much about this, it's because I've had on my slate for years to write a driver similar to this, one that had to be Windows AND Mac compatible, and that allowed users to secure drive volumes (USB keys) WITHOUT installing any drivers or complicated (and bulky, sometimes annoying) software. In recent years, a lot of the hardware manufacturers have done similar things, but I don't think the security is all that secure. I'm looking at using RSA and AES, exactly the same way GPG and PGP work. Originally I was contacted about it for what (I believe, but have no proof) was going to be used to secure MP3 files. Since they'd be stored in encrypted format, they simply wouldn't work without the correct passphrase. But I saw other uses for it too. :) (This was back when a 16 meg USB key cost, oh, I dunno, in excess of $100 or so.)
This project also went along with my Oil and Gas industry PC security system that used something similar to Smart Cards, just much easier to use, re-use/re-issue, impossible to hack, and I could use it on my own kids at home (since there is always fighting over who gets time on the computer, and who got the most, and on, and on, and...), and cheaper (with all the small USB keys coming out of China for pennies on the dollar).
Phew.. I think I got way off topic here. Anyways, here is an example of the Microsoft MSN archive format. See if you may be able to use something like this, knowing that you can always "skip" right to a file by following the offsets in the file as you parse/search for the requested file in the master file, or in the pre-parsed data held in memory. And since you would not be loading the raw binary file data in memory, your only limit would probably be the 4 GB that the format's 32-bit offsets can address. (NTFS itself has no 4 GB file limit; that one belongs to FAT32.)
The MARC (Microsoft MSN Archive) format is laid out like this:
12 Byte Header (only one)
(File Magic, MARC version, and Number of files in the following table)
68 Byte File Table headers (1 to Header.NumFiles of these)
(File name, File Size, Checksum, offset to raw file data)
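As a rough illustration of that layout, the 12-byte header can be decoded from raw bytes without depending on the host's byte order. This is only a sketch: `MarcHeader`, `read_le32` and `parse_marc_header` are names I made up here, and the magic value is the same one used in the code further down.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical decoded form of the 12-byte MARC header.
struct MarcHeader {
    std::uint32_t magic;   // file magic, "MARC" on disk
    std::uint32_t version; // MARC version
    std::uint32_t files;   // number of entries in the file table that follows
};

// Read one little-endian 32-bit value from a byte buffer.
std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Decode the header. Returns false if the buffer is too short or the magic is wrong.
bool parse_marc_header(const unsigned char* buf, std::size_t len, MarcHeader& out) {
    if (len < 12)
        return false;
    out.magic = read_le32(buf);
    out.version = read_le32(buf + 4);
    out.files = read_le32(buf + 8);
    return out.magic == 0x4352414Du; // the bytes 'M','A','R','C' read as little endian
}
```

Reading field by field like this costs a few more lines than a raw read() into a struct, but it behaves the same on any compiler and CPU.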
Now, in the 68 Byte File Table entries, 32 bits are used for file lengths and offsets. For your VERY large files, you may have to up that to 48 or 64 bit integers.
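If you do widen the fields, an entry might look something like the sketch below. To be clear, this is NOT Microsoft's layout; `MarcFile64`, `f_reserved` and `entry_in_bounds` are my own inventions for illustration. The fields are ordered so the compiler has no reason to insert hidden padding, which matters if you read and write the struct as raw bytes.

```cpp
#include <cstdint>

// Hypothetical widened file table entry with 64-bit size and offset.
struct MarcFile64 {
    char          f_filename[56]; // same name length as the 32-bit format
    std::uint64_t f_filesize;     // file length in bytes
    std::uint64_t f_offset;       // offset of the raw file data in the archive
    std::uint32_t f_checksum;     // checksum of the data
    std::uint32_t f_reserved;     // explicit padding so the size is a multiple of 8
};

static_assert(sizeof(MarcFile64) == 80, "entry must not contain hidden padding");

// True if [f_offset, f_offset + f_filesize) lies inside an archive of
// archive_size bytes. Written so the addition can never overflow.
bool entry_in_bounds(const MarcFile64& e, std::uint64_t archive_size) {
    return e.f_offset <= archive_size &&
           e.f_filesize <= archive_size - e.f_offset;
}
```

A bounds check like `entry_in_bounds` is worth running on every entry right after loading the table, since a corrupt offset is the first thing that will bite you with multi-gigabyte archives.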
Here is some code I wrote up to handle these.
#define MARC_FILE_MAGIC 0x4352414D // In Little Endian
#define MARC_FILENAME_LEN 56 //(You'll notice this is rather small)
#define MARC_HEADER_SIZE 12
#define MARC_FILE_ENT_SIZE 68
#define MARC_DATA_SIZE (1024 * 128) // 128k Read Buffer should be enough. Parenthesized so the macro expands safely inside expressions.
#define MARC_ERR_OK 0 // No error
#define MARC_ERR_OOD 314 // Out of data error
#define MARC_ERR_OS 315 // Error returned by the OS
#define MARC_ERR_CRC 316 // CRC error
struct marc_file_hdr
{
ULONG h_magic;
ULONG h_version;
ULONG h_files;
int h_fd;
struct marc_dir *h_dir;
};
struct marc_file
{
char f_filename[MARC_FILENAME_LEN];
long f_filesize;
unsigned long f_checksum;
long f_offset;
};
struct marc_dir
{
struct marc_file *dir_file;
ULONG dir_filenum;
struct marc_dir *dir_next;
};
That gives you an idea of the headers I wrote for them, and here is the open function. Yes, it's missing all the support calls, err routines, etc, but you get the idea. Please excuse the C and C++ code style mixture. Our scanner was a cluster of many different problems like this... I used the antique calls like open(), fopen(), to keep standards with the rest of the code base.
struct marc_file_hdr *marc_open(char *filename)
{
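struct marc_file_hdr *fhdr = (struct marc_file_hdr*)malloc(sizeof(marc_file_hdr));
if(!fhdr)
return NULL;
fhdr->h_dir = NULL;
#if defined(_MSC_VER) // _sopen_s is a function, not a macro, so defined(_sopen_s) is never true
// _sopen_s takes a pointer to the descriptor and returns an error code.
if(_sopen_s(&fhdr->h_fd, filename, _O_BINARY | _O_RDONLY, _SH_DENYWR, _S_IREAD | _S_IWRITE) != 0)
fhdr->h_fd = -1;
#else
fhdr->h_fd = open(filename, _O_BINARY | _O_RDONLY);
#endif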
if(fhdr->h_fd < 0)
{
marc_close(fhdr);
return NULL;
}
//Once we have the file open, read all the file headers, and populate our main headers linked list.
if(read(fhdr->h_fd, fhdr, MARC_HEADER_SIZE) != MARC_HEADER_SIZE)
{
errmsg("MARC: Could not read MARC header from file %s.\n", filename);
marc_close(fhdr);
return NULL;
}
// Verify the file magic
if(fhdr->h_magic != MARC_FILE_MAGIC)
{
errmsg("MARC: Incorrect file magic %x found in MARC file.", fhdr->h_magic);
marc_close(fhdr);
return NULL;
}
if(fhdr->h_files == 0) // h_files is unsigned, so it can never be negative
{
errmsg("MARC: No files found in archive.\n");
marc_close(fhdr);
return NULL;
}
// Get all the file headers from this archive, and link them to the main header.
struct marc_dir *lastdir = NULL, *curdir = NULL;
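curdir = (struct marc_dir*)malloc(sizeof(marc_dir));
curdir->dir_file = NULL; // Keep the node well-formed in case marc_close() walks it on an error path.
curdir->dir_next = NULL;
fhdr->h_dir = curdir;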
for(int x = 0;x < fhdr->h_files;x++)
{
if(lastdir)
{
lastdir->dir_next = (struct marc_dir*)malloc(sizeof(marc_dir));
lastdir->dir_next->dir_next = NULL;
curdir = lastdir->dir_next;
}
curdir->dir_file = (struct marc_file*)malloc(sizeof(marc_file));
curdir->dir_filenum = x + 1;
if(read(fhdr->h_fd, curdir->dir_file, MARC_FILE_ENT_SIZE) != MARC_FILE_ENT_SIZE)
{
errmsg("MARC: Could not read file header for file %d\n", x);
marc_close(fhdr);
return NULL;
}
// LEF: Just a little extra insurance...
// Index MARC_FILENAME_LEN is one past the end of the array and would clobber f_filesize.
curdir->dir_file->f_filename[MARC_FILENAME_LEN - 1] = '\0';
lastdir = curdir;
}
lastdir->dir_next = NULL;
return fhdr;
}
Then, you have the simple extract method. Keep in mind this was strictly for virus scanning, so there are no search routines, etc. It simply dumped a file out, scanned it, and moved on. Below this is the CRC code routine that I BELIEVE Microsoft used, but I'm not sure WHAT they CRC'd. It might include header data + file data, etc. I just haven't cared enough to go back and try to reverse it. Anyways, as you can see, there is no compression on this archive format, but it is VERY easy to add... Full source can be provided if you want. (Hell, I think all that's left is the close() routine, and the code that calls and extracts each file, etc. :)
bool marc_extract(struct marc_file_hdr *marc, struct marc_file *marcfile, char *file, int &err)
{
// Create the file from marcfile, in *file's location, return any errors.
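int ofd = -1;
#if defined(_MSC_VER) // _sopen_s is a function, not a macro, so defined(_sopen_s) is never true
err = _sopen_s(&ofd, file, _O_CREAT | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _SH_DENYNO, _S_IREAD | _S_IWRITE);
#else
ofd = open(file, _O_CREAT | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _S_IREAD | _S_IWRITE); // _O_CREAT needs a permission mode
#endif
if(ofd < 0)
{
err = MARC_ERR_OS; // Could not create the output file.
return false;
}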
// Seek to the offset of the file to extract
if(lseek(marc->h_fd, marcfile->f_offset, SEEK_SET) != marcfile->f_offset)
{
errmsg("MARC: Could not seek to offset 0x%04lx for file %s.\n", marcfile->f_offset, marcfile->f_filename);
close(ofd);
err = MARC_ERR_OS; // Get the last error from the OS.
return false;
}
unsigned char *buffer = (unsigned char*)malloc(MARC_DATA_SIZE);
long bytesleft = marcfile->f_filesize;
long readsize = MARC_DATA_SIZE >= marcfile->f_filesize ? marcfile->f_filesize : MARC_DATA_SIZE;
unsigned long crc = 0;
while(bytesleft > 0) // A negative size from a corrupt header must not loop forever.
{
if(read(marc->h_fd, buffer, readsize) != readsize)
{
errmsg("MARC: Failed to extract data from MARC archive.\n");
free(buffer);
close(ofd);
err = MARC_ERR_OOD;
return false;
}
crc = marc_checksum(buffer, readsize, crc);
if(write(ofd, buffer, readsize) != readsize)
{
errmsg("MARC: Failed to write data to file.\n");
free(buffer);
close(ofd);
err = MARC_ERR_OS; // Get the last error from the OS.
return false;
}
bytesleft -= readsize;
readsize = MARC_DATA_SIZE >= bytesleft ? bytesleft : MARC_DATA_SIZE;
}
// LEF: I can't quite figure out how the checksum is computed, but I think it has to do with the file header, PLUS the data in the file, or it's checked on the data in the file
// minus any BOM's at the start... So, we'll just rem out this code for now, but I wrote it anyways.
//if(crc != marcfile->f_checksum)
//{
// warningmsg("MARC: File CRC does not match. File could be corrupt, or altered. CRC=0x%08X, expected 0x%08X\n", crc, marcfile->f_checksum);
// err = MARC_ERR_CRC;
//}
free(buffer);
close(ofd);
return true;
}
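Since the scanner code never needed one, here is what a lookup-by-name routine over the in-memory directory list might look like. This is a sketch only: `marc_find` is a hypothetical helper, the two structs are restated with plain types so the snippet compiles on its own, and the case-sensitive compare is an assumption (I don't know how the real format treats case).

```cpp
#include <cstring>

// Restatement of the structures from the post, so this snippet stands alone.
struct marc_file {
    char f_filename[56];
    long f_filesize;
    unsigned long f_checksum;
    long f_offset;
};

struct marc_dir {
    marc_file* dir_file;
    unsigned long dir_filenum;
    marc_dir* dir_next;
};

// Hypothetical helper: walk the linked list and return the entry whose name
// matches, or nullptr if there is none. strncmp is bounded by the size of the
// name field, so an unterminated name in a corrupt archive cannot run off the end.
const marc_file* marc_find(const marc_dir* dir, const char* name) {
    for (; dir != nullptr; dir = dir->dir_next) {
        if (dir->dir_file != nullptr &&
            std::strncmp(dir->dir_file->f_filename, name,
                         sizeof dir->dir_file->f_filename) == 0)
            return dir->dir_file;
    }
    return nullptr;
}
```

The returned entry carries the offset and size, so it can be handed straight to the extract routine. For archives with thousands of entries, a hash map keyed on the name would beat this linear walk.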
Here is MY assumed CRC routine (I may have stolen this from Stuart Caie and libmspack, I can't recall):
static unsigned long marc_checksum(void *pv, UINT cb, unsigned long seed)
{
int count = cb / 4;
unsigned long csum = seed;
BYTE *p = (BYTE*)pv;
unsigned long ul;
while(count-- > 0)
{
ul = *p++;
ul |= (((unsigned long)(*p++)) << 8);
ul |= (((unsigned long)(*p++)) << 16);
ul |= (((unsigned long)(*p++)) << 24);
csum ^= ul;
}
ul = 0;
switch(cb % 4)
{
case 3: ul |= (((unsigned long)(*p++)) << 16);
case 2: ul |= (((unsigned long)(*p++)) << 8);
case 1: ul |= *p++;
default: break;
}
csum ^= ul;
return csum;
}
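One property of this routine is worth spelling out: because it only XORs 32-bit words together, you can feed it a file in chunks and pass the running value back in as the seed, as long as every chunk except the last is a multiple of 4 bytes (the 128k read buffer is). The sketch below restates the routine with fixed-width types so it stands alone; `xor_checksum` is my name for it, and whether this matches what Microsoft actually computes is still unverified.

```cpp
#include <cstddef>
#include <cstdint>

// Portable restatement of the XOR checksum, using fixed-width types.
std::uint32_t xor_checksum(const unsigned char* p, std::size_t len, std::uint32_t seed) {
    std::uint32_t csum = seed;
    // XOR every complete little-endian 32-bit word into the running value.
    for (std::size_t words = len / 4; words > 0; --words, p += 4) {
        csum ^= static_cast<std::uint32_t>(p[0]) |
                (static_cast<std::uint32_t>(p[1]) << 8) |
                (static_cast<std::uint32_t>(p[2]) << 16) |
                (static_cast<std::uint32_t>(p[3]) << 24);
    }
    // The trailing 1 to 3 bytes are folded in most significant first,
    // exactly like the switch in the routine above.
    std::uint32_t tail = 0;
    switch (len % 4) {
        case 3: tail |= static_cast<std::uint32_t>(*p++) << 16; // fall through
        case 2: tail |= static_cast<std::uint32_t>(*p++) << 8;  // fall through
        case 1: tail |= *p;
        default: break;
    }
    return csum ^ tail;
}
```

So checksumming the first 8 bytes of a buffer and then the remaining 3 with the first result as the seed gives the same value as one call over all 11 bytes. If you ever change the read buffer to a size that is not a multiple of 4, the chunked result will no longer match.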
Well, I think this post is long enough now... Contact me if you need help, or have questions.