I have been looking for a while for a way to prevent my users from accidentally entering a data directory of my application.

My application uses a folder to store a structured project. The folder's internal structure is critical and should not be messed up. I would like my users to see this folder as a whole and not be able to open it (like a Mac bundle).

Is there a way to do that on Windows?

Edit, based on the answers so far:

Of course, I am not trying to prevent my users from accessing their data, just to protect them from accidentally destroying its integrity. So encryption or password protection is not needed.

Thank you all for your .NET answers, but unfortunately this is mainly a C++ project without any dependency on the .NET Framework.

The data in question are not small: they are images acquired with an electron microscope and can be huge (~100 MiB to ~1 GiB), so loading everything into memory is not an option. The storage must let me read the data incrementally, accessing one file at a time, without loading the whole archive into memory.

Besides, the application is mostly legacy code, including some components we are not even responsible for. A solution that lets me keep the current I/O code is preferable.

Shell Extension looks interesting, I will investigate the solution further.

LarryF, can you elaborate on filter drivers or DefineDosDevice? I am not familiar with these concepts.

A:

Inside or outside of your program?

There are ways, but none of them are easy. You are probably going to be looking at a file system filter driver.

Larry

LarryF
A: 

You could wrap your project directory into a .zip file and store your data there, much like a .jar is used (I know a .jar is pretty much read-only; it's for the sake of the example). Give it a non-standard extension so that a double-click has no immediate effect, and you're done. ;-)

Of course, this means you would have to wrap all your file I/O to use the .zip instead; depending on how your program is built, this could be tedious. It has already been done for Java: TrueZIP. Maybe you can use that as inspiration?
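A minimal C++17 sketch of that wrapping, assuming your components can be changed to read through one interface instead of opening paths directly. `ProjectStore` and `DirectoryStore` are hypothetical names, not part of TrueZIP or any other library; an archive-backed class would implement the same `open()` by seeking inside one container file.

```cpp
#include <filesystem>
#include <fstream>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface (not from any library): all project reads go through it.
class ProjectStore {
public:
    virtual ~ProjectStore() = default;
    // Returns a stream positioned at the start of the named entry, or nullptr.
    virtual std::unique_ptr<std::istream> open(const std::string& name) = 0;
};

// Current behaviour: entries are ordinary files under a project directory.
class DirectoryStore : public ProjectStore {
public:
    explicit DirectoryStore(std::filesystem::path root) : root_(std::move(root)) {}

    std::unique_ptr<std::istream> open(const std::string& name) override {
        auto in = std::make_unique<std::ifstream>(root_ / name, std::ios::binary);
        if (!*in) return nullptr;  // missing or unreadable entry
        return in;
    }

private:
    std::filesystem::path root_;
};
```

Swapping the directory for a single container then touches only the code that constructs the store, not the code that reads from it.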

If you are tempted to fiddle with folder permissions, I would not recommend it; for obvious reasons, that is not going to help.

Tomalak
A:

There are a couple of things you could do:

One is to create a FolderView Windows Shell Extension that provides a custom view for your critical folder. With a custom FolderView you could make the folder plain white with a single line of text, "Nothing to see here", or do something more complicated, like the GAC viewer, which uses this same method. This approach is fairly complex, but the complexity can be mitigated by using something like this CodeProject article's library as a base.

Another solution would be a ZIP virtual file system. This would require you to replace any code that uses System.IO directly with something else. ASP.NET 2.0 did this for this exact reason, and you could build on top of that pretty easily; take a look at this MSDN article on implementing a VirtualPathProvider.

joshperry
It looks like this is mostly for ASP.NET, but he didn't mention anything about ASP, or .NET... The ideas are good, however.
LarryF
A:

You could use Isolated Storage.

http://www.ondotnet.com/pub/a/dotnet/2003/04/21/isolatedstorage.html

It doesn't solve all of the problems, but it does put app data well out of harm's way.

Andrew Rollings
A: 

Keep in mind: if you store it in the file system, the user will ALWAYS be able to see it. Tamper with Explorer and I'll use cmd.exe instead. Or Total Commander. Or anything else.

If you don't want people to mess with your files, I'd recommend

  • encrypting them to prevent tampering
  • putting them in an archive (e.g. ZIP), possibly password-protected, and compressing/decompressing at runtime (I would look for algorithms that are fast at modifying archives)

That is not full protection, of course, but it's rather straightforward to implement, does not require you to install funky stuff in the operating system, and should keep away most curious users.

Of course, you will never ever be able to fully control files on a users computer without controlling the computer itself.

Michael Stum
A: 

I've seen software (Visual Paradigm's Agilian) that used Tomalak's suggestion of a zip archive as the 'project file'. Zip files are well understood, and the use of a non-standard file extension does prevent the casual user from messing with the 'file'. One big advantage to this is that in the event of corruption, standard tools can be used to fix the problem, and you don't have to worry about creating special tools to support your main application.

Harper Shelby
A:

If you take the ZIP file approach (which I considered for you, but didn't mention), I would suggest using the deflate algorithm, but with your own file system inside the archive file... Look at something like the TAR format. Then just write your code to pass ALL I/O through deflate on the way to disk and inflate on the way back. I wouldn't use the ZIP "FORMAT", as it's far too easy to look at the file, find "PK" as the first two bytes, and unzip it...
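To show how little it takes to recognise a ZIP: most ZIP files begin with a local file header whose signature is the four bytes `PK\x03\x04` (0x50 0x4B 0x03 0x04), whatever the extension. A sketch of such a check; `looks_like_zip` is a hypothetical helper, and it deliberately ignores empty archives and self-extracting stubs.

```cpp
#include <array>
#include <fstream>
#include <string>

// Hypothetical helper: true if the file starts with a ZIP local file header
// signature ("PK\x03\x04"). Empty archives (which start "PK\x05\x06") and
// archives with a prepended stub (self-extractors) are not detected.
bool looks_like_zip(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    std::array<char, 4> sig{};
    if (!in.read(sig.data(), sig.size())) return false;  // missing or too short
    return sig[0] == 'P' && sig[1] == 'K' && sig[2] == '\x03' && sig[3] == '\x04';
}
```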

Just IMHO...

I like Joshperry's suggestions best.

Of course, you can also write a device driver that stores all your data inside a single file, but again, we're looking at a driver. (I'm not certain you could implement it outside of a driver. You PROBABLY can: inside your program, call DefineDosDevice, giving it a name that only your code uses, and it will be seen as a normal file system.) I'll play with some ideas, and if they work, I'll shoot you a sample. Now you've got me interested... :)

Larry

LarryF
A: 

I'm glad to hear you are doing this in C++. It seems that no one sees C++ as "necessary" anymore. It's all C# this and ASP.NET that... Even I work in an all-C# shop, when I swore I would never switch, as C++ does everything I'd ever need to do and then some. I'm adult enough to clean up my own memory, ya know? Anyway, back to the issue at hand...

DefineDosDevice is the function you use to define MS-DOS device names such as drive letters and port names (LPT1, COM1, etc.). You pass it some flags, a device name, and a target "path" that handles this device. But don't let that fool you: with the DDD_RAW_TARGET_PATH flag, the target is not a file system path but an NT object path. I'm sure you've seen them as "\Device\HardDisk0", etc. You can use WinObj.exe from Sysinternals to see what I mean. Anyway, you can create a device driver, point an MS-DOS symlink at it, and you are off and running. Granted, that seems like a lot of work for the original problem.

How many of these megabyte-to-gigabyte files are in a typical directory? You might be best off just sticking all the files inside one giant file and storing an index file right next to it (or a header before each file) that points to the next "file" inside your "virtual file system" file.
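A minimal sketch of the one-big-file-plus-index idea in portable C++. The layout is an assumption for illustration, not the MSN format: payloads are appended back to back, and an in-memory index records each entry's name, offset and size (persisting the index to a sidecar file or header is left out). `IndexEntry`, `pack_files` and `read_entry` are hypothetical names. Reading one entry seeks straight to it and copies it in fixed-size chunks, so no other entry is touched.

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical index record for one packed file.
struct IndexEntry {
    std::string name;
    std::uint64_t offset;  // byte offset of the payload inside the container
    std::uint64_t size;    // payload length in bytes
};

// Appends each input file to `container` and returns the resulting index.
std::vector<IndexEntry> pack_files(const std::string& container,
                                   const std::vector<std::string>& inputs) {
    std::ofstream out(container, std::ios::binary | std::ios::trunc);
    std::vector<IndexEntry> index;
    std::vector<char> buf(64 * 1024);
    for (const auto& path : inputs) {
        std::ifstream in(path, std::ios::binary);
        IndexEntry e{path, static_cast<std::uint64_t>(out.tellp()), 0};
        // The last read may be partial: it sets failbit but gcount() > 0.
        while (in.read(buf.data(), buf.size()) || in.gcount() > 0) {
            out.write(buf.data(), in.gcount());
            e.size += static_cast<std::uint64_t>(in.gcount());
        }
        index.push_back(e);
    }
    return index;
}

// Copies one entry to `sink` chunk by chunk, without reading other entries.
bool read_entry(const std::string& container, const IndexEntry& e, std::ostream& sink) {
    std::ifstream in(container, std::ios::binary);
    if (!in.seekg(static_cast<std::streamoff>(e.offset))) return false;
    std::vector<char> buf(64 * 1024);
    std::uint64_t left = e.size;
    while (left > 0) {
        auto n = static_cast<std::streamsize>(std::min<std::uint64_t>(left, buf.size()));
        if (!in.read(buf.data(), n)) return false;  // container truncated
        sink.write(buf.data(), n);
        left -= static_cast<std::uint64_t>(n);
    }
    return true;
}
```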

A good example might be the Microsoft MSN Archive format. I reverse-engineered this archive format when I was working for an AV company, and it's actually pretty creative, yet VERY simple. It can all be done in one file, and if you want to get fancy, you COULD store the data across 3 files in a RAID 5-type configuration, so that if any one of the 3 files gets hosed, you COULD rebuild it from the other two. Plus, the users would just see 3 VERY large files in a directory and would not be able to access the individual (inner) files.
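The RAID 5 idea rests on XOR parity: with two data blocks A and B, store P = A xor B in the third file; if any one of the three is lost, XOR-ing the remaining two rebuilds it. A toy sketch of just that arithmetic (real RAID 5 also rotates the parity block among the members, which is left out; `xor_blocks` is a hypothetical helper):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: XORs two blocks byte by byte. The shorter block is
// treated as zero-padded. parity = xor_blocks(a, b); any one of
// {a, b, parity} can then be rebuilt by XOR-ing the other two.
std::vector<std::uint8_t> xor_blocks(const std::vector<std::uint8_t>& x,
                                     const std::vector<std::uint8_t>& y) {
    std::size_t n = std::max(x.size(), y.size());
    std::vector<std::uint8_t> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        unsigned a = i < x.size() ? x[i] : 0u;
        unsigned b = i < y.size() ? y[i] : 0u;
        out[i] = static_cast<std::uint8_t>(a ^ b);
    }
    return out;
}
```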

I have provided code below that unpacks this MSN Archive format. I don't have code that CREATES one, but from the extraction source you'd be able to construct/write one with no problems. If files are deleted and/or renamed often, unused space would build up inside the file and would have to be trimmed from time to time.

This format even supports CRC fields, so you can test whether you got the file out OK. I was never able to fully reverse the algorithm Microsoft used to CRC the data, but I have a pretty good idea.

You wouldn't be able to keep your current I/O routines, meaning CreateFile() would not simply be able to open any file in the archive. However, with the uber-coolness of C++, you could intercept (wrap or hook) the CreateFile call to implement your archive format.

If you want some help with this, and it's a big enough issue, perhaps we could talk offline and find a solution for you.

I'm not opposed to writing you a FileSystemDriver, but for that, we'd have to start talking about compensation. I would be more than happy to give you direction and ideas for free, just as I'm doing now.

I'm not sure it's kosher for me to give you my email address on here; I don't know what SO's policy is on this, since we could be talking about potential work/solicitation, but that's not my sole intention. I'd rather help you find your own solution first.

Before you look into a device driver, download the WinDDK. It has driver samples all over it.

If you wonder why I care so much about this, it's because I've had it on my slate for years to write a driver similar to this, one that had to be Windows AND Mac compatible and that allowed users to secure drive volumes (USB keys) WITHOUT installing any drivers or complicated (and bulky, sometimes annoying) software. In recent years, a lot of hardware manufacturers have done similar things, but I don't think their security is all that secure. I'm looking at using RSA and AES, exactly the same way GPG and PGP work. Originally I was contacted about it for what (I believe, but have no proof) was going to be used to secure MP3 files. Since they'd be stored in encrypted form, they simply wouldn't play without the correct passphrase. But I saw other uses for it too. :) (This was back when a 16 MB USB key cost, oh, I dunno, in excess of $100 or so.)

This project also went along with my oil and gas industry PC security system, which used something similar to smart cards, just much easier to use and re-use/re-issue, impossible to hack, and cheaper (with all the small USB keys coming out of China for pennies on the dollar). I could even use it on my own kids at home, since there is always fighting over who gets time on the computer, and who got the most, and on, and on, and on, and...

Phew... I think I got way off topic here. Anyway, here is an example of the Microsoft MSN archive format. See if you can use something like this, knowing that you can always "skip" right to a file by following the offsets in the file as you parse/search for the requested file in the master file, or in the pre-parsed data held in memory. And since you would not be loading the raw binary file data into memory, your only limit would probably be the format's 32-bit sizes and offsets: 4 GB, or 2 GB with the signed long fields used in the code below. (NTFS itself has no 4 GB file size limit; that restriction belongs to FAT32.)
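Because each inner file is just a byte range in the container, any legacy code that already reads through `std::istream` could be handed a stream limited to that range and read it incrementally. Code built on `FILE*` or CreateFile would not benefit without further wrapping. A minimal sketch, assuming the offset and size come from an index such as the MARC file table; `SliceBuf` is a hypothetical name, and it supports sequential reading only (no seeking within the slice).

```cpp
#include <algorithm>
#include <fstream>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical read-only streambuf exposing bytes [offset, offset + size) of a file.
class SliceBuf : public std::streambuf {
public:
    SliceBuf(const std::string& path, std::streamoff offset, std::streamoff size)
        : in_(path, std::ios::binary), begin_(offset), size_(size) {}

protected:
    // Refills the get area with the next chunk of the slice.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (pos_ >= size_ || !in_) return traits_type::eof();
        std::streamsize want = static_cast<std::streamsize>(
            std::min<std::streamoff>(sizeof(buf_), size_ - pos_));
        in_.seekg(begin_ + pos_);
        in_.read(buf_, want);
        std::streamsize got = in_.gcount();
        if (got <= 0) return traits_type::eof();
        pos_ += got;
        setg(buf_, buf_, buf_ + got);
        return traits_type::to_int_type(*gptr());
    }

private:
    std::ifstream in_;
    std::streamoff begin_, size_, pos_ = 0;
    char buf_[16 * 1024];  // only this much of the entry is held in memory at once
};
```

Usage: `SliceBuf sb("project.dat", offset, size); std::istream is(&sb);` and pass `is` to the existing reader.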

The MARC (Microsoft MSN Archive) format is laid out like this:

  • a 12-byte header (only one): file magic, MARC version, and the number of files in the following table
  • 68-byte file table entries (Header.NumFiles of them): file name, file size, checksum, and offset to the raw file data
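Reading the entries straight into a struct, as `marc_open` below does, depends on the compiler's size of `long` and its padding rules. A portable alternative is to decode the little-endian fields byte by byte. A sketch, assuming the field order listed above and 32-bit little-endian integers throughout; `MarcHeader`, `MarcEntry`, `le32`, `parse_header` and `parse_entry` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Decodes a 32-bit little-endian integer from 4 bytes.
inline std::uint32_t le32(const unsigned char* p) {
    return std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
           (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
}

// Hypothetical decoded forms of the on-disk records.
struct MarcHeader { std::uint32_t magic, version, files; };                     // 12 bytes on disk
struct MarcEntry  { std::string name; std::uint32_t size, checksum, offset; };  // 68 bytes on disk

MarcHeader parse_header(const unsigned char (&b)[12]) {
    return {le32(b), le32(b + 4), le32(b + 8)};
}

MarcEntry parse_entry(const unsigned char (&b)[68]) {
    // The 56-byte name field may lack a terminator: stop at the first NUL or at 56.
    const char* name = reinterpret_cast<const char*>(b);
    std::size_t len = 0;
    while (len < 56 && name[len] != '\0') ++len;
    return {std::string(name, len), le32(b + 56), le32(b + 60), le32(b + 64)};
}
```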

Now, in the 68-byte file table entries, 32 bits are used for file lengths and offsets. For your VERY large files, you may have to increase that to 48- or 64-bit integers.
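For a variant of your own with 64-bit sizes and offsets, each entry would grow from 68 to 76 bytes (56 + 8 + 4 + 8). A sketch that encodes and decodes such a hypothetical widened entry byte by byte in little-endian order, so the on-disk layout does not depend on struct padding or the size of `long`. The layout and the names `Entry64`, `encode` and `decode` are assumptions, not part of the MSN format.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical widened entry: name[56], size (u64), checksum (u32), offset (u64).
constexpr std::size_t kEntry64Size = 56 + 8 + 4 + 8;

struct Entry64 { std::string name; std::uint64_t size; std::uint32_t checksum; std::uint64_t offset; };

// Writes the low `bytes` bytes of v in little-endian order.
void put_le(unsigned char* p, std::uint64_t v, int bytes) {
    for (int i = 0; i < bytes; ++i) p[i] = static_cast<unsigned char>(v >> (8 * i));
}

std::uint64_t get_le(const unsigned char* p, int bytes) {
    std::uint64_t v = 0;
    for (int i = 0; i < bytes; ++i) v |= std::uint64_t(p[i]) << (8 * i);
    return v;
}

std::array<unsigned char, kEntry64Size> encode(const Entry64& e) {
    std::array<unsigned char, kEntry64Size> b{};  // zero-filled, so names are NUL-padded
    for (std::size_t i = 0; i < e.name.size() && i < 55; ++i)  // keep room for a NUL
        b[i] = static_cast<unsigned char>(e.name[i]);
    put_le(&b[56], e.size, 8);
    put_le(&b[64], e.checksum, 4);
    put_le(&b[68], e.offset, 8);
    return b;
}

Entry64 decode(const std::array<unsigned char, kEntry64Size>& b) {
    std::size_t len = 0;
    while (len < 56 && b[len] != 0) ++len;
    return {std::string(reinterpret_cast<const char*>(b.data()), len),
            get_le(&b[56], 8), static_cast<std::uint32_t>(get_le(&b[64], 4)), get_le(&b[68], 8)};
}
```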

Here is some code I wrote up to handle these.

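#include <windows.h>   // ULONG, UINT, BYTE
#include <io.h>        // open/_sopen_s, read, write, lseek, close
#include <fcntl.h>     // _O_* flags
#include <share.h>     // _SH_* sharing flags
#include <sys/stat.h>  // _S_IREAD, _S_IWRITE
#include <stdio.h>     // SEEK_SET
#include <stdlib.h>    // malloc, calloc, free

#define MARC_FILE_MAGIC         0x4352414D // "MARC" read as a little-endian ULONG
#define MARC_FILENAME_LEN       56 //(You'll notice this is rather small)
#define MARC_HEADER_SIZE        12
#define MARC_FILE_ENT_SIZE      68

#define MARC_DATA_SIZE          (1024 * 128) // 128k Read Buffer should be enough.

#define MARC_ERR_OK              0      // No error
#define MARC_ERR_OOD             314    // Out of data error
#define MARC_ERR_OS              315    // Error returned by the OS
#define MARC_ERR_CRC             316    // CRC error

struct marc_dir;

struct marc_file_hdr
{
    ULONG            h_magic;    // h_magic, h_version and h_files mirror the 12-byte on-disk header
    ULONG            h_version;
    ULONG            h_files;
    int              h_fd;       // not on disk: descriptor of the open archive
    struct marc_dir *h_dir;      // not on disk: linked list of file table entries
};

// Mirrors one 68-byte on-disk entry; relies on long being 32 bits (true on Windows).
struct marc_file
{
    char            f_filename[MARC_FILENAME_LEN];
    long            f_filesize;
    unsigned long   f_checksum;
    long            f_offset;
};

struct marc_dir
{
    struct marc_file       *dir_file;
    ULONG                   dir_filenum;
    struct marc_dir        *dir_next;
};

// Support routines used below but not shown in this post (assumed signatures):
void errmsg(const char *fmt, ...);
void warningmsg(const char *fmt, ...);
void marc_close(struct marc_file_hdr *fhdr);   // must tolerate h_fd < 0 and a partial h_dir list
static unsigned long marc_checksum(void *pv, UINT cb, unsigned long seed);  // defined further down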

That gives you an idea of the headers I wrote for them; here is the open function. Yes, it's missing all the support calls, error routines, etc., but you get the idea. Please excuse the mixture of C and C++ code styles. Our scanner was a cluster of many different problems like this... I used antique calls like open() and fopen() to stay consistent with the rest of the code base.

struct marc_file_hdr *marc_open(const char *filename)
{
    // calloc zeroes the struct, so h_dir starts out NULL.
    struct marc_file_hdr *fhdr = (struct marc_file_hdr*)calloc(1, sizeof(struct marc_file_hdr));
    if(!fhdr)
        return NULL;

#if defined(_MSC_VER)
    // _sopen_s is a function, not a macro, so test for the MS CRT instead.
    // It takes a pointer to the descriptor and returns an errno_t.
    if(_sopen_s(&fhdr->h_fd, filename, _O_BINARY | _O_RDONLY, _SH_DENYWR, _S_IREAD) != 0)
        fhdr->h_fd = -1;
#else
    fhdr->h_fd = open(filename, _O_BINARY | _O_RDONLY);
#endif
    if(fhdr->h_fd < 0)
    {
        marc_close(fhdr);   // marc_close (not shown) must cope with h_fd < 0
        return NULL;
    }

    //Once we have the file open, read all the file headers, and populate our main headers linked list.
    // The 12-byte header fills h_magic, h_version and h_files.
    if(read(fhdr->h_fd, fhdr, MARC_HEADER_SIZE) != MARC_HEADER_SIZE)
    {
        errmsg("MARC: Could not read MARC header from file %s.\n", filename);
        marc_close(fhdr);
        return NULL;
    }

    // Verify the file magic
    if(fhdr->h_magic != MARC_FILE_MAGIC)
    {
        errmsg("MARC: Incorrect file magic %lx found in MARC file.\n", fhdr->h_magic);
        marc_close(fhdr);
        return NULL;
    }

    // h_files is unsigned, so test for zero rather than "<= 0".
    if(fhdr->h_files == 0)
    {
        errmsg("MARC: No files found in archive.\n");
        marc_close(fhdr);
        return NULL;
    }

    // Get all the file headers from this archive, and link them to the main header.
    // Each node is linked in before it is filled, so marc_close() can free it on error.
    struct marc_dir **tail = &fhdr->h_dir;
    for(ULONG x = 0; x < fhdr->h_files; x++)
    {
        struct marc_dir *curdir = (struct marc_dir*)calloc(1, sizeof(struct marc_dir));
        if(!curdir)
        {
            marc_close(fhdr);
            return NULL;
        }
        *tail = curdir;
        tail = &curdir->dir_next;
        curdir->dir_filenum = x + 1;

        curdir->dir_file = (struct marc_file*)malloc(sizeof(struct marc_file));
        if(!curdir->dir_file || read(fhdr->h_fd, curdir->dir_file, MARC_FILE_ENT_SIZE) != MARC_FILE_ENT_SIZE)
        {
            errmsg("MARC: Could not read file header for file %lu\n", x);
            marc_close(fhdr);
            return NULL;
        }
        // LEF: Just a little extra insurance... The last valid index is
        // MARC_FILENAME_LEN - 1; writing to [MARC_FILENAME_LEN] would overrun the buffer.
        curdir->dir_file->f_filename[MARC_FILENAME_LEN - 1] = '\0';
    }

    return fhdr;
}

Then you have the simple extract method. Keep in mind this was strictly for virus scanning, so there are no search routines, etc. It simply dumped a file out, scanned it, and moved on. Below it is the CRC routine that I BELIEVE Microsoft used, but I'm not sure WHAT they CRC'd; it might include header data + file data, etc. I just haven't cared enough to go back and try to reverse it. Anyway, as you can see, there is no compression in this archive format, but it is VERY easy to add... Full source can be provided if you want. (Hell, I think all that's left is the close() routine and the code that calls and extracts each file, etc. :)

bool marc_extract(struct marc_file_hdr *marc, struct marc_file *marcfile, const char *file, int &err)
{
    // Create the file from marcfile, in *file's location, return any errors.
    int ofd = -1;
    err = MARC_ERR_OK;
#if defined(_MSC_VER)
    // _sopen_s takes a pointer to the descriptor; _O_TRUNC discards any old contents.
    if(_sopen_s(&ofd, file, _O_CREAT | _O_TRUNC | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _SH_DENYNO, _S_IREAD | _S_IWRITE) != 0)
        ofd = -1;
#else
    // A permission mode argument is required whenever _O_CREAT is used.
    ofd = open(file, _O_CREAT | _O_TRUNC | _O_SHORT_LIVED | _O_BINARY | _O_RDWR, _S_IREAD | _S_IWRITE);
#endif
    if(ofd < 0)
    {
        errmsg("MARC: Could not create output file %s.\n", file);
        err = MARC_ERR_OS; // Get the last error from the OS.
        return false;
    }

    // Seek to the offset of the file to extract
    if(lseek(marc->h_fd, marcfile->f_offset, SEEK_SET) != marcfile->f_offset)
    {
        errmsg("MARC: Could not seek to offset 0x%08lx for file %s.\n", (unsigned long)marcfile->f_offset, marcfile->f_filename);
        close(ofd);
        err = MARC_ERR_OS; // Get the last error from the OS.
        return false;
    }

    unsigned char *buffer = (unsigned char*)malloc(MARC_DATA_SIZE);
    if(!buffer)
    {
        close(ofd);
        err = MARC_ERR_OS;
        return false;
    }

    long bytesleft = marcfile->f_filesize;
    long readsize = MARC_DATA_SIZE >= marcfile->f_filesize ? marcfile->f_filesize : MARC_DATA_SIZE;
    unsigned long crc = 0;

    // Copy the entry in MARC_DATA_SIZE chunks, carrying the running checksum along.
    while(bytesleft > 0)
    {
        if(read(marc->h_fd, buffer, readsize) != readsize)
        {
            errmsg("MARC: Failed to extract data from MARC archive.\n");
            free(buffer);
            close(ofd);
            err = MARC_ERR_OOD;
            return false;
        }

        crc = marc_checksum(buffer, readsize, crc);

        if(write(ofd, buffer, readsize) != readsize)
        {
            errmsg("MARC: Failed to write data to file.\n");
            free(buffer);
            close(ofd);
            err = MARC_ERR_OS; // Get the last error from the OS.
            return false;
        }
        bytesleft -= readsize;
        readsize = MARC_DATA_SIZE >= bytesleft ? bytesleft : MARC_DATA_SIZE;
    }

    // LEF:  I can't quite figure out how the checksum is computed, but I think it has to do with the file header, PLUS the data in the file, or it's checked on the data in the file
    //       minus any BOMs at the start...  So, we'll just rem out this code for now, but I wrote it anyways.
    //if(crc != marcfile->f_checksum)
    //{
    //    warningmsg("MARC: File CRC does not match.  File could be corrupt, or altered.  CRC=0x%08X, expected 0x%08X\n", crc, marcfile->f_checksum);
    //    err = MARC_ERR_CRC;
    //}

    free(buffer);
    close(ofd);

    return true;
}

Here is MY assumed CRC routine (I may have stolen this from Stuart Caie and libmspack, I can't recall):

// XORs the buffer together as little-endian 32-bit words, starting from seed.
// Any 1-3 trailing bytes are packed into one final word (first byte highest) and XORed in.
static unsigned long marc_checksum(void *pv, UINT cb, unsigned long seed)
{
    int count = cb / 4;
    unsigned long csum = seed;
    BYTE *p = (BYTE*)pv;
    unsigned long ul;

    while(count-- > 0)
    {
        ul = *p++;
        ul |= (((unsigned long)(*p++)) <<  8);
        ul |= (((unsigned long)(*p++)) << 16);
        ul |= (((unsigned long)(*p++)) << 24);
        csum ^= ul;
    }

    ul = 0;
    switch(cb % 4)
    {
        case 3: ul |= (((unsigned long)(*p++)) << 16); // fall through
        case 2: ul |= (((unsigned long)(*p++)) <<  8); // fall through
        case 1: ul |= *p++;
        default: break;
    }
    csum ^= ul;

    return csum;
}
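Because the routine XORs whole 32-bit words, checksumming in pieces gives the same result as a single pass as long as every piece except the last is a multiple of 4 bytes long. That is why `marc_extract` can feed it 128 KB chunks with the running value as the seed. A portable restatement with fixed-width types, plus a check of that property; `xor_checksum` is a hypothetical name, and whether this matches what MSN actually stores remains unverified, as noted above.

```cpp
#include <cstddef>
#include <cstdint>

// Same algorithm as marc_checksum above, using fixed-width types.
std::uint32_t xor_checksum(const unsigned char* p, std::size_t cb, std::uint32_t seed) {
    std::uint32_t csum = seed;
    for (std::size_t count = cb / 4; count > 0; --count, p += 4)
        csum ^= std::uint32_t(p[0]) | (std::uint32_t(p[1]) << 8) |
                (std::uint32_t(p[2]) << 16) | (std::uint32_t(p[3]) << 24);
    std::uint32_t ul = 0;
    switch (cb % 4) {  // trailing bytes, packed as in the original
        case 3: ul |= std::uint32_t(*p++) << 16; [[fallthrough]];
        case 2: ul |= std::uint32_t(*p++) << 8;  [[fallthrough]];
        case 1: ul |= *p; break;
        default: break;
    }
    return csum ^ ul;
}
```

For example, over the bytes 1..10 a single pass and an 8 + 2 split both yield 0x0C040D0E.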

Well, I think this post is long enough now... Contact me if you need help, or have questions.

LarryF
A: 

It looks like some Windows ports of FUSE are starting to appear. I think this would be the best solution, since it would allow me to keep the legacy code (which is quite large) untouched.

Vincent Robert
A:

Structured Storage was designed for the scenario that you describe:

Structured Storage provides file and data persistence in COM by handling a single file as a structured collection of objects known as storages and streams.

A "storage" is analogous to a folder, and a "stream" is analogous to a file. Basically you have a single file that, when accessed using the Structured Storage APIs, behaves and looks like a complete, self-contained file system.

Take note, though, that:

A solid understanding of COM technologies is prerequisite to the developmental use of Structured Storage.

Jay Michaud
This would be a great idea if I didn't have so much legacy code. I quickly looked at the API, and it looks like you have to use specific COM interfaces (IStream) to read the content of "files" in your storage. So there is no way to use existing code with this solution.
Vincent Robert