I would like to store up to 10 million files on a 2 TB storage unit. The only properties I need are restricted to filenames and their contents (data).

The maximum file size is 100 MB; most files are under 1 MB. The ability to remove files is required, and both write and read speed should be a priority - while storage efficiency, recovery, and integrity features are not needed.

I thought about NTFS, but most of its features are not needed, can't be disabled, and are an overhead concern. A few of them are: creation date, modification date, attributes, the journal, and of course permissions.

Given all the native filesystem features I don't need, would you suggest I use SQLite for this requirement? Or is there an obvious disadvantage I should be aware of? (One would guess that removing files would be a complicated task.)

(SQLite would be used via the C API.)

My goal is to use a better-suited solution to gain performance. Thanks in advance - Doori Bar

+1  A: 

Use the file system. That's what it's for. Any DBMS will eventually have to write the files to the file system anyway, and since you are looking at SQLite, it's unlikely you are considering a DBMS for the extra features it offers over and above the file system.

Andrew Barber
I'm rather surprised. I wouldn't have thought a filesystem would be recommended for such a specific requirement, versus all the features of native NTFS operation. Also, pointing out that SQLite will use the filesystem anyway doesn't address the concern: the concern is strictly about per-file features and properties, which SQLite obviously won't carry due to its structure. Thanks - I'll wait for more people to share their opinion.
Doori Bar
+3  A: 

If your main requirement is performance, go with the native file system. DBMSs are not well suited for handling large BLOBs, so SQLite is not an option for you at all (I don't even know why everybody considers SQLite a plug for every hole).

To improve performance on NTFS (or any other file system you choose), don't put all files into a single folder; instead, group files by the first N characters of their file names, or also by extension. E.g. the file name "song1.mp3" would become "\mp3\so\ng1" on the disk. This approach lets you avoid long directory listings, which are slow to enumerate when accessing a file.
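The sharding scheme above can be sketched in C as follows. This is a minimal illustration, not a standard API: the function name `shard_path` and the choice of a two-character prefix are assumptions, and forward slashes are used here for portability of the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper illustrating the sharding scheme described above:
 * "song1.mp3" -> "mp3/so/ng1".  The extension becomes the top-level
 * directory, the first two characters of the base name become a
 * subdirectory, and the rest of the base name is the on-disk file name. */
static int shard_path(const char *name, char *out, size_t outlen)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL || dot == name || dot[1] == '\0')
        return -1;                  /* require a "base.ext" shape */

    size_t baselen = (size_t)(dot - name);
    if (baselen < 3)
        return -1;                  /* need 2 prefix chars plus a remainder */

    int n = snprintf(out, outlen, "%s/%.2s/%.*s",
                     dot + 1,                       /* extension, "mp3"  */
                     name,                          /* prefix,    "so"   */
                     (int)(baselen - 2), name + 2); /* remainder, "ng1"  */
    return (n > 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With two characters drawn from a reasonably uniform name distribution, 10 million files spread over roughly a thousand extension/prefix buckets keeps each directory in the low thousands of entries, which NTFS enumerates comfortably.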

There also exist other file systems on the market, and some of them may offer the possibility of disabling the features you don't use. You can check the comparison on Wikipedia and evaluate them.

Our company also offers a custom file system, SolFS, yet I don't recommend it here because its strong points are features, and you don't need features.

Eugene Mayevski 'EldoS Corp
May I ask what is your definition of a large blob?
Doori Bar
In fact, any blob larger than the page size (check your DBMS manual for page-size-related details) can be considered large. This is because when the data doesn't fit into a page, the procedure for storing it becomes more complicated than the handling of short variable-size data. AFAIK some DBMSs also store such blobs as files on the file system. This is very much like what Microsoft recommends for the registry: "you can store variable-size binary blocks in the registry, but for blocks over 2 KB, put the blocks into files and keep a reference in the registry".
Eugene Mayevski 'EldoS Corp
So if most of the files in question don't exceed 1 MB, and I set a page size of 1 MB - would you recommend SQLite over a filesystem? (SQLite has a single-file structure.)
Doori Bar
I said that SQLite was not an option AT ALL. The only thing it gives you is unnecessary overhead.
Eugene Mayevski 'EldoS Corp
Then I'm sorry - I thought you meant that only large blobs were an issue for such a database.
Doori Bar
I don't think SQLite supports a 1 MB page (it would be very inefficient to manage the file), but I don't know. Even with one file, SQLite adds overhead. The backend storage of a single-file database is a virtual file system, similar to our SolFS or the CodeBase File System. So you end up dealing with two filesystems - a virtual filesystem sitting inside a file, which in turn resides on a real filesystem. This is where the overhead appears (not to mention the overhead of the DB structure itself).
Eugene Mayevski 'EldoS Corp
Well, I'm starting to think that both options (NTFS vs. SQLite) are inefficient and not suited to the requirement. What do you say to this idea: use 100 MB container files, while storing the offsets of each 'sub-file' in the DB? That way each container could easily hold 200 'sub-files', reducing the overhead of the NTFS features to a minimum, at the expense of wasted storage that is not usable until an 'optimize' operation fills the gaps.
Doori Bar
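The container-file idea from the comment above can be sketched as follows. This is a minimal, hypothetical illustration: `container_append`, `container_read`, and the `entry` struct are made-up names; in a real setup the (offset, length) pairs would live in the DB index, and deleting a sub-file would simply drop its index row, leaving a gap until the 'optimize' pass compacts the container.

```c
#include <stdio.h>
#include <string.h>

/* Index record for one sub-file inside a large container file.
 * In the scheme discussed above, this pair would be stored in the DB
 * keyed by the sub-file's name. */
struct entry {
    long offset;   /* byte offset of the sub-file within the container */
    long length;   /* sub-file size in bytes */
};

/* Append a sub-file's bytes to the end of the container and record
 * where they landed. */
static int container_append(FILE *f, const void *data, size_t len,
                            struct entry *e)
{
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    e->offset = ftell(f);
    e->length = (long)len;
    return fwrite(data, 1, len, f) == len ? 0 : -1;
}

/* Read a sub-file back given its index record; returns bytes read. */
static long container_read(FILE *f, const struct entry *e, void *buf)
{
    if (fseek(f, e->offset, SEEK_SET) != 0)
        return -1;
    return (long)fread(buf, 1, (size_t)e->length, f);
}
```

Writes are pure appends and reads are a single seek, which matches the stated priority on raw speed; the cost, as noted, is dead space from deletions until a compaction pass rewrites the container.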
Well, if you are ready to build something custom, then you can use a raw partition (no filesystem at all) and store your blocks on it. But you will end up reinventing some kind of file system. That is why looking at the wide choice of existing filesystems and choosing one that fits your needs seems the better idea.
Eugene Mayevski 'EldoS Corp
From my POV, NTFS answers all my needs, apart from the costly extra features. Do you happen to know of a basic filesystem that answers the requirement, with no additional non-removable features?
Doori Bar
FAT32 might work, but it's limited in maximum volume size (and the limit is less than 1 TB), so I guess you need to check the comparison table on Wikipedia (I posted the link in my answer).
Eugene Mayevski 'EldoS Corp
Thanks once again
Doori Bar