Sorry for the bad title.

I'm saving web pages. I currently use a single XML file as an index. Each element contains the file's creation date (UTC) and the full URL (with query string and so on). The HTTP headers go in a separate file with a similar name plus a special extension appended.
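
For illustration, an entry looks roughly like this (sketch; names simplified, not my actual schema):

    <entry created="2010-05-12T14:03:22Z"
           url="http://www.host.com/page.aspx?foo=bar&amp;baz=1"
           file=".\www.host.com\aF3kQ9.randext" />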

However, at around 40k files (headers included), the XML is now 3.5 MB. Until recently I would read the file, add the new entry, and save the XML for every download. Now I keep it in memory and save it every once in a while.

When I request a page, the URL is looked up with an XPath query against the XML file; if there is an entry, the file path is returned.
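
The lookup is essentially this (C# sketch matching the simplified entry above; escaping of quotes in the URL is omitted):

    using System.Xml;

    // Sketch: return the saved file path for a URL, or null if not indexed.
    // A URL containing an apostrophe would break this XPath and needs escaping.
    string Lookup(XmlDocument index, string url)
    {
        XmlNode node = index.SelectSingleNode(
            "/index/entry[@url='" + url + "']");
        return node == null ? null : node.Attributes["file"].Value;
    }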

The directory structure is .\www.host.com\randomFilename.randext

So I am looking for a better way.

I'm thinking:

  • One XML file per domain (incl. subdomains). But I feel this might be a hassle.
  • Using SVN. I just tested it, but I have no experience with large repositories. I'd execute svn add "path to file" for every download and commit when I'm done.
  • Creating a custom file system, where I can then include everything I want, e.g. POST data.
  • Generating a filename from the URL and somehow flattening the query string, but long query strings might be rejected by the OS. And if I keep the query string with the headers, I still need to keep track of multiple files mapped to each different query string. Hassle. And I don't want it to execute too slowly either. (See the hashing sketch after this list.)
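
For that last option, hashing the full URL (query string included) gives a fixed-length, filesystem-safe name; something like this (sketch, MD5 picked arbitrarily):

    using System.Security.Cryptography;
    using System.Text;

    // Sketch: derive a fixed-length filename from any URL, however long
    // the query string is. The real URL still lives in the index, so
    // collisions (practically negligible with MD5) could be detected there.
    static string UrlToFilename(string url)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(url));
            StringBuilder sb = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
                sb.Append(b.ToString("x2"));
            return sb.ToString();
        }
    }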

Multiple program instances, running on different computers, will perform read/write operations.

If I follow the directory/file method, I could in theory add a layer in between so it uses DotNetZip on the fly. But then again, the query string problem remains.
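
Roughly, one archive per host (sketch using DotNetZip's ZipFile API; the per-host layout is just an idea):

    using System.IO;
    using Ionic.Zip;

    // Sketch: keep each saved page as an entry inside a per-host zip archive.
    void SavePage(string host, string entryName, byte[] content)
    {
        string archive = host + ".zip";
        using (ZipFile zip = File.Exists(archive)
                   ? ZipFile.Read(archive)
                   : new ZipFile(archive))
        {
            zip.UpdateEntry(entryName, content); // adds or replaces the entry
            zip.Save(archive);
        }
    }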

I'm just looking for direction or experience here.

What I also want is the ability to keep a history of these files, so the local file is not overwritten and I can pick which version (by date) I want. That's why I tried SVN.
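
Without SVN, the simplest alternative I can see is to never overwrite: append a UTC timestamp to each filename so versions sort naturally by date (sketch):

    using System;

    // Sketch: one file per version; picking a version by date is then
    // just a sorted directory listing.
    string VersionedName(string baseName)
    {
        return baseName + "." +
            DateTime.UtcNow.ToString("yyyyMMdd'T'HHmmss'Z'");
    }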

A: 

I would recommend either a relational database or a version control system.

You might want to use SQL Server 2008's new FILESTREAM feature to store the files themselves in the database.
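
Writing through FILESTREAM looks roughly like this (untested sketch; a SavedPages table with an Id column and a FileData FILESTREAM column is assumed):

    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    // Sketch: stream bytes into a FILESTREAM column inside a transaction.
    void StoreFile(SqlConnection conn, int id, byte[] content)
    {
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            var cmd = new SqlCommand(
                "SELECT FileData.PathName(), " +
                "GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                "FROM SavedPages WHERE Id = @id", conn, tx);
            cmd.Parameters.AddWithValue("@id", id);

            string path;
            byte[] txContext;
            using (SqlDataReader r = cmd.ExecuteReader())
            {
                r.Read();
                path = r.GetString(0);
                txContext = (byte[])r[1];
            }

            using (var fs = new SqlFileStream(path, txContext, FileAccess.Write))
                fs.Write(content, 0, content.Length);

            tx.Commit();
        }
    }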

SLaks
MySQL also has the BLOB type, which can likewise be used to store binary data inside the database.
Fiarr
A: 

I would use two data stores: one for the raw files and another for the indexes.

To store the flat files, I think Berkeley DB is a good choice; the key can be generated by MD5 or another hash function, and you can also compress the content of the files to save some disk space.
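
The compression part is a few lines with GZipStream (C# sketch):

    using System.IO;
    using System.IO.Compression;

    // Sketch: gzip a page body before storing it under its hash key.
    static byte[] Compress(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
                gz.Write(data, 0, data.Length);
            return ms.ToArray(); // safe: the gzip stream is closed above
        }
    }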

For the indexes, you can use a relational database or a more sophisticated full-text search engine like Lucene.
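
With Lucene (Lucene.NET here, to match the poster's stack), indexing an entry could look like this (sketch against the 2.9-era API; field names are made up):

    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    // Sketch: add one Lucene document per saved page, keyed by URL.
    void IndexPage(string url, string filePath)
    {
        var dir = FSDirectory.Open(new System.IO.DirectoryInfo("index"));
        var writer = new IndexWriter(dir,
            new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_29),
            IndexWriter.MaxFieldLength.UNLIMITED);

        var doc = new Document();
        doc.Add(new Field("url", url, Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("path", filePath, Field.Store.YES, Field.Index.NO));
        writer.AddDocument(doc);
        writer.Close();
    }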

Tony