How should I serve ZIPped webpages?


Background:
Our software generates reports for customers in the usual suspect formats (HTML, PDF, etc.), and each report can contain charts and other graphics unique to that report. For PDFs everything is held in one place: the PDF file itself. HTML is trickier, as the report is basically the sum of more than one file. The files are available via HTTP through Tomcat.

Problem:
I really want to have a tidy environment and wrap the HTML reports into a single file. There's MHTML, data URIs, several formats to consider. This excellent question posits that, given the lack of cross-browser support for these formats, ZIP is a neat solution. This is attractive to me as I can also offer the zip for download as an "HTML report you can email" option. (In the past users have complained about losing the graphics when they set about emailing HTML reports.)

The solution seems simple. A request comes in, I locate the appropriate zip, unpack it somewhere on the webserver, point the request at the new HTML file, and after a day or so tidy everything up again.

But something doesn't quite seem right about that. I've kind of got a gut feeling that it's not a good solution, that there's something intrinsically wrong with it, or that maybe a better way exists that I can't see at the moment.

Can anyone suggest whether this is good or bad, and offer an alternative solution?

Edit for more background information!
The reports need to persist on the server. Our customers are users at sites, and the visibility of a single report could be as wide as everyone at the site. The creation process involves the user selecting the criteria for the report and submitting it to the server for creation. Data is extracted from the database and a document built. A placeholder record goes into the database, and the documents themselves get stored on the fileserver somewhere. It's the 'documents on the fileserver' part that I'd like to be tidier (zipping also means less disk space used!). Once a report is created, it is available to anyone who can see it.

+1 A:

I would have thought the plan would be that the zip file ends up on the client rather than staying on the server.

Without knowing about your architecture, I would guess at an approach like this:

User requests report
Server displays report as HTML
User perhaps tweaks some parameters, repeats request
Server displays report as HTML (repeat until user is happy)
On each of the HTML reports, there's a "download as zip" link
User clicks on link
Server regenerates report, stores it in a zip file and serves it to the user
User saves zip file somewhere, emails it around etc - server isn't involved at all

This relies on being able to rerun the report to generate the zip file, of course. You could generate a zip file each time you generate some HTML, but that's wasteful if you don't need to do it, and requires clean-up etc.

Perhaps I've misunderstood you though... if this doesn't sound appropriate, could you update your question?

EDIT: Okay, having seen the update to your question, I'd be tempted to store the files for each report in a separate directory (e.g. using a GUID as the directory name). Many file systems support compression at the file system level, so "premature zipping" probably wouldn't save much disk space, and would make extracting individual files harder. Then if the user requests a zip, you just need to build the zip file at that point, probably just in memory, before serving it.
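As a rough illustration of building the zip on demand from a per-report directory, here is a minimal Java sketch. The class name `ReportZipper` and both method names are made up for this example, and it assumes the report directory contains only the files that belong to that report. It writes to any `OutputStream`, so the same method could target an in-memory buffer or a response stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: the names here are illustrative, not from the question.
public class ReportZipper {

    /** Writes every regular file under reportDir into a zip on the given stream. */
    public static void zipDirectory(Path reportDir, OutputStream out) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(reportDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        ZipOutputStream zip = new ZipOutputStream(out);
        for (Path file : files) {
            // Entry names are relative to the report directory and use '/' separators.
            String name = reportDir.relativize(file).toString().replace(File.separatorChar, '/');
            zip.putNextEntry(new ZipEntry(name));
            Files.copy(file, zip);
            zip.closeEntry();
        }
        // finish() completes the archive but leaves 'out' open for the caller to close.
        zip.finish();
    }

    /** Convenience wrapper that builds the whole zip in memory. */
    public static byte[] zipToBytes(Path reportDir) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        zipDirectory(reportDir, buffer);
        return buffer.toByteArray();
    }
}
```

For large reports, passing the response's output stream to `zipDirectory` instead of buffering the whole archive would keep memory use down.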

Jon Skeet 2009-03-02 07:21:00

@Jon: how many fingers do you have on one hand? This is the Nth time you beat me to it replying so fast (where N is rather a lot) :)

tehvan 2009-03-02 07:24:34

Reports aren't generated every time they're served - they need to persist on the filesystem of the server indefinitely and this needs to be as tidy and as space-saving as possible. I tweaked the question.

banjollity 2009-03-02 09:59:42

You don't need to physically create zip files on a file system. There's nothing wrong with creating the zip in memory, streaming it to the browser and letting GC take care of releasing the memory taken by the temporary zip. This of course introduces problems, as it could be inefficient to continually recreate the zip each time a request is made. However, judge these things according to your needs and so on.

mP 2009-03-02 07:37:53

+1 A:

Once a report is created, it is available to anyone who can see it.

that is quite telling - it means that the reports are sharable, and you also would like to "cache" reports so that they don't have to be regenerated.

one way to do this would be to work out a way to hash the parameters together, in such a way that different parameter combinations (that result in a different report) hash to different values. then, you can use those hashes as keys into a large cache of reports stored on disk as zips (maybe the name of the file is the hash?)

that way, every time someone requests a report, you hash the parameters, check if that report was already generated, and serve that up, either as a zip download, or you can unzip it and serve up the html as per normal. If the report doesn't exist, generate it and zip it, and make sure you can identify it later on as being produced by these parameters (i.e., record the hash).
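A minimal Java sketch of the parameter-hashing idea, assuming the report criteria can be represented as a map of non-null strings. The class `ReportKey` and the method `forParameters` are made-up names, and SHA-256 is just one reasonable choice of hash.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns report parameters into a stable, file-name-safe key.
public class ReportKey {

    /** Returns a 64-character hex SHA-256 digest of the (non-null) parameters. */
    public static String forParameters(Map<String, String> params) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        // A TreeMap sorts by parameter name, so insertion order does not change the key.
        for (Map.Entry<String, String> entry : new TreeMap<>(params).entrySet()) {
            update(digest, entry.getKey());
            update(digest, entry.getValue());
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Length-prefix each field so that ("ab", "c") and ("a", "bc") hash differently.
    private static void update(MessageDigest digest, String field) {
        byte[] bytes = field.getBytes(StandardCharsets.UTF_8);
        digest.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
        digest.update(bytes);
    }
}
```

The resulting string could serve as the zip's file name, e.g. the key followed by `.zip`.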

one thing to be careful of is that file system writes tend to be non-atomic, so if you are not careful, two requests arriving at the same time can both generate the same report, which sucks, but luckily in your case is not too harmful. to avoid that, you can use a single thread to do the generation (slower), or implement some kind of lock.
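One possible shape for "some kind of lock" is a lock per report key, sketched below in Java. `OncePerKey` and `getOrGenerate` are made-up names, the results are held in memory purely for illustration (a real version would check for the file on disk instead), and the approach only protects against duplicate work within a single JVM.

```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical helper: runs the generator at most once per key within one JVM.
public class OncePerKey<V> {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, V> results = new ConcurrentHashMap<>();

    public V getOrGenerate(String key, Supplier<V> generator) {
        // Fast path: the report for this key has already been generated.
        V existing = results.get(key);
        if (existing != null) {
            return existing;
        }
        // One lock object per key, so unrelated reports do not block each other.
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            // Re-check inside the lock: another thread may have finished meanwhile.
            existing = results.get(key);
            if (existing == null) {
                existing = Objects.requireNonNull(generator.get(), "generator returned null");
                results.put(key, existing);
            }
            return existing;
        }
    }
}
```

Requests for different keys proceed in parallel, while concurrent requests for the same key wait for the first one to finish and then reuse its result.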

Chii 2009-03-02 10:10:00

All of that is done, except that the HTML reports are stored as their constituent parts, not as a zip. My question was whether doing the zip thing is a good idea or not. Sorry if I didn't quite articulate that bit properly! :)

banjollity 2009-03-02 11:06:27

ah - well, there isn't anything wrong with zipping it up i suppose. it's a very individual question. But you mentioned that using less disk space is better - if doing the zipping has no adverse effect such as eating up cpu power (because you have plenty?) then i don't see anything wrong.

Chii 2009-03-04 11:46:15
