views:

47

answers:

1

Hi guys & ladies,

I have a feature which allows users to create projects and view them on a page. They can import resources (PDFs, images, etc.) to be kept along with their projects. Now I want to create a feature which lets a user export all of their stuff, plus that of everyone in the same group as them, all neatly tied up with a pretty ribbon in a zip file.

Currently I'm using Archive::Zip to zip up the files pre-emptively, keeping their CRC32 checksums, and running this as a daily cron job to cut down the user's waiting time. But if there are any changes to any of the files, I have to rerun the whole thing.
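
Roughly what that cron job looks like (a minimal sketch: the paths and project layout are made up for illustration, and the real job also generates the XML/XSL and HTML described below):

    #!/usr/bin/perl
    # Nightly cron job: rebuild the whole export archive from scratch,
    # whether or not anything actually changed.
    use strict;
    use warnings;
    use Archive::Zip qw( :ERROR_CODES );

    my $root = '/srv/projects/42';                    # hypothetical project dir
    my $out  = '/var/cache/exports/project_42.zip';   # cached zip served on export

    my $zip = Archive::Zip->new();
    $zip->addTree( $root, 'project_42' );             # add every file under $root
    die "zip write failed"
        unless $zip->writeToFileNamed( $out ) == AZ_OK;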

My initial benchmark shows that 103 MB of files takes up to 47 seconds to process. The process involves generating XML linked to XSL, copying images, HTML for the iframes, and so on.

I'm thinking of creating a table or a text file to keep the CRC32 checksum or last-modified date of every file in a temporary storage area, and comparing against this list each time the user clicks on export. If any files have changed, I would remove the stale copies from the cached zip file and add in the new files. Alternatively, I could keep all the loose files, copy in and replace the newer ones, and then build the archive on each click.
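
Roughly what I have in mind for the incremental refresh (a minimal sketch, assuming a flat project directory and a tab-separated mtime manifest; the paths and layout are made up):

    #!/usr/bin/perl
    # Refresh the cached export zip in place: only members whose source
    # file's mtime changed since the last run get replaced.
    use strict;
    use warnings;
    use Archive::Zip qw( :ERROR_CODES );

    my $root     = '/srv/projects/42';                     # hypothetical project dir
    my $cache    = '/var/cache/exports/project_42.zip';    # zip built by the cron job
    my $manifest = '/var/cache/exports/project_42.mtimes'; # "name<TAB>mtime" per line

    # Load the mtime (or CRC32) list from the previous run, if any.
    my %old;
    if ( open my $fh, '<', $manifest ) {
        while (<$fh>) {
            chomp;
            my ( $name, $mtime ) = split /\t/;
            $old{$name} = $mtime;
        }
        close $fh;
    }

    my $zip = Archive::Zip->new();
    die "cannot read cached zip" unless $zip->read( $cache ) == AZ_OK;

    my %new;
    for my $file ( glob "$root/*" ) {              # assumes a flat layout
        ( my $name = $file ) =~ s{^\Q$root\E/}{};  # member name relative to the project
        my $mtime = ( stat $file )[9];
        $new{$name} = $mtime;
        next if defined $old{$name} && $old{$name} == $mtime;   # unchanged

        # New or modified: drop the stale member (if any) and add the fresh file.
        my $member = $zip->memberNamed( $name );
        $zip->removeMember( $member ) if $member;
        $zip->addFile( $file, $name );
    }

    die "cannot rewrite cached zip" unless $zip->overwrite() == AZ_OK;

    # Save the manifest for the next comparison. (Files deleted from the
    # project would still need to be removed from the zip; not handled here.)
    open my $out, '>', $manifest or die $!;
    print {$out} "$_\t$new{$_}\n" for sort keys %new;
    close $out;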

My questions are:

  1. Is this considered a premature or bad optimization technique?
  2. How should I properly optimize this?
  3. Are there any books or resources where I can learn about these sorts of optimization techniques?
+2  A: 

What's wrong with the idea of:

  • setting a flag of some sort whenever a user's files change (add, delete or change a file); there's a short sketch of this after the list.
  • running your nightly compress only on those users whose flag is set, then resetting that flag.
  • if the user requests an export while the flag is set, you'll have to do the compress again before the export completes (there's no way around that).
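
Concretely, that first flag only needs a column and two small pieces of code (a minimal sketch; the users.export_dirty column, the DSN and rebuild_export_zip() are invented names, not anything from your application):

    #!/usr/bin/perl
    # Dirty-flag bookkeeping for the scheme above.
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:myapp', 'appuser', 'secret',
                            { RaiseError => 1 } );

    # Call this from every code path that adds, deletes or changes a file.
    sub mark_user_dirty {
        my ($user_id) = @_;
        $dbh->do( 'UPDATE users SET export_dirty = 1 WHERE id = ?',
                  undef, $user_id );
    }

    # Nightly cron job: only recompress users whose files changed, then reset.
    sub nightly_compress {
        my $dirty_ids = $dbh->selectcol_arrayref(
            'SELECT id FROM users WHERE export_dirty = 1' );
        for my $user_id (@$dirty_ids) {
            rebuild_export_zip($user_id);   # the existing Archive::Zip job
            $dbh->do( 'UPDATE users SET export_dirty = 0 WHERE id = ?',
                      undef, $user_id );
        }
    }

    sub rebuild_export_zip { }              # placeholder for that existing job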

To further speed up the user experience, you could also decouple the export request from the export operation. For example, when a user (whose flag is set) requests an export, notify them that it will be done when the compress happens, and set a different flag. Then modify the second step above to also export the newly created package if this second flag is set.

This gives the user immediate feedback that something will happen but moves the grunt work to the future.
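
In the same spirit, the export handler itself then only ever does quick work (again a sketch; send_cached_zip() and the users.export_requested column are made-up names):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:mysql:myapp', 'appuser', 'secret',
                            { RaiseError => 1 } );   # same DB as above

    # Called when the user clicks "export".
    sub handle_export_click {
        my ($user_id) = @_;
        my ($dirty) = $dbh->selectrow_array(
            'SELECT export_dirty FROM users WHERE id = ?', undef, $user_id );

        if ($dirty) {
            # Archive is stale: queue the export and answer immediately.
            $dbh->do( 'UPDATE users SET export_requested = 1 WHERE id = ?',
                      undef, $user_id );
            return 'Your export is being prepared; you will be notified when it is ready.';
        }
        return send_cached_zip($user_id);   # serve the already-built zip file
    }

    sub send_cached_zip { }                 # placeholder for the download code

    # The nightly job from the previous sketch then grows one extra step:
    # after rebuilding a dirty user's archive, deliver it if export_requested
    # is set, and clear that flag as well.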

Alternatively, you don't have to tie the export to the compress. You could compress every night but allow extra compress/export jobs during the day as needed. It's still good to decouple the request from the event, however.

Answering your specific questions:

1/ I do not consider this premature or bad optimization. The 'code' is functionally complete, since it does everything you ask of it, so this is the right time to optimize. In addition, you have identified the bottleneck and are optimizing the right area.

2/ See my text above. You should optimize it by doing exactly what you've done: identify the bottleneck and concentrate on improving that. Given that you're unlikely to get much better compression performance, the decoupling 'trick' I've suggested is a good one. Like progress bars and splash screens, it's usually more about the user's perception of speed than about speed itself.

3/ Books? Don't bother; there are thousands of resources on the net. Keep asking on SO and print out all the responses. Eventually your brain will be as full as mine, and every new snippet of code will cause you to temporarily forget your wife's name :-).

paxdiablo
hehe, funny and useful, +1. OK, I was hunting for more ways to approach this. It would be a fun and interesting learning experience to find more "patterns" for issues like this, though. Thanks
melaos