The Scenario

I am building a web application where reports can be generated on the fly, based on information retrieved from an SQL database. These reports will contain charts, which are also generated on the fly. Because these charts contain sensitive information, using a third-party chart API (e.g. Google Charts) is out of the question.

The Problem

I am using PHP's GD extension to generate these charts. It is pretty slow. Caching is the way to go, but the problem is that there is a huge number of possible charts; that said, I believe the majority of charts requested will be ones that have been generated before.

Partial Solution

Charts are generated from data and other information (size, chart type, etc.). Because this information uniquely identifies a chart, I give each chart a unique hash based on it and save the rendered image. Now I can compute the hash for a newly requested chart and check whether I have already rendered it.

The problem with this is the possibility of a collision. To get around that, I am thinking of saving the hash along with a serialized form of the data in an SQL table. Then, on a cache hit, I can still compare the data itself.
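The scheme I have in mind looks roughly like this (sketched in Python for brevity; a hypothetical file-based cache stands in for the SQL table, and all names and paths are illustrative, not actual code from my app):

```python
import hashlib
import json
import os

CACHE_DIR = "chart_cache"  # hypothetical cache location

def chart_key(data, size, chart_type):
    """Serialize the chart parameters deterministically and hash them."""
    payload = json.dumps(
        {"data": data, "size": size, "type": chart_type},
        sort_keys=True,  # canonical key order so equal inputs hash equally
    )
    digest = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    return digest, payload

def cached_chart_path(data, size, chart_type):
    """Return the cached image path on a hit, or None on a miss."""
    digest, payload = chart_key(data, size, chart_type)
    meta_path = os.path.join(CACHE_DIR, digest + ".json")
    img_path = os.path.join(CACHE_DIR, digest + ".png")
    if os.path.exists(meta_path) and os.path.exists(img_path):
        with open(meta_path) as f:
            # Guard against a hash collision: compare the full payload too.
            if f.read() == payload:
                return img_path
    return None
```

On a hit, the full serialized payload is compared byte for byte, so even a genuine SHA-1 collision could not serve the wrong chart.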

Am I over-engineering this? (It's a 160-bit hash: SHA-1.)
Is there a better way to handle this?

A: 

Most probably, if your hashed data is less than 160 bits long, you're safe. Otherwise, as you say, collisions may occur and comparing the data is necessary.

ghaxx
A: 

Take a look at ChartDirector. We use it at work, and since it doesn't rely on the GD library it should be faster.

fire
A: 

I am using PHP's GD extension to generate these charts. It is pretty slow.

I suspect that it's not GD which is the slow bit. The most likely candidate is the process of collating the data (from a database?), in which case you may get significant benefits from optimizing the database schema and/or using pre-consolidated data.

You might also consider caching the query output, but unless you're using the same data elsewhere it's probably simpler to cache the graph images.

The problem with this is the event of a collision.

Premature optimization - it's not going to happen. But if you really must, split out the metadata you are using to generate the graph and store it in a separate file (again indexed via the same hash), then compare it at runtime. If you manage to get a collision, we'll have a whip-round and buy you a drink.
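A minimal sketch of that separate-file approach (in Python for brevity; the directory and function names are assumptions, not any library's API):

```python
import hashlib
import os

CACHE_DIR = "chart_cache"  # hypothetical cache location

def save_chart(png_bytes, meta):
    """Store the rendered image and its metadata under the same hash."""
    digest = hashlib.sha1(meta.encode("utf-8")).hexdigest()
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, digest + ".png"), "wb") as f:
        f.write(png_bytes)
    with open(os.path.join(CACHE_DIR, digest + ".meta"), "w") as f:
        f.write(meta)
    return digest

def load_chart(meta):
    """Return cached image bytes only if the stored metadata matches exactly."""
    digest = hashlib.sha1(meta.encode("utf-8")).hexdigest()
    meta_path = os.path.join(CACHE_DIR, digest + ".meta")
    try:
        with open(meta_path) as f:
            stored = f.read()
    except FileNotFoundError:
        return None  # never rendered
    if stored != meta:
        return None  # the collision we'd buy you a drink over
    with open(os.path.join(CACHE_DIR, digest + ".png"), "rb") as f:
        return f.read()
```

The runtime comparison costs one small file read on each hit, which is negligible next to rendering a chart.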

I would recommend having a look at jpgraph, which is an excellent piece of software and has caching built in.

C.

symcbean
I measured the runtime; getting and processing data from the database is orders of magnitude faster (ok, ~200x faster) than generating the image.
quantumSoup