The assumptions I am making are:

  • Each representation has a definition that carries its own last-modified date. E.g., a script that generates a JSON representation of a resource knows when the script itself was last modified.
  • The resources' persistent storage has no size limitation

Now the situation is that I have resources whose representations can be either pre-generated or generated on the fly. Pre-generated means that, e.g., text/html or application/atom+xml is produced when the resource is modified (possibly asynchronously); generated on the fly means that, e.g., a JSP/PHP script produces the representation when it is requested. A sketch of both strategies follows.
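
For illustration, a minimal sketch of the two strategies in Java (the renderHtml helper, resource name, and file location are all hypothetical): pre-generation writes the representation out whenever the resource changes, while on-the-fly generation renders it on every request.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class RepresentationStrategies {

        // Hypothetical renderer: turns a resource's state into HTML.
        static String renderHtml(String resourceState) {
            return "<html><body>" + resourceState + "</body></html>";
        }

        // Pre-generation: run once when the resource is modified
        // (possibly by an asynchronous worker), then serve the file as-is.
        static void preGenerate(String resourceState, Path out) throws IOException {
            Files.writeString(out, renderHtml(resourceState));
        }

        // On the fly: run on every request, like a JSP/PHP script would.
        static String generateOnTheFly(String resourceState) {
            return renderHtml(resourceState);
        }

        public static void main(String[] args) throws IOException {
            Path out = Path.of("resource-42.html"); // hypothetical storage location
            preGenerate("hello", out);
            System.out.println(Files.readString(out));      // served pre-generated
            System.out.println(generateOnTheFly("hello"));  // rendered per request
        }
    }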

What I am unsure about is how much of a performance gain pre-generation would give over generate-on-the-fly + caching. What are your experiences/opinions?

A: 

The performance gain is proportional to the effort required to generate a representation; if it takes a lot of resources (and presumably time) to build one, then caching is a good idea.

So the first thing you need to do is measure the process that builds the representations and identify where the performance hits occur.
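
For example, a crude way to take that measurement (buildRepresentation is a placeholder for your real generator):

    public class GenerationBenchmark {

        // Placeholder for the real representation builder.
        static String buildRepresentation(int resourceId) {
            return "<html><body>resource " + resourceId + "</body></html>";
        }

        public static void main(String[] args) {
            int runs = 10_000;
            long chars = 0; // consume the result so the work isn't optimised away
            long start = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                chars += buildRepresentation(i).length();
            }
            long elapsed = System.nanoTime() - start;
            System.out.printf("avg generation time: %.4f ms (%d chars)%n",
                    elapsed / 1_000_000.0 / runs, chars);
        }
    }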

Adrian K
+1  A: 

You ask about performance, but don't indicate what should perform well, so I'm assuming that the metric is "response time".

The two approaches are just variants of each other: pre-generation is merely a different form of caching. So practically, the only difference is that one is "lazy" whereas the other isn't.
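
For illustration, a minimal sketch of the lazy variant (generate-on-the-fly + caching), assuming a simple in-memory map; an eager (pre-generation) scheme would instead call the same builder on every write:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class LazyRepresentationCache {

        private final Map<String, String> cache = new ConcurrentHashMap<>();

        // Placeholder for the expensive generator.
        private String build(String resourceId) {
            return "<html><body>" + resourceId + "</body></html>";
        }

        // Lazy: generate on first request, then serve from the cache.
        public String get(String resourceId) {
            return cache.computeIfAbsent(resourceId, this::build);
        }

        // Called when the resource changes; the next read regenerates.
        public void invalidate(String resourceId) {
            cache.remove(resourceId);
        }

        public static void main(String[] args) {
            LazyRepresentationCache c = new LazyRepresentationCache();
            System.out.println(c.get("42")); // generated now
            System.out.println(c.get("42")); // served from cache
            c.invalidate("42");
            System.out.println(c.get("42")); // regenerated
        }
    }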

So the difference in latency would be zero when all resources have been cached (and don't change). But the performance difference varies according to several parameters:

  • The time it takes to generate an item
  • The number of times each item changes
  • How often each item is accessed

A tipping point is when an item is modified less often than it is accessed.
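
A back-of-the-envelope illustration of that tipping point (all numbers are made up): with caching, generation work is proportional to the number of writes rather than the number of reads.

    public class TippingPoint {
        public static void main(String[] args) {
            double genTimeMs = 50.0;  // hypothetical cost of one generation
            int reads = 10_000;       // accesses per day
            int writes = 100;         // modifications per day

            double noCache = reads * genTimeMs;  // regenerate on every read
            double cached = writes * genTimeMs;  // regenerate only on change

            System.out.printf("no cache: %.0f ms/day, cached: %.0f ms/day%n",
                    noCache, cached);
            // Caching wins whenever writes < reads; here it is 100x cheaper.
        }
    }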

But there are a lot of other factors to consider:

  • A pre-generation scheme scales a lot better, since it doesn't require additional CPU as the number of requests increases
  • A pre-generation scheme is more fault-tolerant, since there is no database in the critical path of your application
  • A pre-generation scheme can be hard to do if one change to resource X (e.g., it is deleted) causes thousands of other resources to change (e.g., if they all link to X). That increases the likelihood of a resource being modified more often than it is accessed; see the sketch after this list.
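
For illustration, one hypothetical way to picture that fan-out is a reverse-link index (real linking data would come from your resource model): deleting X forces every pre-generated representation that links to X to be rebuilt.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class InvalidationFanOut {

        // Maps a resource to the set of resources that link to it.
        private final Map<String, Set<String>> linkedBy = new HashMap<>();

        public void addLink(String from, String to) {
            linkedBy.computeIfAbsent(to, k -> new HashSet<>()).add(from);
        }

        // Everything returned here must be regenerated when 'resource' changes.
        public Set<String> affectedBy(String resource) {
            return linkedBy.getOrDefault(resource, Set.of());
        }

        public static void main(String[] args) {
            InvalidationFanOut index = new InvalidationFanOut();
            for (int i = 0; i < 1000; i++) {
                index.addLink("page-" + i, "X"); // 1000 pages link to X
            }
            // Deleting X invalidates all 1000 pre-generated pages at once.
            System.out.println(index.affectedBy("X").size()
                    + " representations to rebuild");
        }
    }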
mogsie
How to handle deletes is a dilemma, but consider how we hyperlink: the target resource does not know that it is being hyperlinked, but we do. So if the target is deleted, I would say it is our responsibility to update our resource. Auto-removing a relation, as with SQL's cascade on delete, is for convenience only. In my case I expect reads to outnumber writes by at least a few hundred times, if not more! One concern with pre-generation: what if the representation generator changes after the representation has been generated? Do we regenerate only then, or do we check for a change when the representation is requested?
imyousuf
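
A sketch of the check-on-request option raised in the comment above, assuming the generator script and the pre-generated representation are both files whose modification times can be compared (the paths are hypothetical):

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.attribute.FileTime;

    public class StalenessCheck {

        // A pre-generated representation is stale if the generator script
        // was modified after the representation was produced.
        static boolean isStale(Path representation, Path generatorScript)
                throws IOException {
            if (!Files.exists(representation)) {
                return true; // never generated: must build it
            }
            FileTime generated = Files.getLastModifiedTime(representation);
            FileTime scriptChanged = Files.getLastModifiedTime(generatorScript);
            return scriptChanged.compareTo(generated) > 0;
        }

        public static void main(String[] args) throws IOException {
            Path repr = Path.of("resource-42.html"); // hypothetical paths
            Path script = Path.of("render.jsp");
            System.out.println(isStale(repr, script)
                    ? "regenerate before serving"
                    : "serve the cached copy");
        }
    }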