I often find myself building small web projects that serve aggregated content or do a 'mashup'. Typically this involves periodically running a script to scrape, parse, and manipulate some data, then serving the result as 'static' content.

I run the 'refresh' script as a cron job; it generates the HTML that is served to the end user. The content doesn't change that often, so I can usually just run the cron job hourly.

Is there a better way to do this?

A: 

If you are happy with how it's working now, I wouldn't change anything. It is a kludge, but a functional one. But I'm guessing you're not completely happy (otherwise you wouldn't have asked) so a more substantial answer follows.

A basic upgrade would be to write a script that polls your mashup sources and generates the HTML on the fly, each time a page is requested. The mashup sources could be anything from remote web servers, to local files, to local databases - anything you can "connect to" in code. The basic steps would be:

  1. Retrieve the info from each source, programmatically.
  2. Parse it and transform it as necessary, discarding the bits you don't want and perhaps reformatting certain parts like date formats etc.
  3. Inject all the various bits of your transformed info into an HTML structure and output that to the client.

Steps 1 and 2 sound like what you're already doing; step 3 is the missing link. You want to generate the output dynamically, per request, instead of pregenerating it and sending out static HTML.
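
To make the three steps concrete, here is a minimal sketch in Python. The two `fetch_*` functions are hypothetical stand-ins that return hard-coded data; in a real project they would call a remote server, read a file, or query a database. The field names and date format are assumptions for illustration only.

```python
import html
from datetime import datetime


# Step 1: retrieve the info from each source.
# Hypothetical stand-ins: real versions would hit a URL, file, or database.
def fetch_headlines():
    return [{"title": "Release <2.0> is out", "date": "2024-03-01", "junk": "x"}]


def fetch_weather():
    return {"city": "Oslo", "temp_c": 3.6}


# Step 2: parse and transform, discarding unwanted fields and
# reformatting dates.
def transform(headlines, weather):
    items = [
        {
            "title": h["title"],
            "date": datetime.strptime(h["date"], "%Y-%m-%d").strftime("%d/%m/%Y"),
        }
        for h in headlines
    ]
    summary = f"{weather['city']}: {weather['temp_c']:.0f} C"
    return items, summary


# Step 3: inject the transformed data into an HTML structure.
# html.escape keeps scraped text from breaking (or injecting into) the page.
def render_page(items, summary):
    rows = "\n".join(
        f"<li>{html.escape(i['title'])} ({html.escape(i['date'])})</li>"
        for i in items
    )
    return (
        "<html><body>\n"
        f"<p>{html.escape(summary)}</p>\n"
        f"<ul>\n{rows}\n</ul>\n"
        "</body></html>"
    )


def handle_request():
    """Called on every page load, instead of from cron."""
    items, summary = transform(fetch_headlines(), fetch_weather())
    return render_page(items, summary)
```

You would call something like `handle_request()` from whatever serves your pages (a CGI script, a small web framework handler, etc.) and send its return value to the client.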

Languages well-suited for this sort of thing include PHP, Perl, Ruby, Python, and others; take your pick.

Further optimizations - in the order you'd probably want to do them - include:

  1. Caching the source data. Instead of polling the sources on every page load, poll them the first time, save the response to a file or database, and on each subsequent page load check the timestamp of the saved response to see whether it's still "fresh." If it is, serve the local cached copy instead, which is typically much faster than contacting the source again.
  2. Loading your source data asynchronously, so that the total load time becomes roughly that of the slowest source rather than the sum of all sources.
  3. Sending out the HTML page right away and loading each source via a separate AJAX call, displaying each one in its own div, for example.
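
The first two optimizations can be sketched together in Python. This is only one way to do it, under some assumptions: each source is a zero-argument function returning JSON-serializable data, the cache is one file per source in the system temp directory, and "fresh" means the file was modified less than an hour ago. The names `cached_fetch`, `fetch_all`, and `MAX_AGE_SECONDS` are made up for this example.

```python
import json
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CACHE_DIR = tempfile.gettempdir()  # assumption: any writable directory works
MAX_AGE_SECONDS = 3600             # assumption: one hour, like the cron job


def cached_fetch(name, fetch, max_age=MAX_AGE_SECONDS, cache_dir=CACHE_DIR):
    """Return cached data for `name` if it is fresh, else call `fetch()`."""
    path = os.path.join(cache_dir, f"mashup_{name}.json")
    try:
        # The file's modification time serves as the response timestamp.
        age = time.time() - os.path.getmtime(path)
        if age < max_age:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
    except (OSError, ValueError):
        pass  # no cache file yet, or it is unreadable: fall through and refetch

    data = fetch()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
    return data


def fetch_all(sources, **cache_options):
    """Fetch every source concurrently.

    `sources` maps a name to a zero-argument callable. Because the calls run
    in parallel threads, the wall-clock time is roughly that of the slowest
    source rather than the sum of all of them.
    """
    with ThreadPoolExecutor(max_workers=max(len(sources), 1)) as pool:
        futures = {
            name: pool.submit(cached_fetch, name, fn, **cache_options)
            for name, fn in sources.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

Usage would look like `data = fetch_all({"news": fetch_news, "weather": fetch_weather})`, where the two functions are your own. Note that the cache write here is not atomic, so under heavy concurrent traffic you would probably want to write to a temporary file and rename it into place.
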
alexantd