
Hi,

I have this code written up to snatch some weather data and make it available on my website:

// Load the current observations feed; simplexml_load_file() returns false on failure.
$xml = simplexml_load_file('http://www.weather.gov/data/current_obs/KBED.xml');

if ($xml === false) {
    echo 'unable to load XML file';
} else {
    $temp     = $xml->temp_f . ' Degrees';
    $wind     = $xml->wind_mph;
    $wind_dir = $xml->wind_dir;
    $gust     = $xml->wind_gust_mph;
    $time     = $xml->observation_time;
    $pres     = $xml->pressure_in;
    $weath    = $xml->weather;
}

And then I just echo them out inside the tags I want them in. My site is low traffic, but I'm wondering what the "best" way to do something like this would be if traffic were to spike way up. Should I write the variables I want into a database every hour (when the XML is refreshed) with a cron job, to save pinging their server on every page load, or is that bad practice? I understand this is a bit subjective, but I have no one else to ask about "best ways". Thanks!!

+1  A: 
  • Set up a cron job to periodically fetch the XML document, parse it and store the variables in a database.
  • When a page is requested, fetch the variables from the database and render your page.
  • It is a good idea to store the timestamp of the last update in the database as well, so that you can tell when the data is stale (because the weather website is down, for example).

This setup looks very reasonable to me.
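
A rough sketch of the cron side, using PDO and a hypothetical single-row weather table (table name, columns and credentials are placeholders, adjust to your setup):

<?php
// fetch_weather.php - run hourly from cron, e.g.:
//   0 * * * * php /path/to/fetch_weather.php
$db = new PDO('mysql:host=localhost;dbname=mysite', 'user', 'password');

$xml = @simplexml_load_file('http://www.weather.gov/data/current_obs/KBED.xml');
if ($xml === false) {
    exit(1); // keep the old row; its timestamp will show the data is stale
}

// REPLACE INTO is MySQL-specific, but it keeps the single-row update short.
$stmt = $db->prepare(
    'REPLACE INTO weather (id, temp_f, wind_mph, wind_dir, observation_time, updated_at)
     VALUES (1, ?, ?, ?, ?, NOW())'
);
$stmt->execute(array(
    (string) $xml->temp_f,
    (string) $xml->wind_mph,
    (string) $xml->wind_dir,
    (string) $xml->observation_time,
));

Each page view then just selects that one row and compares updated_at against the current time.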

Ayman Hourieh
And am I right in assuming pinging the other server each time would be "rude" or otherwise inefficient?
Alex Mcp
It depends on their TOS. But since they provide a parsable XML feed, it is very likely that they are OK with this, especially if you do it infrequently. Once every hour should be reasonable. Again, it is a good idea to check the website's TOS first.
Ayman Hourieh
But surely pinging it on every page load, as I have it now, is too much? It seems like a lot, but never having managed a server under high load, I don't know what's a nuisance and what's not...
Alex Mcp
Yeah, I think it is too much to hit the feed on every page view, especially since you can easily cache it.
Ayman Hourieh
A: 

You could cache the output of the external site and let it renew itself, say, every 5-10 seconds. That would kill the impact of a lot of 'pings' from your site. It really depends on how important timing accuracy is to your customer/client.

In a high traffic situation I would have a separate script that runs as a daemon or cron job, fetches the weather every specified interval, and overwrites the public website page when done. That way, you don't have to worry about caching, as it's done by a background task; your visitors are merely accessing a static page from the web server. That also avoids, or at least minimises, the need to incorporate a database into the equation, and is fairly light-weight.
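
Something like this, as a rough sketch (the file paths are just examples):

<?php
// build_weather.php - the cron job or daemon body; overwrites a static
// fragment that the web server includes or serves directly.
$xml = @simplexml_load_file('http://www.weather.gov/data/current_obs/KBED.xml');
if ($xml === false) {
    exit(1); // keep serving the last good snapshot
}

$html = sprintf(
    '<p>%s Degrees, wind %s mph %s (as of %s)</p>',
    htmlspecialchars((string) $xml->temp_f),
    htmlspecialchars((string) $xml->wind_mph),
    htmlspecialchars((string) $xml->wind_dir),
    htmlspecialchars((string) $xml->observation_time)
);

// Write to a temp file and rename; the rename is atomic, so visitors
// never see a half-written file.
file_put_contents('/var/www/includes/weather.html.tmp', $html);
rename('/var/www/includes/weather.html.tmp', '/var/www/includes/weather.html');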

On the downside, it does create a second point of failure and could be pretty useless if the information needs to be accurate to the time of page access.

karim79
+2  A: 

I would suggest the following:

  • When you first get the content of the xml, parse it, and serialise it to a file, with a timestamp attached to the file in some way (perhaps as part of the serialised data structure)

  • Every time the page loads, grab that serialised data and check the timestamp. If it's past a certain age, go and fetch the XML again, re-cache it, and update the timestamp. If not, just use the cached data.

That should work; it means you only have to fetch the xml occasionally, and, since the cache is only refreshed on a request, you also avoid the waste of fetching it regularly when no one is visiting.
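
Roughly like this (the cache path and expiry are example values; here the file's mtime doubles as the timestamp):

<?php
$cache = dirname(__FILE__) . '/weather.cache';
$ttl   = 3600; // refresh at most once an hour

if (is_file($cache) && (time() - filemtime($cache)) < $ttl) {
    // Cache is fresh enough, use it.
    $data = unserialize(file_get_contents($cache));
} else {
    // Cache is stale or missing: re-fetch and re-cache.
    $xml = @simplexml_load_file('http://www.weather.gov/data/current_obs/KBED.xml');
    if ($xml !== false) {
        $data = array(
            'temp' => (string) $xml->temp_f,
            'wind' => (string) $xml->wind_mph,
            'time' => (string) $xml->observation_time,
        );
        file_put_contents($cache, serialize($data));
    } elseif (is_file($cache)) {
        $data = unserialize(file_get_contents($cache)); // fall back to stale data
    }
}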

Kazar
There is a downside to this approach; the page will be slower to load when the cache needs updating.
Ayman Hourieh
Granted, but then again, provided the response from the XML feed is sufficiently swift, that's not prohibitive. Also, it means you need neither the cron-job weak link nor the database connection.
Kazar