tags:
views: 30
answers: 3

Hi,

I have a website which connects to 14-17 XML streams and downloads them on every page load. That was fine for testing purposes and for traffic under 100 visits/day.

However, it has now got to the stage where pages load slower and slower, because every request has to wait for the external XML to download.

What is the best way to cache these files? The data is constantly updated, so rather than downloading the XML streams on every visit, I would like to have some robot do this locally every 5 minutes and have visitors read only from the local copies.

What methods would you use to make sure the XML files don't get locked, and that visitors never get zero results while new data is being downloaded?

Thanks.

A: 

"Cron" would be your best choice for that, because it's how you set up your server to run a scheduled task.

The actual setup depends on your web host, which OS it runs, and what level of access you have. For example, if I recall correctly, Dreamhost has cron administration in its admin tools, while other web hosts would require you to go through telnet to set it up.

Or, if you're running a CMS like Drupal, there may already be tools to administer cron as well.


Cron is available on Unix, Linux and Mac servers; Windows servers use "Scheduled Tasks" instead, but it's similar.
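As a sketch of how the scheduled job could also avoid the locking/zero-results problem from the question: download each feed to a temporary file, then rename it into place, since a rename within the same filesystem is atomic. Python is used here for illustration only; the crontab entry, URL and paths are examples, not from the thread, and the same temp-file-plus-rename idea applies in whatever language the cron job is written in.

```python
import os
import tempfile
import urllib.request

# Hypothetical crontab entry running this script every 5 minutes:
#   */5 * * * * /usr/bin/python3 /usr/local/bin/fetch_feeds.py

def refresh_feed(url, dest):
    """Download `url` and atomically swap it into `dest`.

    Readers always see either the previous complete file or the new
    complete file, never a partial or empty one, because the rename
    only happens after the download has fully succeeded.
    """
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(dest) or ".")
    try:
        with urllib.request.urlopen(url, timeout=10) as resp, \
             os.fdopen(fd, "wb") as out:
            out.write(resp.read())
        os.replace(tmp_path, dest)  # atomic on the same filesystem
    except Exception:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # keep the previous copy on failure
        raise

# Example (illustrative paths, not executed here):
#   refresh_feed("http://example.com/feed1.xml", "/var/www/cache/feed1.xml")
```

If the download fails, the old copy stays in place, so visitors keep getting the last good data instead of an empty file.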

wildpeaks
+1  A: 

I'd probably store the data in memcached and update it there after each download. You'll always have data available even if the file isn't on disk, and you'll have orders of magnitude less disk I/O than reading the files directly on every request, which will improve performance significantly.

For C# I think this is the library you'll need: http://sourceforge.net/projects/memcacheddotnet/

bemace
Will the data be available if the source server is offline?
fnovak
You can configure memcached to keep data as long as you like, so it certainly can be.
bemace
+1  A: 

One option would be to use memcached. You could cache the XML data there, set an expiry of, say, 5 minutes, and have your pages load the data from memcached.

Another option (easier to set up, or useful if you don't have the RAM to spare) would be to have the scheduled job store the downloaded data in a relational database instead of downloading it on page load. Then have your pages read the data from there.
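A minimal sketch of that database option (Python and SQLite used as stand-ins for whatever language and database you use; the table and function names are illustrative, not from the answer):

```python
import sqlite3
import time

def init(conn):
    # One row per feed URL; the cron job overwrites it on each fetch.
    conn.execute("""CREATE TABLE IF NOT EXISTS feeds (
                        url        TEXT PRIMARY KEY,
                        xml        TEXT NOT NULL,
                        fetched_at REAL NOT NULL)""")

def save_feed(conn, url, xml):
    # Called by the scheduled job after each successful download. The row
    # is replaced in a single statement, so page loads never see a
    # half-written feed.
    conn.execute("INSERT OR REPLACE INTO feeds VALUES (?, ?, ?)",
                 (url, xml, time.time()))
    conn.commit()

def load_feed(conn, url):
    # Called on page load instead of hitting the external server.
    row = conn.execute("SELECT xml FROM feeds WHERE url = ?",
                       (url,)).fetchone()
    return row[0] if row else None
```

Pages only ever read from the table, so a slow or offline feed provider can no longer slow down a page load.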

trinth
+1: memcached is the way to go.
rsenna
So basically I don't need to change my code at all? Do I only need to change the address of the provider, or do I need some memcached C# library/client? Thanks.
fnovak
Will the data be available if the source server is offline?
fnovak
You can have memcached keep the data for a certain period of time, but it doesn't guarantee the data will stay stored. That's okay, though: you'll want to write your code so that it first checks whether the data is in the cache; if not, it downloads it, places it in the cache, and serves it. See the memcached pattern: http://code.google.com/appengine/docs/python/memcache/usingmemcache.html
trinth
I wasn't aware that you were writing in C#, but have a look at http://stackoverflow.com/questions/351635/memcached-with-windows-and-net.
trinth
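The check-the-cache-first flow trinth describes can be sketched as follows. Python is used for illustration, a plain dict stands in for a real memcached client (in C# you'd make the same get/set calls against the client library linked above), and all names here are illustrative:

```python
import time

class FeedCache:
    """Cache-aside sketch: check the cache first; on a miss, download
    the feed, store it with a TTL, and serve it."""

    def __init__(self, fetch, ttl=300):  # 300 s = the 5-minute window
        self.fetch = fetch               # callable that downloads one feed
        self.ttl = ttl
        self.store = {}                  # stand-in for a memcached client:
                                         # key -> (expires_at, data)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]              # cache hit: no download needed
        data = self.fetch(key)           # cache miss (or expired): fetch
        self.store[key] = (time.time() + self.ttl, data)
        return data

# Usage: pages call cache.get(url); only the first call per TTL window
# actually hits the external server.
cache = FeedCache(fetch=lambda url: "<feed>%s</feed>" % url)
```

Note this variant still downloads on a page load when the cache is cold, so for guaranteed-fast pages you'd combine it with the cron approach: the scheduled job refreshes the cache, and pages only ever read.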