Visiting http://www.example.net downloads a CSV file containing the most current data for that site. I want my site, http://www.example.com, to fetch that file from http://www.example.net every hour so that it always has updated information.

I then want to use the updated CSV file to detect changes relative to the data in previous CSV files. I have no idea what the best plan of attack would be, so any help would be appreciated. I am just looking for a general outline of how I should proceed, but the more information the better.

By the way, I'm using a LAMP bundle, so PHP and MySQL solutions are preferred.

A: 

Depending on the OS you're using, you're looking at a scheduled task (Windows) or a cron job (*nix) to kick off a service/app that pulls the new CSV and compares it to an older copy.
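
As a rough illustration of the "compare it to an older copy" step, here is a minimal sketch in plain sh. The helper name `added_rows` and the file names are hypothetical, and it assumes a row counts as new when it does not appear verbatim anywhere in the older file.

```shell
#!/bin/sh
# added_rows OLD NEW
# Hypothetical helper: print every line of NEW that does not appear
# verbatim in OLD. Assumes one CSV record per line.
added_rows() {
    old=$1
    new=$2
    # With no usable older copy, every row of NEW counts as new.
    if [ ! -s "$old" ]; then
        cat "$new"
        return 0
    fi
    # -F: fixed strings, -x: match whole lines, -v: invert the match,
    # -f: read the patterns (the old rows) from a file.
    # Note: grep exits with status 1 when nothing new was found.
    grep -Fxv -f "$old" "$new"
}

# Example call (file names are placeholders):
#   added_rows previous.csv current.csv
```

This only reports rows that were added or changed; rows that disappeared from the new file would need the same call with the arguments swapped.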

Traveling Tech Guy
+1  A: 

I think the easiest way to handle this would be to run a cron job every hour (or a scheduled task if you are on Windows) that downloads the CSV with curl or file_get_contents(). Once you have downloaded the CSV, you can import the new data into your MySQL database.

The CSV should have some kind of timestamp on every row so you can easily separate new and old data.
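
To illustrate the timestamp idea, here is a small sketch in sh and awk. It rests on assumptions that are not confirmed for the actual file: the first line is a header, the first column holds a timestamp in a sortable format such as `YYYY-MM-DD HH:MM:SS`, and fields contain no embedded commas. The helper name `rows_after` is made up for this example.

```shell
#!/bin/sh
# rows_after CUTOFF FILE
# Hypothetical helper: print the data rows of FILE whose first column
# (assumed to be a timestamp like "2024-01-01 10:00:00") sorts strictly
# after CUTOFF. Line 1 is assumed to be a header and is skipped.
rows_after() {
    cutoff=$1
    file=$2
    # Both values are non-numeric strings, so awk compares them as text,
    # which matches chronological order for this timestamp format.
    awk -F, -v cutoff="$cutoff" 'NR > 1 && $1 > cutoff' "$file"
}

# Example call (file name and cutoff are placeholders):
#   rows_after "2024-01-01 10:00:00" current.csv
```

The cutoff would typically be the newest timestamp already imported, so only the rows printed here need to go into the database.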

Also, handling XML would be better than plain CSV.

A better setup would be to create a web service on http://www.example.com and have http://www.example.net push updates to it in real time, but that requires you to have access to both websites.

RageZ
Thanks for the answer. Unfortunately, I am only able to receive CSV, not XML. Also, I don't have access to `http://www.example.net`; I only have access to `http://www.example.com`. All of this is running on a Bluehost server, so a Linux box.
ServAce85
Then you're better off just working with CSV and cron. Also make sure your hosting company allows you to schedule cron jobs. Do you have SSH access to the server?
RageZ
No, I don't have SSH access yet. My hosting provider does allow custom cron jobs. Having never used cron jobs before, I'm unsure of what to do. Do you have any resources you recommend or, better yet, an example of how I should proceed?
ServAce85
Setting up the cron job is just a matter of editing a file, and I think your hosting company should have a friendly web form for it. Don't worry, it's not complicated at all. The point that worries me is the CSV format.
RageZ
A: 

You'll definitely want to go the route of a cron job. I'm not exactly sure what you want to do with the differences, but if you just want an email, here is one potential (and simplified) option (you@example.com is a placeholder; substitute your own address):

wget http://uri.com/file.txt && diff file.txt file_previous.txt | mail -s "Differences" you@example.com && mv file.txt file_previous.txt

Try this command by itself from your command line (I'm guessing you are using a *nix box) to see if you can get it working. From there, I would save this to a shell file in the directory where you want to save your CSV files.

cd /path/to/directory
vi process_csv.sh

And add the following:

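#!/bin/bash

# Work in the directory that holds the CSV files; stop if it is missing.
cd /path/to/directory || exit 1
# Fetch the latest copy; -O overwrites file.txt instead of creating file.txt.1.
wget -O file.txt http://uri.com/file.txt || exit 1
# Mail the differences between the new copy and the previous one.
# (you@example.com is a placeholder; substitute your own address.)
diff file.txt file_previous.txt | mail -s "Differences" you@example.com
# Keep the new copy as the baseline for the next run.
mv file.txt file_previous.txt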

Save and close the file. Make the new shell script executable:

chmod +x process_csv.sh
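
One thing to be aware of: as written, the script sends a mail every hour even when nothing changed, and on the very first run there is no file_previous.txt to compare against. A possible guard, sketched here under the assumption that a byte-for-byte comparison is good enough, could look like this (the function name `has_changed` is hypothetical):

```shell
#!/bin/sh
# has_changed NEW PREV
# Hypothetical helper: succeeds (exit status 0) when NEW differs from
# PREV, or when PREV does not exist yet (first run). Fails (exit status
# 1) when both files are byte-for-byte identical.
has_changed() {
    [ ! -f "$2" ] || ! cmp -s "$1" "$2"
}

# Possible use inside process_csv.sh (the address is a placeholder):
#   if [ -f file_previous.txt ] && has_changed file.txt file_previous.txt; then
#       diff file.txt file_previous.txt | mail -s "Differences" you@example.com
#   fi
#   mv file.txt file_previous.txt
```

`cmp -s` prints nothing and reports only through its exit status, which keeps the hourly run quiet when the data is unchanged.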

From there, start investigating the cronjob route. It could be as easy as checking to see if you can edit your crontab file:

crontab -e

With luck, you'll be able to enter your cronjob and save/close the file. It will look something like the following:

01 * * * * /path/to/directory/process_csv.sh

I hope you find this helpful.

Jason Leveille
Well, looks like you won't be able to do any of this w/out SSH access. Here is some more information on Cron for your edification: http://unixgeeks.org/security/newbie/unix/cron-1.html
Jason Leveille