tags: (none)
views: 16
answers: 1

There is a website that my company uses that updates information about 3 specific things throughout the day. We use the information from 1 of them, and what we want to do is pull that information as it is added to their site and add it to a page of our own so it's easier to view. Is this even possible? Can anyone point me in the direction of setting this up? It is all text that we want to pull.

+1  A: 

Pick a language (e.g. Perl). Find an HTTP library for it (e.g. LWP). Fetch the page and run it through an HTML parser (e.g. HTML::TreeBuilder). Pull out the bits you want and shove them into a template (e.g. TT), then dump to a file. Stick the program in cron or Windows Scheduler.
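The same pipeline can be sketched in Python using only the standard library (urllib.request playing the role of LWP, html.parser the role of HTML::TreeBuilder). The URL and the `class="update"` selector below are placeholders, not details from the actual site:

```python
# Sketch of the scrape-and-extract steps from the answer above.
# Assumption: the data we want lives in <div class="update"> elements.
from html.parser import HTMLParser
# import urllib.request  # needed only for the real network fetch

class TextGrabber(HTMLParser):
    """Collect the text inside every <div class="update"> element."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # > 0 while inside a target div
        self.updates = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1 if tag == "div" else 0
        elif tag == "div" and ("class", "update") in attrs:
            self.depth = 1
            self.updates.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.updates[-1] += data

def scrape(html):
    parser = TextGrabber()
    parser.feed(html)
    return [u.strip() for u in parser.updates]

# A real run would fetch the page first (the HTTP step), e.g.:
#   html = urllib.request.urlopen("http://example.com/page").read().decode()
sample = '<div class="update">Price: 42</div><div class="other">x</div>'
print(scrape(sample))  # ['Price: 42']
```

The extracted strings can then be written into your own HTML template and the script dropped into cron or Windows Task Scheduler, exactly as the answer describes.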

David Dorward
I don't know Perl; what other languages can I use?
shinjuo
Whatever you like. The principles are still the same.
David Dorward
What do you think the easiest way to do it is?
shinjuo
Perl, but you don't know the language, so it probably isn't the easiest way for you. You might try paying someone to do it for you; that's usually pretty easy.
David Dorward
I would like to try to learn to do it myself. Is there any way to do it using PHP or JavaScript?
shinjuo
I also know C and C++
shinjuo
Yes, you can use any of those languages (although you couldn't use JS in a browser environment). Pick one. Then find an HTTP … I'm repeating my answer aren't I?
David Dorward
Okay, one last question, I think. Are an HTTP parser and an HTML parser the same thing?
shinjuo
No (why would I put them as separate steps if they were?). HTTP is a means of fetching data over a network. HTML is a language for describing the structure and semantics of text. Some libraries handle both functions, but they aren't the same thing.
David Dorward
The reason I asked is that I found a website with an HTML parser.
shinjuo