How can I save content from another website to my database?


Hello,

I want to dynamically load content from a soccer live-score website into my database.

I also want to do this daily, from a single page on that website (the soccer matches for that day).

If you can help me only with the connection and retrieval of data from that webpage, I will manage the rest.

Website: http://soccerstand.com/
Language: PHP/Java, MySQL

Thank you !

+1 A:

You can use PHP's file() function to get the data. You pass it a URL and it returns the content as an array of lines. You can also use file_get_contents() to get the content as one big string.
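A minimal sketch of both calls, assuming allow_url_fopen is enabled so that http:// URLs can be opened. The helper names fetchPage and fetchLines are made up for illustration; only file() and file_get_contents() come from PHP itself.

```php
<?php
// Hypothetical helper: return the whole page as one string.
function fetchPage(string $url): string
{
    // file_get_contents() returns false on failure (and emits a warning).
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not read $url");
    }
    return $html;
}

// Hypothetical helper: return the page as an array of lines.
function fetchLines(string $url): array
{
    // FILE_IGNORE_NEW_LINES strips the trailing newline from each element.
    $lines = file($url, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $url");
    }
    return $lines;
}

// Usage (not executed here):
// $html = fetchPage('http://soccerstand.com/');
```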

echo 2010-06-01 23:24:50

Neither of those gives you much control over redirects, network latency, or errors. You should use curl when fetching remote content.
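For comparison, a rough curl version might look like the following. The function name fetchWithCurl and the specific limits (5 redirects, 5-second connect timeout) are illustrative assumptions, not recommendations from the thread.

```php
<?php
// Hypothetical helper: fetch a URL with explicit redirect, timeout, and error handling.
function fetchWithCurl(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,               // but not forever
        CURLOPT_CONNECTTIMEOUT => 5,               // seconds allowed to establish the connection
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds allowed for the whole transfer
        CURLOPT_FAILONERROR    => true,            // treat HTTP status >= 400 as a failure
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Fetch failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (not executed here):
// $html = fetchWithCurl('http://soccerstand.com/');
```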

Brent Baisley 2010-06-02 02:05:34

Ethical questions about scraping other sites' data aside:

With PHP you can open a URL much like a file, as long as your configuration allows it (allow_url_fopen must be enabled). See this page for more details and examples: http://www.php.net/manual/en/wrappers.http.php

From there you have the content of the web page and it's a matter of breaking it up. Off the top of my head, I'd use regular expressions or an HTML parser to break apart the HTML, then loop through the child elements and pass the extracted values to your database calls to save them.
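The saving step could be sketched roughly like this with PDO and a prepared statement. The table name matches, its columns, and the array keys are all hypothetical. The sketch assumes the page has already been parsed into an array of rows and that PDO is in exception mode (the default since PHP 8).

```php
<?php
// Hypothetical helper: insert already-parsed match rows and return how many were saved.
function saveMatches(PDO $pdo, array $matches): int
{
    // Table and column names are assumptions for illustration.
    $stmt = $pdo->prepare(
        'INSERT INTO matches (match_date, home_team, away_team, score)
         VALUES (:match_date, :home_team, :away_team, :score)'
    );

    $saved = 0;
    $pdo->beginTransaction();
    try {
        foreach ($matches as $match) {
            // Bound parameters keep scraped text from being interpreted as SQL.
            $stmt->execute([
                ':match_date' => $match['date'],
                ':home_team'  => $match['home_team'],
                ':away_team'  => $match['away_team'],
                ':score'      => $match['score'],
            ]);
            $saved++;
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack(); // keep the day's import all-or-nothing
        throw $e;
    }
    return $saved;
}

// Usage (not executed here; DSN and credentials are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=soccer;charset=utf8mb4', 'user', 'password');
// saveMatches($pdo, $rows);
```

Wrapping the loop in a transaction means a failure halfway through a daily run does not leave a partial set of matches behind.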

There are a lot of resources for parsing HTML on the web and it's simply a matter of choosing the one that will work best for you.

Keep in mind you'll need to monitor the site for changes: if they change elements or their classes/IDs, you might need to change your parsing logic as well.

Mike 2010-06-01 23:25:26

At least in the US, scores are considered facts, which are immune to copyright claims. There's nothing stopping him from copying the scores. Of course IANYL applies.

Daisetsu 2010-06-01 23:29:57

Agreed. I see the site also has odds on it, and that's more where I was seeing possible issues in the future, not the schedules.

Mike 2010-06-01 23:35:45

+1 A:

Use curl to get the content of the page, then use a regex to extract what you want.

There is an easy way: http://www.jonasjohn.de/lab/htmlsql.htm

ilhan 2010-06-02 00:00:46

Using DOM parsing and/or simplexml would probably be easier than regex to parse the fetched content.
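As an illustration of the DOM approach, here is a sketch using DOMDocument and DOMXPath. The markup it expects (table rows with class "match" and cells in the order home team, score, away team) is invented for the example. The real page would need to be inspected and the XPath adjusted to match.

```php
<?php
// Hypothetical helper: extract match rows from HTML into plain arrays.
function parseMatches(string $html): array
{
    // Real-world HTML is rarely valid, so collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $matches = [];

    // Assumed structure: <tr class="match"><td>home</td><td>score</td><td>away</td></tr>
    foreach ($xpath->query('//tr[@class="match"]') as $row) {
        $cells = $xpath->query('./td', $row);
        if ($cells->length < 3) {
            continue; // skip rows that do not look like a match
        }
        $matches[] = [
            'home_team' => trim($cells->item(0)->textContent),
            'score'     => trim($cells->item(1)->textContent),
            'away_team' => trim($cells->item(2)->textContent),
        ];
    }
    return $matches;
}
```

Unlike a regex, this keeps working if attributes are reordered or whitespace changes, though it still breaks if the site renames the class or restructures the table.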

Brent Baisley 2010-06-02 02:07:51
