ansaurus

Question

Automatically copying new articles from a website.

Answer 1

+1 A:

If the other site provides an RSS feed for their notices, the easiest option would be to use an RSS plugin for Joomla to present them.

PEZ 2008-12-31 10:37:19

Answer 2

A:

Unfortunately no RSS feed :(

Nand

2008-12-31 11:05:55

Answer 3

+1 A:

As there isn't yet an RSS feed for your target -- you could write one, in PHP (which IIRC Joomla is written in, so I'll assume support). You simply need to connect to the remote website, and parse the HTML (regular expressions are your friend here) to generate the feed data; I'd be inclined to have this output as RSS, to then fire into your Joomla site.
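The feed-generation half of that idea can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: `build_rss` is a hypothetical helper name, and it assumes the scraping step has already produced an array of items with `title`, `link` and `description` keys.

```php
<?php
// Hypothetical helper: turns already-scraped items into an RSS 2.0 document.
// Each item is assumed to be an array with 'title', 'link' and 'description' keys.
function build_rss(string $feedTitle, string $feedLink, array $items): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);

    $channel = $doc->createElement('channel');
    $rss->appendChild($channel);

    // Adding the text as a text node lets DOM escape &, < and > in scraped content.
    $add = function (DOMElement $parent, string $name, string $text) use ($doc): void {
        $el = $doc->createElement($name);
        $el->appendChild($doc->createTextNode($text));
        $parent->appendChild($el);
    };

    // RSS 2.0 requires title, link and description on the channel.
    $add($channel, 'title', $feedTitle);
    $add($channel, 'link', $feedLink);
    $add($channel, 'description', 'Feed generated by scraping ' . $feedLink);

    foreach ($items as $item) {
        $node = $doc->createElement('item');
        $channel->appendChild($node);
        $add($node, 'title', $item['title']);
        $add($node, 'link', $item['link']);
        $add($node, 'description', $item['description']);
    }

    return $doc->saveXML();
}
```

Served with a `Content-Type: application/rss+xml` header, the returned string should be something a feed reader or an RSS plugin can consume.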

Drawbacks of parsing the HTML include adding a whopping great dependency on their website layout -- this could be mitigated by "giving" them the PHP that generates the RSS for them to host, as it would add value to their website, as well as transferring ownership of maintenance to them.

Rowland Shaw 2008-12-31 11:13:56

Answer 4

+2 A:

There are a few tools out there that will scrape a site and convert it into RSS (you'll have to do a little work to specify how to do that conversion for a new site, however). For example, see http://rssscraper.rubyforge.org/

frankodwyer 2008-12-31 11:31:20

Answer 5

A:

I agree with frankodwyer's and Rowland's answers, but one thing to consider is bugging the site owner (if it's still actively developed, which I assume it is if there is new news) to add an RSS feed. It's not the hardest thing to do.

Ross 2008-12-31 11:50:22

Answer 6

+1 A:

You can turn a website into RSS or XML using Yahoo Pipes and/or Yahoo Query Language (YQL).

Andrej 2008-12-31 12:04:30

Answer 7

A:

It should be noted that "public" does not mean copyright free, unless explicitly placed in the public domain. There are lots of things that are public while retaining their automatic copyright.

ceejayoz 2008-12-31 13:07:02

Answer 8

A:

If you can view the HTML source of the website you are trying to extract information from, and it uses a consistent naming scheme for its news article elements, you should be able to fetch the page with file_get_contents (fopen on its own only returns a stream handle, not the page contents), e.g.

<?php
// Fetch the whole page as a string.
// Reading a URL this way requires allow_url_fopen to be enabled.
$html = file_get_contents("http://www.example.com/");
?>

Then, supposing the article markup is laid out like the following:

<div class="post" id="post-16283">
    <div class="postheader">
        <h1 id="article-title">Test Article Code</h1>
    </div>
    <div class="postcontent">
        This is the article text
    </div>
    <div class="postfooter">
        Copyright Information
    </div>
</div>

You could then use the following PHP code to show all the titles of the articles:

// Single quotes around the pattern so the double quotes in the HTML need no escaping.
if (preg_match_all('#<div class="postheader">(.*?)</div>#s', $html, $matches, PREG_PATTERN_ORDER) > 0) {
    // $matches[1] holds the captured contents of each postheader div.
    foreach ($matches[1] as $match) {
        echo trim(strip_tags($match)), "\n";
    }
}

This is just a basic indicator of how to extract information off the web page. It can be developed so you can extract the information article by article off the web page and then even format it your own way.
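Going article by article could look something like the sketch below. It swaps the regular expression for PHP's DOM extension (DOMDocument and DOMXPath), which tends to cope better with nested markup. `extract_articles` is a hypothetical helper name, and the class names are assumed to match the sample layout above exactly.

```php
<?php
// Hypothetical helper: returns one ['id', 'title', 'content'] array per article,
// assuming the <div class="post"> layout from the sample markup.
function extract_articles(string $html): array
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parse errors instead of printing warnings.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $articles = [];
    foreach ($xpath->query('//div[@class="post"]') as $post) {
        // Look up the title and body relative to the current article node.
        $title = $xpath->query('.//div[@class="postheader"]//h1', $post)->item(0);
        $content = $xpath->query('.//div[@class="postcontent"]', $post)->item(0);
        $articles[] = [
            'id' => $post->getAttribute('id'),
            'title' => $title ? trim($title->textContent) : '',
            'content' => $content ? trim($content->textContent) : '',
        ];
    }
    return $articles;
}
```

Each returned array can then be formatted however you like, for example as an entry in a feed.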

Hope that helps

privateace 2009-02-17 23:32:11
