answers:

8

Hi,

I am building a website, one section of which will display public notices from a different website. (The notices are public, so there is no copyright violation.) What I need to do is automatically update my site whenever a new notice appears on the target site. I am using Joomla as my CMS. Any ideas?

Update - Unfortunately no RSS feed :(

Thanks and Regards, Nand

+1  A: 

If the other site provides an RSS feed for their notices, the easiest approach would be to use an RSS plugin for Joomla to present them.

PEZ
A: 

Unfortunately no RSS feed :(

  • Nand
+1  A: 

As there isn't yet an RSS feed for your target, you could write one in PHP (which, IIRC, Joomla is written in, so I'll assume it's supported). You simply need to connect to the remote website and parse the HTML (regular expressions are your friend here) to generate the feed data; I'd be inclined to have this output as RSS, to then feed into your Joomla site.

The main drawback of parsing the HTML is that it adds a whopping great dependency on their website layout. This could be mitigated by "giving" them the PHP that generates the RSS for them to host, as it would add value to their website as well as transferring ownership of maintenance to them.
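
A minimal sketch of the RSS-generating step described above, assuming the notices have already been scraped into an array. The function name buildRssFeed, the array keys and the example URLs are hypothetical, chosen only for illustration; the DOM extension does the XML escaping.

```php
<?php
// Hypothetical helper: turn already-scraped notices into an RSS 2.0 document.
// The array keys ('title', 'link', 'description') are assumptions for this sketch.
function buildRssFeed(string $feedTitle, string $feedLink, array $notices): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $rss = $doc->createElement('rss');
    $rss->setAttribute('version', '2.0');
    $doc->appendChild($rss);

    $channel = $doc->createElement('channel');
    $rss->appendChild($channel);

    // RSS 2.0 requires a title, link and description on the channel.
    $channelFields = [
        'title' => $feedTitle,
        'link' => $feedLink,
        'description' => 'Notices scraped from ' . $feedLink,
    ];
    foreach ($channelFields as $name => $value) {
        $el = $doc->createElement($name);
        // Text nodes are escaped automatically when the XML is serialised.
        $el->appendChild($doc->createTextNode($value));
        $channel->appendChild($el);
    }

    // One <item> per notice; missing keys fall back to an empty string.
    foreach ($notices as $notice) {
        $item = $doc->createElement('item');
        foreach (['title', 'link', 'description'] as $name) {
            $el = $doc->createElement($name);
            $el->appendChild($doc->createTextNode($notice[$name] ?? ''));
            $item->appendChild($el);
        }
        $channel->appendChild($item);
    }

    return $doc->saveXML();
}

// Example usage with made-up data.
$feed = buildRssFeed('Public notices', 'http://www.example.com/', [
    [
        'title' => 'Road closure & diversions',
        'link' => 'http://www.example.com/notices/1',
        'description' => 'Example notice text',
    ],
]);
echo $feed;
```

Serving that output from a URL on your own host would give an RSS plugin something to subscribe to; how often to re-scrape (for example from a cron job) is left open here.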

Rowland Shaw
+2  A: 

There are a few tools out there that will scrape a site and convert it into RSS (you'll have to do a little work to specify how to do that conversion for a new site, however). For example, see http://rssscraper.rubyforge.org/

frankodwyer
A: 

I agree with frankodwyer's and Rowland's answers, but one thing to consider is bugging the site owner to add an RSS feed (assuming the site is still actively developed, which I'd guess it is if new notices keep appearing). It's not the hardest thing to do.

Ross
+1  A: 

You can turn a website into RSS or XML using Yahoo Pipes and/or Yahoo Query Language (YQL).

Andrej
A: 

It should be noted that "public" does not mean copyright-free, unless the material has been explicitly placed in the public domain. There are lots of things that are public while retaining their automatic copyright.

ceejayoz
A: 

If you can view the HTML source of the website you are trying to extract the information from, and it uses a logical naming system for its news article elements, you should be able to fetch the page with file_get_contents (fopen on its own only gives you a stream handle, not the page contents; either way allow_url_fopen must be enabled), e.g.

<?php
// Fetch the whole page as a string; returns false on failure.
$html = file_get_contents("http://www.example.com/");
?>

Then, if the article markup were laid out like the following:

<div class="post" id="post-16283">
    <div class="postheader">
        <h1 id="article-title">Test Article Code</h1>
    </div>
    <div class="postcontent">
        This is the article text
    </div>
    <div class="postfooter">
        Copyright Information
    </div>
</div>

You could then use the following PHP code to show the titles of all the articles:

<?php
// Capture the inside of each "postheader" div; the s modifier lets . match newlines.
if ($html !== false && preg_match_all('#<div class="postheader">(.*?)</div>#s', $html, $matches) > 0) {
    foreach ($matches[1] as $match) {
        // Strip the <h1> wrapper and print the bare title on its own line.
        echo trim(strip_tags($match)), "\n";
    }
}
?>

This is just a basic indication of how to extract information from the web page. It can be developed so that you extract the information article by article and then format it your own way.
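
As a rough sketch of that "article by article" idea, here is one way it might look using PHP's DOM extension rather than regular expressions, assuming the page markup really does match the sample above. The function name extractArticles and the returned array keys are hypothetical; the class names come from the sample markup.

```php
<?php
// Hypothetical helper: extract each article from markup shaped like the sample above.
function extractArticles(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $articles = [];
    // An exact class match is enough for the sample; real pages may need contains().
    foreach ($xpath->query('//div[@class="post"]') as $post) {
        $title = $xpath->query('.//div[@class="postheader"]//h1', $post)->item(0);
        $body = $xpath->query('.//div[@class="postcontent"]', $post)->item(0);
        $articles[] = [
            'id' => $post->getAttribute('id'),
            'title' => $title ? trim($title->textContent) : '',
            'content' => $body ? trim($body->textContent) : '',
        ];
    }
    return $articles;
}

// The sample markup from above, minus the footer.
$sample = <<<'HTML'
<div class="post" id="post-16283">
    <div class="postheader">
        <h1 id="article-title">Test Article Code</h1>
    </div>
    <div class="postcontent">
        This is the article text
    </div>
</div>
HTML;

foreach (extractArticles($sample) as $article) {
    echo $article['id'], ': ', $article['title'], ' - ', $article['content'], "\n";
}
```

Keeping the id of each post makes it straightforward to tell which notices are new since the last run, though it still breaks if the target site changes its layout.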

Hope that helps

privateace