I'm looking for suggestions on the best way to parse the following calendar: http://www.ucd.ie/events/calendar . I can't detect any well-known framework being used, nor can I find it in RSS/XML/JSON format.

The only way I can see to do this is to parse the raw HTML, which is far from ideal, especially since many of the tags are repetitive. A typical event looks like this:

    <tr> 
            <td class="odd"> 
                <a href="http://www.ucd.ie/events/calendar?dt=d.en.66031&amp;amp;f=week&amp;amp;d=19/10/2010&amp;amp;sd=Wednesday, 06 October 2010 - Wednesday, 01 December 2010&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null&amp;c=null">Exchange Information Talk</a> 
                <p class="description">Information for students on spending a period of study abroad on exchange as part of their UCD degree</p> 
            </td> 
            <td class="odd">UCD International</td> 
            <td class="odd">A105 Newman Building</td> 
        </tr>  

As you can see, parsing many of these from an HTML page isn't going to be fun. Does anyone have suggestions on how to go about this, or perhaps a smarter way of doing things? I'd really appreciate any help, as I'm stuck and can't find any alternatives.

Thanks.

+1  A: 

If the site does not provide any service other than this HTML, you're stuck with parsing it, but XPath queries can make your life a lot more pleasant than plain string matching.
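
As a rough illustration of the XPath approach, here is a minimal Python sketch that turns one event row into a record. It uses only the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset and needs well-formed markup, so a real page would probably need a more lenient HTML parser first. The sample markup is a shortened copy of the row in the question, and the field names `organiser` and `location` are my guesses at what the second and third columns mean.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the event row from the question (assumption:
# the real page would first need cleaning up before ElementTree accepts it).
SAMPLE = """
<table>
  <tr>
    <td class="odd">
      <a href="http://www.ucd.ie/events/calendar?dt=d.en.66031&amp;f=week">Exchange Information Talk</a>
      <p class="description">Information for students on spending a period of study abroad</p>
    </td>
    <td class="odd">UCD International</td>
    <td class="odd">A105 Newman Building</td>
  </tr>
</table>
"""


def parse_events(markup):
    """Return one dict per table row that looks like an event."""
    root = ET.fromstring(markup)
    events = []
    for row in root.iterfind(".//tr"):
        cells = row.findall("td")
        if len(cells) != 3:
            continue  # not an event row (header, spacer, ...)
        link = cells[0].find("a")
        if link is None:
            continue  # first cell has no event link
        description = cells[0].findtext("p[@class='description']") or ""
        events.append({
            "title": (link.text or "").strip(),
            "url": link.get("href"),
            "description": description.strip(),
            "organiser": (cells[1].text or "").strip(),  # guessed column meaning
            "location": (cells[2].text or "").strip(),   # guessed column meaning
        })
    return events


if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["title"], "|", event["organiser"], "|", event["location"])
```

Working row by row keeps the title, description and the other two columns of each event together, which is harder to get right with string matching on the repeated `td` tags.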

Wrikken
A: 

You can try it with XPath. To get the link, use:

    //td[@class='odd']/a/@href

But it will break every time they change the HTML output.
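
For what it's worth, a small Python sketch of that query, under a few assumptions: the markup is a cut-down, well-formed copy of the row in the question, and the standard library's `xml.etree.ElementTree` is used, which cannot select attributes with `/@href`, so the sketch matches the `a` elements and reads `href` from each. The helper name `event_links` is made up for this example. It raises an error when nothing matches, so a change in the HTML structure shows up immediately instead of silently returning nothing.

```python
import xml.etree.ElementTree as ET

# Cut-down, well-formed sample based on the row in the question (assumption).
SAMPLE = """<table>
  <tr>
    <td class="odd"><a href="http://www.ucd.ie/events/calendar?dt=d.en.66031">Exchange Information Talk</a></td>
    <td class="odd">UCD International</td>
  </tr>
</table>"""


def event_links(markup):
    """Return the href of every <a> directly inside a <td class="odd">."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset has no /@href step, so select the elements
    # and read the attribute in Python instead.
    anchors = root.findall(".//td[@class='odd']/a")
    links = [a.get("href") for a in anchors if a.get("href")]
    if not links:
        # Fail loudly: an empty result most likely means the layout changed.
        raise ValueError("no event links matched; the page layout may have changed")
    return links


if __name__ == "__main__":
    print(event_links(SAMPLE))
```

Note that the expression only matches cells whose class is exactly `odd`; rows marked up with any other class value would need their own expression.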

remi bourgarel
Is there no way of building something that'll automatically work for each case? The output will change daily.
Aidanc
No, that's what web services are for; ask your university. And by "HTML output" I mean only the shape of the HTML tags, not the content, so I seriously doubt they will change it often (but they can).
remi bourgarel