I have several Google calendars that I'd like to merge and place on my Windows desktop using Samurize. I've tried using Samurize's Page Scraper plugin, but it doesn't appear to be up to the task.

I can get Samurize to run a script and place its output on the desktop, but I'm not sure what the best tools are for this.

All the URLs I have are of the form:

http://www.google.com/calendar/feeds/example%40gmail.com/private-REMOVED/basic?futureevents=true&orderby=starttime&sortorder=ascending&singleevents=true

So I could fetch them using curl, but then I need to filter them.

I want something that looks like:

2009 12 02  Event from calendar 1's description 
2009 12 03  Event from calendar 2's description 
2009 12 04  Event from calendar 1's description 
2009 12 05  Event from calendar 3's description 
2009 12 06  Event from calendar 1's description

However the dates in the calendar feeds are formatted like this:

<title type='html'>Event from calendar 1's description</title><summary type='html'>When: Fri 5 Dec 2008&lt;br&gt;

So how do I filter out the dates and descriptions, and convert the dates?

(I have Cygwin installed, so something using Perl or sed/awk would be perfect: I'm familiar enough with them that I'd be confident about altering the script in future. I'm open to other suggestions, though.)

+1  A: 

I'm learning perl so please don't laugh too hard, but here's something that might get you most of the way towards parsing:

#!C:\Perl\bin\perl.exe -w
use strict;

my %months = ("Jan", "01", "Feb", "02", "Mar", "03", ... etc. etc. ... "Dec", "12");

$_ = "<title type='html'>Event from calendar 1's description</title><summary type='html'>When: Fri 5 Dec 2008<br>";

if (/<title type='html'>([\d\D]*)<\/title><summary type='html'>When: (\S+) (\S+) (\S+) (\S+)<br>/)
{
    print "$5 $months{$4} $3 $1\n";
}
John at CashCommons
that looks promising, I'll try it out and get back to you.
Sam Hasler
Just be sure to fill out the rest of the hash. ;)
John at CashCommons
Thanks, I was able to get something that worked exactly how I wanted it to. See my answer for the script I ended up using
Sam Hasler
+1  A: 

Building on John W's script, this is what I'm using:

#!c:\cygwin\bin\perl.exe -w
use strict;
use LWP::Simple qw(get);

my %calendars = ( "Sam Hasler", "http://www.google.com/calendar/feeds/blah/blah/basic"
                , "Family    ", "http://www.google.com/calendar/feeds/blah/blah/basic"
                , "Work      ", "http://www.google.com/calendar/feeds/blah/blah/basic"
                );

my $params = "?futureevents=true&orderby=starttime&sortorder=ascending&singleevents=true";

my %months = ( "Jan", "01", "Feb", "02", "Mar", "03", "Apr", "04"
             , "May", "05", "Jun", "06", "Jul", "07", "Aug", "08"
             , "Sep", "09", "Oct", "10", "Nov", "11", "Dec", "12");

my $calendar_name;
my $calendar_url;
my @lines;

while (($calendar_name, $calendar_url) = each(%calendars)) {
    my $calendar_data = get "$calendar_url$params";

    # get() returns undef on failure; skip that calendar rather than die.
    next unless defined $calendar_data;
    @lines = split(/\n/, $calendar_data);

    foreach (@lines) {
        if (/<title type='html'>([\d\D]*)<\/title><summary type='html'>When: (\S+) (\S+) (\S+) (\S+)&lt;br&gt;/)
        {
            # Zero-pad the day so the output sorts correctly as text.
            my $day = sprintf("%02d", $3);

            print "$5 $months{$4} $day\t$calendar_name\t$1\n";
        }
    }
}

I just pipe the output through sort to get it in date order.
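
If you'd rather not depend on an external sort, the lines could be collected and sorted inside the script instead. This is only a rough sketch, assuming every line starts with the zero-padded "YYYY MM DD" date shown above; the sub name sort_events and the sample events are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: because the date prefix is zero-padded and ordered
# year, month, day, a plain string sort is also a chronological sort.
sub sort_events {
    my @events = @_;
    return sort { $a cmp $b } @events;
}

# Made-up sample lines in the same format the script prints.
my @events = (
    "2009 12 04\tWork      \tTeam meeting",
    "2009 12 02\tSam Hasler\tDentist",
    "2009 12 03\tFamily    \tBirthday",
);

print "$_\n" for sort_events(@events);
```

In the real script you would push each line onto an array where it currently prints, then sort and print once all calendars have been read.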

Update: I've converted my script to a plugin and submitted it to the Samurize website: Merge Google Calendar feeds.

Sam Hasler
Sam, glad I could help! You may be able to add something like my $description = $1; $description =~ s/ to take care of the encoded ampersands.
John at CashCommons
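
For the encoded ampersands mentioned in the comment above, one possible approach is a pair of substitutions run on the captured description. This is a minimal sketch, not the commenter's exact code: the sub name decode_basic_entities and the sample string are invented, and it only handles the five predefined XML entities (numeric entities such as &#39; are left alone).

```perl
use strict;
use warnings;

# Hypothetical helper: decode the five predefined XML entities.
# &amp; is handled last so that "&amp;lt;" becomes "&lt;", not "<".
sub decode_basic_entities {
    my ($text) = @_;
    my %entity = (lt => '<', gt => '>', quot => '"', apos => "'");
    $text =~ s/&(lt|gt|quot|apos);/$entity{$1}/g;
    $text =~ s/&amp;/&/g;
    return $text;
}

my $description = "Fish &amp; chips with Tom &amp; Ann";  # made-up sample
print decode_basic_entities($description), "\n";
```

If the HTML::Entities module happens to be installed, its decode_entities function covers far more cases and is probably the better choice.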
+1  A: 

Two ideas.

You could use Yahoo Pipes (see this article).

Or, if you don't want to wait around for Yahoo to refresh its data, here is a Python script under development to merge iCal files.

Evan
I'm using the XML feeds because, with parameters, you can get them to output each occurrence of a recurring event.
Sam Hasler