I've been whacking on this regex for a while, trying to build something that can pick out multiple ordered property values (DTSTART, DTEND, SUMMARY) from an .ics file. I have other options (like reading one line at a time and scanning), but wanted to build a single regex that can handle the whole thing.

SAMPLE PERL

# There has got to be a better way...
my $x1 = '(?:^DTSTART[^\:]*:(?<dts>.*?)$)';
my $x2 = '(?:^DTEND[^\:]*:(?<dte>.*?)$)';
my $x3 = '(?:^SUMMARY[^\:]*:(?<dtn>.*?)$)';
my $fmt = "$x1.*$x2.*$x3|$x1.*$x3.*$x2|$x2.*$x1.*$x3|$x2.*$x3.*$x1|$x3.*$x1.*$x2|$x3.*$x2.*$x1";

if ($evts[1] =~ /$fmt/smo) {
    printf "lines:\n==>\n%s\n==>\n%s\n==>\n%s\n", $+{dts}, $+{dte}, $+{dtn};
} else {
    print "Failed.\n";
}

SAMPLE DATA

BEGIN:VEVENT
UID:0A5ECBC3-CAFB-4CCE-91E3-247DF6C6652A
TRANSP:OPAQUE
SUMMARY:Gandalf_flinger1
DTEND:20071127T170005
DTSTART,lang=en_us:20071127T103000
DTSTAMP:20100325T003424Z
X-APPLE-EWS-BUSYSTATUS:BUSY
SEQUENCE:0
END:VEVENT

SAMPLE OUTPUT

lines:
==>
20071127T103000
==>
20071127T170005
==>
Gandalf_flinger1

A: 

It's better to use three regexes and some extra logic. This problem isn't a good match for regexes.

Kinopiko
A: 

That's ugly... I think the "better way" is to match each property, one at a time.

leonbloy
+1  A: 

Instead of permuting the three regexes into one big pattern with ORs, why not test the three patterns separately, since (given the ^ and $ anchors) they cannot overlap?

my $x1 = qr/(?:^DTSTART[^:]*:(?<dts>.*?)$)/smo;
my $x2 = qr/(?:^DTEND[^:]*:(?<dte>.*?)$)/smo;
my $x3 = qr/(?:^SUMMARY[^:]*:(?<dtn>.*?)$)/smo;

if ($evts[1] =~ $x1 and $evts[1] =~ $x2 and $evts[1] =~ $x3)
{
    # ...
}

(I also turned the x variables into patterns themselves, and removed the unneeded escape in the character classes.)
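
One caveat when matching separately: %+ reflects only the most recent successful match, so after three chained matches $+{dts} and $+{dte} would no longer be set. A minimal sketch of one way around that, capturing each value in list context as it is matched. The $event variable is an assumption standing in for $evts[1], filled with the sample data from the question.

```perl
use strict;
use warnings;

# Sample VEVENT from the question, held as one multi-line string
# ($event is a stand-in for $evts[1]).
my $event = <<'END_ICS';
BEGIN:VEVENT
UID:0A5ECBC3-CAFB-4CCE-91E3-247DF6C6652A
TRANSP:OPAQUE
SUMMARY:Gandalf_flinger1
DTEND:20071127T170005
DTSTART,lang=en_us:20071127T103000
DTSTAMP:20100325T003424Z
X-APPLE-EWS-BUSYSTATUS:BUSY
SEQUENCE:0
END:VEVENT
END_ICS

# A match in list context returns its capture groups, so each value is
# saved before the next match replaces the capture variables.
my ($dts) = $event =~ /^DTSTART[^:]*:(.*?)$/m;
my ($dte) = $event =~ /^DTEND[^:]*:(.*?)$/m;
my ($dtn) = $event =~ /^SUMMARY[^:]*:(.*?)$/m;

if (defined $dts and defined $dte and defined $dtn) {
    printf "lines:\n==>\n%s\n==>\n%s\n==>\n%s\n", $dts, $dte, $dtn;
} else {
    print "Failed.\n";
}
```

The order of the properties in the file does not matter here, because each pattern scans the whole string on its own.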

Ether
Thank you. This moved me along. Basically, I grabbed each instance of the target properties, sorted them, stripped them and stored them (based upon sorted order). my ($dte, $dts, $dtn) = map { local $_ = $_; s/[^:]*://; $_ } sort $evts[1] =~ /^((?:DTSTART|DTEND|SUMMARY)[^:]*:\S+)/smg;
Andrew Philips
Whoops, one correction to that (spaces in event summary are a problem): my ($dte, $dts, $dtv) = map { local $_ = $_; s/[^:]*://; $_ } sort $evt =~ /^((?:DTSTART|DTEND|SUMMARY)[^:]*:(?:.*?))(?:\s*)$/smg;
Andrew Philips
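
A variation on the same single-pass idea, offered as a sketch rather than as the code used above: capturing the property name and value as a pair and assigning the pairs to a hash avoids relying on sort order. The names $event and %prop are assumptions for illustration, and if a property appears more than once the last occurrence wins.

```perl
use strict;
use warnings;

# Sample VEVENT from the question, held as one multi-line string.
my $event = <<'END_ICS';
BEGIN:VEVENT
UID:0A5ECBC3-CAFB-4CCE-91E3-247DF6C6652A
TRANSP:OPAQUE
SUMMARY:Gandalf_flinger1
DTEND:20071127T170005
DTSTART,lang=en_us:20071127T103000
DTSTAMP:20100325T003424Z
X-APPLE-EWS-BUSYSTATUS:BUSY
SEQUENCE:0
END:VEVENT
END_ICS

# One global match in list context yields (name, value) pairs for every
# wanted property, in whatever order they appear. Parameters such as
# ",lang=en_us" are skipped by [^:\n]* and trailing whitespace is left
# outside the value capture.
my %prop = $event =~ /^(DTSTART|DTEND|SUMMARY)\b[^:\n]*:(.*?)\s*$/mg;

printf "%s => %s\n", $_, $prop{$_} // '(missing)'
    for qw(DTSTART DTEND SUMMARY);
```

Looking values up by name (for example $prop{SUMMARY}) also makes a missing property easy to detect, since its key is simply absent.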
+2  A: 

CPAN is your friend:

vFile

iCal parser

Without a parser, you will pull your hair out until you are bald handling the vFile format (for anything other than trivial files). Doing this with a regex is very hard.

drewk
Thank you. If I start having weirdness with my simple parsing, I'll give these a look. (I'm trying to remove duplicate .ics entries, and the existing, recommended programs aren't cooperating.)
Andrew Philips