ansaurus

Question

regular expression for matching date in Perl

Answer 1

+5 A:

How about something from Regexp::Common::time?

Philip Potter 2010-08-30 12:23:36

Answer 2

+3 A:

I would use the split function:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @dates = (
'2010',
'2010-08',
'2010-08-27',
'2010-08-27 02',
'2010-08-27 02:11',
'2010-08-27 02:11:36',
);

for (@dates) {
  my @list = split /[ :-]/;
  print Dumper(\@list);
}

Output:

$VAR1 = [
          '2010'
        ];
$VAR1 = [
          '2010',
          '08'
        ];
$VAR1 = [
          '2010',
          '08',
          '27'
        ];
$VAR1 = [
          '2010',
          '08',
          '27',
          '02'
        ];
$VAR1 = [
          '2010',
          '08',
          '27',
          '02',
          '11'
        ];
$VAR1 = [
          '2010',
          '08',
          '27',
          '02',
          '11',
          '36'
        ];

M42 2010-08-30 12:26:30

I did not understand. What are you trying to do here?

Lazer 2010-08-30 12:48:37

I'm splitting each of the date formats you've given into an array. Once that's done, you can test the values in the array: `$list[0]` contains the year, `$list[1]` contains the month (if present), and so on.
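A minimal sketch of what such a test could look like, assuming simple range limits (month 1-12, day 1-31, hour 0-23, minute and second 0-59). The `fields_ok` helper and its limits are hypothetical, chosen for illustration only; it does not check days per month.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a date string and range-check each field.
# The limits below are assumptions for illustration.
sub fields_ok {
    my ($date) = @_;
    my @list = split /[ :-]/, $date;
    return 0 if !@list || @list > 6;

    # [min, max] for year, month, day, hour, minute, second
    my @range = ([0, 9999], [1, 12], [1, 31], [0, 23], [0, 59], [0, 59]);
    for my $i (0 .. $#list) {
        return 0 unless $list[$i] =~ /\A\d+\z/;    # rejects empty fields too
        return 0 if $list[$i] < $range[$i][0] || $list[$i] > $range[$i][1];
    }
    return 1;
}

for my $date ('2010-08-27 02:11', '2010-13', '2010-08-27 25') {
    printf "%-18s %s\n", $date, fields_ok($date) ? 'ok' : 'rejected';
}
```

Because split discards the separators, this check cannot tell `2010-08` from `2010:08`; a regex is still needed if the punctuation matters.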

M42 2010-08-30 12:54:29

Answer 3

+1 A:

This matches all the above (but also other stuff - see the comment!) and may be slightly easier to read:

/(\d{4})(-\d{2})?(\w{1}\d{2})?(:\d{2})?/

Dave Everitt 2010-08-30 12:40:22

I wouldn't say this is cleaner. Nor that it does the job, actually: it accepts 1234q56, for instance. Also: {0,2}? is superfluous, you can't optionally match zero times.
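The false positives are easy to reproduce. This small check runs the pattern from the answer against one valid date and two strings that should not pass; the input strings are just examples, and all three report a match.

```perl
use strict;
use warnings;

# Pattern copied from the answer above (unanchored, \w as separator).
my $re = qr/(\d{4})(-\d{2})?(\w{1}\d{2})?(:\d{2})?/;

for my $input ('2010-08-27', '1234q56', 'abc200890') {
    print "$input: ", ($input =~ $re ? 'matches' : 'no match'), "\n";
}
```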

mscha 2010-08-30 12:47:41

Accepted - I only tested against all the given patterns. Thanks for the heads-up about the {0,2}? - cross-brain-infection from something else I was doing. Corrected.

Dave Everitt 2010-08-30 13:07:18

The ? in {0,2}? is non-greedy, so it will prefer matching fewer times, presumably not what was intended, but certainly not superfluous.
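A short illustration of the difference, using a made-up input string rather than the date pattern itself:

```perl
use strict;
use warnings;

my $text = 'aaa';

my ($greedy) = $text =~ /(a{0,2})/;     # takes as many as allowed: 'aa'
my ($lazy)   = $text =~ /(a{0,2}?)/;    # takes as few as allowed: ''

printf "greedy: '%s'\nlazy:   '%s'\n", $greedy, $lazy;
```

With nothing after the quantifier to force it to consume more, the non-greedy form is satisfied by the empty string.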

ysth 2010-08-30 13:53:05

Thanks ysth. Was using exactly that in some Ruby regex (hence the spillover into this question).

Dave Everitt 2010-08-30 15:08:48

Indeed, I was wrong about {0,2}?, apologies.

mscha 2010-08-30 16:41:18

Answer 4

+1 A:

If you want something faster, look beyond regexes to XS modules; Date::Calc is a good one.

gms8994 2010-08-30 13:10:54

Answer 5

+2 A:

Your regex is just fine except for missing anchors (unless you want to match 2008 in "abc200890"?). Assuming you want to match the whole string:

/^\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/

(?:...) should be used if you don't actually want the captured substrings, which I'd guess to be the case.
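To show what the anchors change, here is a quick comparison of the unanchored and anchored forms. The variable names `$loose` and `$strict` and the test strings are only illustrative.

```perl
use strict;
use warnings;

my $loose  = qr/\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/;
my $strict = qr/^\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/;

for my $input ('2010-08-27 02:11', 'abc200890', '2010-08-') {
    printf "%-18s loose=%d strict=%d\n", $input,
        ($input =~ $loose ? 1 : 0), ($input =~ $strict ? 1 : 0);
}
```

The unanchored pattern accepts all three strings because it only needs four digits somewhere in the input; the anchored one accepts only the first.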

ysth 2010-08-30 13:56:10

Answer 6

+7 A:

Based on the lack of a capturing group around the year, I assume you care only whether a date matches.

I tried a few variations on the pattern from your question; the change that gave a ten- to fifteen-percent improvement was disabling capturing, i.e.,

/\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/

The perlre documentation covers (?:...):

(?:pattern)

(?imsx-imsx:pattern)

This is for clustering, not capturing; it groups subexpressions like (), but doesn't make backreferences as () does. So
@fields = split(/\b(?:a|b|c)\b/)
is like
@fields = split(/\b(a|b|c)\b/)
but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.

Any letters between ? and : act as flags modifiers as with (?imsx-imsx). For example,
/(?s-i:more.*than).*million/i
is equivalent to the more verbose
/(?:(?s-i)more.*than).*million/i
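The split behaviour described in that excerpt can be seen with a small example. The input string and the surrounding `\s` in the patterns are my own additions for illustration:

```perl
use strict;
use warnings;

my $text = 'x a y b z';

my @plain    = split /\s\b(?:a|b|c)\b\s/, $text;    # separators dropped
my @captured = split /\s\b(a|b|c)\b\s/,   $text;    # separators kept as fields

print "non-capturing: @plain\n";       # x y z
print "capturing:     @captured\n";    # x a y b z
```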

Benchmark output:

             Rate      U   U/NC CH/NC/A CH/NC/A/U     CH  CH/NC   null
U         31811/s     --   -32%    -58%      -59%   -61%   -66%   -93%
U/NC      46849/s    47%     --    -38%      -39%   -42%   -50%   -90%
CH/NC/A   76119/s   139%    62%      --       -1%    -6%   -18%   -84%
CH/NC/A/U 76663/s   141%    64%      1%        --    -6%   -17%   -84%
CH        81147/s   155%    73%      7%        6%     --   -13%   -83%
CH/NC     92789/s   192%    98%     22%       21%    14%     --   -81%
null     481882/s  1415%   929%    533%      529%   494%   419%     --

Code:

#! /usr/bin/perl

use warnings;
use strict;

use Benchmark qw/ :all /;

sub option_chain {
  local($_) = @_;
  /\d{4}(-\d{2}(-\d{2}( \d{2}(:\d{2}(:\d{2})?)?)?)?)?/
}

sub option_chain_nocap {
  local($_) = @_;
  /\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?/
}

sub option_chain_nocap_anchored {
  local($_) = @_;
  /\A\d{4}(?:-\d{2}(?:-\d{2}(?: \d{2}(?::\d{2}(?::\d{2})?)?)?)?)?\z/
}

sub option_chain_anchored_unrolled {
  local($_) = @_;
  /\A\d\d\d\d(-\d\d(-\d\d( \d\d(:\d\d(:\d\d)?)?)?)?)?\z/
}

sub simple_split {
  local($_) = @_;
  split /[ :-]/;
}

sub unrolled {
  local($_) = @_;
  # Under /x, unescaped whitespace in the pattern is ignored, so the
  # literal space between the date and the time is written as "\ ".
  grep defined($_), /\A (\d\d\d\d)-(\d\d)-(\d\d)\ (\d\d):(\d\d):(\d\d) \z
                    |\A (\d\d\d\d)-(\d\d)-(\d\d)\ (\d\d):(\d\d)        \z
                    |\A (\d\d\d\d)-(\d\d)-(\d\d)\ (\d\d)               \z
                    |\A (\d\d\d\d)-(\d\d)-(\d\d)                       \z
                    |\A (\d\d\d\d)-(\d\d)                              \z
                    |\A (\d\d\d\d)                                     \z
                    /x;
}

sub unrolled_nocap {
  local($_) = @_;
  # Same alternatives without capture groups; "\ " is the literal space.
  grep defined($_), /\A \d\d\d\d-\d\d-\d\d\ \d\d:\d\d:\d\d \z
                    |\A \d\d\d\d-\d\d-\d\d\ \d\d:\d\d      \z
                    |\A \d\d\d\d-\d\d-\d\d\ \d\d           \z
                    |\A \d\d\d\d-\d\d-\d\d                 \z
                    |\A \d\d\d\d-\d\d                      \z
                    |\A \d\d\d\d                           \z
                    /x;
}

sub id { $_[0] }

my @examples = (
  "xyz",
  "2010",
  "2010-08",
  "2010-08-27",
  "2010-08-27 02",
  "2010-08-27 02:11",
  "2010-08-27 02:11:36",
);

cmpthese -1 => {
  "CH"        => sub {                   option_chain $_ for @examples },
  "CH/NC"     => sub {             option_chain_nocap $_ for @examples },
  "CH/NC/A"   => sub {    option_chain_nocap_anchored $_ for @examples },
  "CH/NC/A/U" => sub { option_chain_anchored_unrolled $_ for @examples },
  "U"         => sub {                       unrolled $_ for @examples },
  "U/NC"      => sub {                 unrolled_nocap $_ for @examples },
  "null"      => sub {                             id $_ for @examples },
};

Greg Bacon 2010-08-30 16:22:23

thanks a lot for the effort, @gbacon.

Lazer 2010-08-30 16:48:50

+20 (if I could) for producing a benchmark vs armchair speculation! +1 anyways...

drewk 2010-08-30 18:27:14
