So I need to get hours, minutes and seconds out of entries like these:

  • 04:43:12
  • 9.43.12
  • 1:00
  • 01.04
  • 59
  • 09

The first two are hours, minutes and seconds. The next two are minutes and seconds. The last two are just seconds.

I came up with this regexp, which works:

\A(?<hours>\d{1,2})(?::|\.)(?<minutes>\d{1,2})(?::|\.)(?<seconds>\d{1,2})\z|\A(?<minutes>\d{1,2})(?::|\.)(?<seconds>\d{1,2})\z|\A(?<seconds>\d{1,2})\z

But it is ugly, and I want to refactor it down to not be 3 different expressions (mostly just to learn). I tried this:

\A(?:(?<hours>\d{1,2})(?::|\.){0,1})(?:(?<minutes>\d{1,2})(?::|\.){0,1})(?:(?<seconds>\d{1,2}){0,1})\z

But that does not work: minutes and seconds sometimes end up in the wrong groups. My brain is hurting, and I can't figure out what I am doing wrong.

+2  A: 

I suggest the following expression.

^(((?<Hour>[0-9]{1,2})[.:])?(?<Minute>[0-9]{1,2})[.:])?(?<Second>[0-9]{2})$

This will allow single digit hours combined with single digit minutes like 3:7:21. If this is not desired, a slight modification is required.

^(((?<Hour>[0-9]{1,2})[.:](?=[0-9]{2}))?(?<Minute>[0-9]{1,2})[.:])?(?<Second>[0-9]{2})$

The positive lookahead assertion (?=[0-9]{2}) in the second expression solves this issue.
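To see the difference between the two expressions, here is a minimal sketch in Python. It assumes Python's `re` module, which spells named groups `(?P<name>...)` instead of `(?<name>...)`; the helper name `parts` is made up for illustration and is not part of the answer.

```python
import re

# Python spells named groups (?P<name>...); otherwise the patterns are unchanged.
LOOSE = re.compile(
    r"^(((?P<Hour>[0-9]{1,2})[.:])?(?P<Minute>[0-9]{1,2})[.:])?(?P<Second>[0-9]{2})$"
)
STRICT = re.compile(
    r"^(((?P<Hour>[0-9]{1,2})[.:](?=[0-9]{2}))?(?P<Minute>[0-9]{1,2})[.:])?(?P<Second>[0-9]{2})$"
)


def parts(pattern, text):
    """Return (Hour, Minute, Second) as strings or None, or None if there is no match."""
    m = pattern.match(text)
    if m is None:
        return None
    return (m.group("Hour"), m.group("Minute"), m.group("Second"))


print(parts(LOOSE, "3:7:21"))    # accepted by the first expression
print(parts(STRICT, "3:7:21"))   # rejected: the lookahead wants two minute digits
print(parts(STRICT, "3:07:21"))  # accepted by both
```

Note that the lookahead only applies when an hour part is present, so an input such as 7:21 still matches the second expression as minutes and seconds.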

Daniel Brückner
+1  A: 

There is no really good way to do this, because what to do when not all three parts are specified depends on your particular situation. For example, in many cases I would prefer to interpret 3:30 as 3 hours and 30 minutes rather than 3 minutes and 30 seconds. It can't hurt to be explicit about that, and to make it easy to see from the regex what these kinds of inputs mean.

Therefore I personally believe that the first regex is not that ugly at all - it might be less "magic", but it is much more readable and maintainable. Make sure you and others can still read and change the code later!

If your language supports it, I would use extended regexes (with support for whitespace and comments) and split it over three lines (or 6 or 9 if you put a comment on a separate line). That won't change the regex, but it will make it feel less ugly for sure.
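As a rough illustration of the extended-regex idea, here is a sketch using Python's `re.VERBOSE` flag. Two things are assumptions of this sketch rather than part of the question: Python rejects duplicate group names, so each alternative gets its own suffixed names (`h1`, `m1`, `m2`, ...), and the helper `parse` is hypothetical. Python's `\Z` plays the role of `\z` in the original.

```python
import re

# One alternative per line, each with its own comment.
# Group names are suffixed because Python's re does not allow reusing a name.
TIME = re.compile(
    r"""
    \A (?P<h1>\d{1,2}) [:.] (?P<m1>\d{1,2}) [:.] (?P<s1>\d{1,2}) \Z   # hours, minutes, seconds
  | \A (?P<m2>\d{1,2}) [:.] (?P<s2>\d{1,2}) \Z                        # minutes, seconds
  | \A (?P<s3>\d{1,2}) \Z                                             # seconds only
    """,
    re.VERBOSE,
)


def parse(text):
    """Return (hours, minutes, seconds) as ints, or None if text does not match."""
    m = TIME.match(text)
    if m is None:
        return None
    g = m.groupdict()
    # Exactly one alternative matched, so at most one name per part is set.
    hours = g["h1"]
    minutes = g["m1"] or g["m2"]
    seconds = g["s1"] or g["s2"] or g["s3"]
    return (int(hours or 0), int(minutes or 0), int(seconds))


for sample in ["04:43:12", "9.43.12", "1:00", "01.04", "59", "09"]:
    print(sample, parse(sample))
```

The regex itself is the same three-way alternation as in the question; only the layout changes, which makes the meaning of each input shape explicit.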

skrebbel
Valid points. The reason for me wanting to make it better is primarily to learn.
Kjensen
+4  A: 

I haven't tested this yet, but it should work:

^(?:(?:(?<hours>\d\d?)[:\.])?(?<minutes>\d\d?)[:\.])?(?<seconds>\d\d?)$

Edit:
Now I have tested it and verified that it works. :)
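For anyone who wants to repeat the check, here is a small sketch that runs the expression over the sample inputs from the question. It assumes Python's `re` module (named groups become `(?P<name>...)`), and `split_time` is a made-up helper name.

```python
import re

# The pattern from this answer, with Python's spelling for named groups.
TIME = re.compile(
    r"^(?:(?:(?P<hours>\d\d?)[:\.])?(?P<minutes>\d\d?)[:\.])?(?P<seconds>\d\d?)$"
)

SAMPLES = ["04:43:12", "9.43.12", "1:00", "01.04", "59", "09"]


def split_time(text):
    """Return a dict of the named groups (None for absent parts), or None on no match."""
    m = TIME.match(text)
    return m.groupdict() if m else None


for sample in SAMPLES:
    print(sample, split_time(sample))
```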

Guffa
This works, but it will accept 3:7:21, which might or might not be expected to mean 3:07:21. And by the way, there is no need to escape the dot in character groups. (Or am I wrong? Is there a regex implementation requiring this?)
Daniel Brückner
The backslash on dot in a character class is unnecessary. Allowing 3:7:21 for 3:07:21 is probably an example of 'be generous in what you accept'.
Jonathan Leffler
I tend to escape some characters that don't strictly need escaping. Even if the Regex class doesn't need it to understand it, I might. :)
Guffa
+4  A: 

My suggestion:

(?:(?:(?<hh>\d{1,2})[:.])?(?<mm>\d{1,2})[:.])?(?<ss>\d{1,2})

structured:

(?:
  (?:
    (?<hh>\d{1,2})      // hours
    [:.]                // delimiter
  )?                    // optional
  (?<mm>\d{1,2})        // minutes
  [:.]                  // delimiter
)?                      // optional
(?<ss>\d{1,2})          // seconds (required)

If you wish, you can wrap the regex in delimiters - like word boundaries \b or string anchors (^ and $).

EDIT: Thinking about it, you can restrict that further to capture times that make sense only. Use

[0-5]?\d

in place of

\d{1,2}

to capture values between 0 and 59 only, where appropriate (seconds and minutes).
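Putting the structured layout, the string anchors and the 0 to 59 restriction together, a sketch in Python might look like this. It assumes Python's `re.VERBOSE` mode, where comments start with `#` and named groups are written `(?P<name>...)`; the `to_seconds` helper is hypothetical and only there to show the groups being used.

```python
import re

TIME = re.compile(
    r"""
    ^
    (?:
      (?:
        (?P<hh>\d{1,2})     # hours
        [:.]                # delimiter
      )?                    # optional
      (?P<mm>[0-5]?\d)      # minutes, 0 to 59
      [:.]                  # delimiter
    )?                      # optional
    (?P<ss>[0-5]?\d)        # seconds, 0 to 59 (required)
    $
    """,
    re.VERBOSE,
)


def to_seconds(text):
    """Return the total number of seconds, or None if text is not a valid time."""
    m = TIME.match(text)
    if m is None:
        return None
    # Absent groups come back as None and count as zero.
    hh, mm, ss = (int(v) if v is not None else 0 for v in m.group("hh", "mm", "ss"))
    return hh * 3600 + mm * 60 + ss


print(to_seconds("04:43:12"))
print(to_seconds("1:00"))
print(to_seconds("1:75"))  # no match: 75 is not a valid seconds value
```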

Tomalak
Love the structured examples in regexp...
gnarf
Awesome! Works great - and love the formatting too. If only my editor would support that, it would be easier to work with.
Kjensen
Isn't the comment character for regular expressions in "ignore whitespace and allow comments" mode `#` instead of `//`?
Joey