Hi,

I'm working on a regular expression that should only return true when a date string is in a format like 'ddd, dd mmm yy'.

Valid matches would be values like "Sun, 20 Jun 10" or "Mon, 21 Jun 10" but not "Sunday, 20 Jun 10" or "20 Jun 10".

This will be used with mb_ereg in PHP.

My attempts so far have only got me halfway there. Any help appreciated!

Thanks, Dave

+3  A: 
"/[a-z]{3}, \\d{2} [a-z]{3} \\d{2}/i"

If the i flag (case insensitive) is not supported, replace [a-z] with [a-zA-Z].

Also, replace [a-z]{3} with (Sun|Mon|Tue|Wed|Thu|Fri|Sat) and corresponding (Month|List) for a stricter validation.
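
As a rough illustration of the stricter variant, the pattern might be assembled like this. It is only a sketch: the month list, the anchors, the constant names and the `looksLikeShortDate()` helper are assumptions, and it uses preg_match rather than mb_ereg. Note that it is case sensitive.

```php
<?php
// Sketch only: the month list, the anchors and all names here are assumptions.
const DAY_NAMES   = 'Sun|Mon|Tue|Wed|Thu|Fri|Sat';
const MONTH_NAMES = 'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec';

// ^ and $ (with the D modifier) make the whole string match, not just a part of it.
const SHORT_DATE_PATTERN =
    '/^(' . DAY_NAMES . '), [0-9]{2} (' . MONTH_NAMES . ') [0-9]{2}$/D';

// Hypothetical helper: true when $s has the shape 'ddd, dd mmm yy'
// with a known day and month abbreviation.
function looksLikeShortDate(string $s): bool
{
    return preg_match(SHORT_DATE_PATTERN, $s) === 1;
}
```

This still accepts impossible values such as day 00 or 31 Jun; it only narrows the names.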

Amarghosh
I'd think that \\d{2} is too general.
faileN
@faileN If you want to go about making it perfectly foolproof (including month names), you'll end up with a looong and hard to maintain regex; I'd rather check the syntax with regex and validate values with code
Amarghosh
@faileN After all, any regex that you can come up with will pass `Sat, 18 Jun 10` as valid, but it's a Friday.
Amarghosh
Yepp, that's right. At first I had built in the alternatives like you suggested, but that was too much, so I thought of validating the date afterwards, with "checkdate()" for example. So in the end it doesn't matter whether you use "\\d{2}" or my version. But it's just as I said: personally, it's too general for me. This is no criticism of your regex, which should also work :)
faileN
@faileN Agree that `\d` is too general even for this case - it ain't hard to check for valid numbers; I was lazy I guess. And btw, feel free to criticize me - I can handle it well and learn from that :)
Amarghosh
More people should have an attitude like yours. It's all about learning new stuff =)
faileN
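The split discussed in these comments (syntax with a regex, values with code) could be sketched roughly as follows. `isValidShortDate()` is a hypothetical helper, and reading a two-digit year such as "10" as 2010 is an assumption.

```php
<?php
// Sketch only: isValidShortDate() is a hypothetical helper, and treating
// "yy" as 20yy is an assumption.
function isValidShortDate(string $s): bool
{
    // Step 1: syntax only.
    $syntax = '/^([A-Z][a-z]{2}), ([0-9]{2}) ([A-Z][a-z]{2}) ([0-9]{2})$/D';
    if (preg_match($syntax, $s, $m) !== 1) {
        return false;
    }

    // Step 2: values. Map the month abbreviation to its number.
    $months = ['Jan' => 1, 'Feb' => 2, 'Mar' => 3, 'Apr' => 4,
               'May' => 5, 'Jun' => 6, 'Jul' => 7, 'Aug' => 8,
               'Sep' => 9, 'Oct' => 10, 'Nov' => 11, 'Dec' => 12];
    if (!isset($months[$m[3]])) {
        return false;
    }
    $month = $months[$m[3]];
    $day   = (int) $m[2];
    $year  = 2000 + (int) $m[4];
    if (!checkdate($month, $day, $year)) {
        return false; // rejects impossible dates such as 31 Jun
    }

    // Step 3: the weekday abbreviation must agree with the calendar date.
    return gmdate('D', gmmktime(0, 0, 0, $month, $day, $year)) === $m[1];
}
```

With this, `Fri, 18 Jun 10` passes while `Sat, 18 Jun 10` and `Sun, 31 Jun 10` are rejected.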
Sorry Amarghosh, I couldn't get this one to work. I tried it with the [a-zA-Z] as well, but neither attempt returned true when used with mb_ereg.
Trindaz
@Trindaz Try replacing `\\d` with `[0-9]`
Amarghosh
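One possible reason for the failure with mb_ereg is that the mb_ereg functions take the pattern without `/.../` delimiters or trailing flags; case-insensitive matching is done with mb_eregi instead. A minimal sketch under that assumption follows. `matchesShortDateSyntax()` is a hypothetical helper, and the preg_match fallback is only there so the example also runs where the mbstring extension is not loaded.

```php
<?php
// Sketch only: matchesShortDateSyntax() is a hypothetical helper.
function matchesShortDateSyntax(string $s): bool
{
    // No delimiters and no "i" flag here; \A and \z anchor the whole string.
    $pattern = '\A[a-z]{3}, [0-9]{2} [a-z]{3} [0-9]{2}\z';

    if (function_exists('mb_eregi')) {
        // mb_eregi() is the case-insensitive counterpart of mb_ereg().
        return mb_eregi($pattern, $s) !== false;
    }

    // Fallback without mbstring: same pattern, wrapped in PCRE delimiters.
    return preg_match('/' . $pattern . '/i', $s) === 1;
}
```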
A: 

I'm not sure why you'd prefer ereg over preg, but here:

$regex = '([A-Z][a-z]{2}), (([012][0-9])|(3[01])|[0-9]) ([A-Z][a-z]{2}) ([0-9]{2})';
mb_ereg($regex, "Sun, 31 Jun 10", $regs);

Of course, if you want to convert to a POSIX time, you'd need to map the month part to an integer and use mktime.
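
A rough sketch of that mapping is shown below. `shortDateToTimestamp()` is a hypothetical helper, and treating the two-digit year as 20yy is an assumption.

```php
<?php
// Sketch only: shortDateToTimestamp() is a hypothetical helper, and reading
// "yy" as 20yy is an assumption.
function shortDateToTimestamp(string $monthAbbr, int $day, int $yy): ?int
{
    $months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
               'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

    // array_search() returns the zero-based position, or false if not found.
    $index = array_search($monthAbbr, $months, true);
    if ($index === false) {
        return null; // unknown month abbreviation
    }

    // mktime() uses the default timezone and rolls over out-of-range days,
    // so 31 Jun quietly becomes 1 Jul; use checkdate() first if that matters.
    $ts = mktime(0, 0, 0, $index + 1, $day, 2000 + $yy);

    return $ts === false ? null : $ts;
}
```

For the call above, the month, day and year would come from `$regs[5]`, `$regs[2]` and `$regs[6]` respectively, given how the groups in the pattern are nested.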

p00ya
`([012][0-9])` would allow `00` for a date.
Amarghosh
It would also match `Xxx, 00 Yyy 00`
Gordon
+1  A: 

This is one solution. Days like "00" are not allowed.

$date = 'Fri, 18 Jun 10';
$regex = '#([A-Za-z]{3}), ((?:0[1-9])|(?:(?:1|2)[0-9])|(?:3(?:0|1))) ([A-Za-z]{3}) ([0-9]{2})#';
preg_match($regex, $date, $matches);

// Create Vars out of the matching...
$day_abbr = $matches[1];
$day = $matches[2];
$month_abbr = $matches[3];
$year = $matches[4];

If you want to allow days without leading zeros, you will have to use this regex (I just added a question mark). This means dates like "Sun, 8 Jun 10" are also valid with the following regex.

$regex = '#([A-Za-z]{3}), ((?:0?[1-9])|(?:(?:1|2)[0-9])|(?:3(?:0|1))) ([A-Za-z]{3}) ([0-9]{2})#';
faileN
Just my 2 cents: Leaving `?:` out of regexes would make them easier to read and understand; users can always add them if necessary. I use them only when the question has something to do with captured groups being used for further reading or replacing. Also `(1|2)[0-9]`, `3(0|1)` etc. can be written as `[12][0-9]` and `3[01]` - don't know if it makes any difference performance-wise though.
Amarghosh
Yep, of course you can cut the `?:`. I only included them to make the variables easier afterwards :) Otherwise it would have been complicated to use something like `$day = $matches[2], $year = $matches[9]`... There really is a performance difference between alternation and ranges in regexes. Ranges are always better for a bunch of alternatives, but using the alternation operator `|` is okay for only 2-5 alternatives. So your approach with `[12][0-9]` also works perfectly. But as we've already discussed, there is no "perfect regex". Everyone has their own taste ;)
faileN
This one worked beautifully. Thanks faileN!
Trindaz
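
For comparison, the day part written with the character classes suggested in the comments (`[12][0-9]` and `3[01]`) might look like the sketch below. The constant names and the `sameResult()` helper are hypothetical; the first pattern is the one from the answer, the second is the shortened variant. Both keep the same four capture groups, so `$matches[1]` to `$matches[4]` keep their meaning.

```php
<?php
// Pattern from the answer (leading zero required).
const ORIGINAL_PATTERN =
    '#([A-Za-z]{3}), ((?:0[1-9])|(?:(?:1|2)[0-9])|(?:3(?:0|1))) ([A-Za-z]{3}) ([0-9]{2})#';

// Same day rules written with character classes instead of alternation.
const SIMPLIFIED_PATTERN =
    '#([A-Za-z]{3}), (0[1-9]|[12][0-9]|3[01]) ([A-Za-z]{3}) ([0-9]{2})#';

// Hypothetical helper: true when both patterns give the same match result
// and the same captured groups for $subject.
function sameResult(string $subject): bool
{
    $a = preg_match(ORIGINAL_PATTERN, $subject, $ma);
    $b = preg_match(SIMPLIFIED_PATTERN, $subject, $mb);

    return $a === $b && $ma === $mb;
}
```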