views:

2515

answers:

13

Hi,

I have a version number of the following form:

version.release.modification

where version, release and modification are either a set of digits or the '*' wildcard character. Additionally, any of these numbers (and any preceding .) may be missing.

So the following are valid and parse as:

1.23.456 = version 1, release 23, modification 456
1.23     = version 1, release 23, any modification
1.23.*   = version 1, release 23, any modification
1.*      = version 1, any release, any modification
1        = version 1, any release, any modification
*        = any version, any release, any modification

But these are not valid:

*.12
*123.1
12*
12.*.34

Can anyone provide me a not-too-complex regex to validate and retrieve the release, version and modification numbers?

Many thanks!


Thanks for all the responses! This is ace :)

Based on OneByOne's answer (which looked the simplest to me), I added some non-capturing groups (the '(?:' parts - thanks to VonC for introducing me to non-capturing groups!), so the groups that do capture only contain the digits or * character.

^(?:(\d+)\.)?(?:(\d+)\.)?(\*|\d+)$

Many thanks to everyone!

+5  A: 

This might work:

^(\*|\d+(\.\d+){0,2}(\.\*)?)$

At the top level, "*" is a special case of a valid version number. Otherwise, it starts with a number. Then there are zero, one, or two ".nn" sequences, followed by an optional ".*". This regex would accept 1.2.3.* which may or may not be permitted in your application.

The code for retrieving the matched sequences, especially the (\.\d+){0,2} part, will depend on your particular regex library.

Greg Hewgill
Great answer!I think you should swap the unescaped * for {0,2} to prevent 1.2.3.4 matching.Depending on your regexp library you may want to enclose the pattern in ^(<pattern>)$ if you can only do a search rather than a match.
Dave Webb
Good points, I've improved the answer.
Greg Hewgill
That still permits 1.2.3.*, which isn't valid.
Steve Jessop
Slight alteration to ^(\*|\d+(\.\d+){0,1}(?:(\.\*)?|(\.\d+)?))$ would invalidate 1.2.3.* too
Pieter
Pieter: I think I'm going to stop where I am for now. This is quickly getting into "now you have two problems" territory. :)
Greg Hewgill
+7  A: 

Use regex and now you have two problems. I would split the thing on dots ("."), then make sure that each part is either a wildcard or set of digits (regex is perfect now). If the thing is valid, you just return correct chunk of the split.

phjr
+5  A: 

I'd express the format as:

"1-3 dot-separated components, each numeric except that the last one may be *"

As a regexp, that's:

^(\d+\.)?(\d+\.)?(\*|\d+)$

[Edit to add: this solution is a concise way to validate, but it has been pointed out that extracting the values requires extra work. It's a matter of taste whether to deal with this by complicating the regexp, or by processing the matched groups.

In my solution, the groups capture the "." characters. This can be dealt with using non-capturing groups as in ajborley's answer.

Also, the rightmost group will capture the last component, even if there are fewer than three components, and so for example a two-component input results in the first and last groups capturing and the middle one undefined. I think this can be dealt with by non-greedy groups where supported.

Perl code to deal with both issues after the regexp could be something like this:

@version = ();
@groups = ($1, $2, $3);
foreach (@groups) {
    next if !defined;
    s/\.//;
    push @version, $_;
}
($major, $minor, $mod) = (@version, "*", "*");

Which isn't really any shorter than splitting on "." ]

Steve Jessop
Adding some non-capturing groups (see my answer below) means that the capturing groups don't capture the trailing '.' ^(?:(\d+)\.)?(?:(\d+)\.)?(\*|\d+)$Thanks!
ajborley
The only problem with that one - being a very nice and clean proposal - is that the groups are not right because 1.2 will capture 1 in the first and 2 in the third group because of greedyness.
jrudolph
Good point, thanks. I ignored extraction of the values since it depends on the library. In Perl for instance you could just look at $1, $2, $3 in turn and skip undefined values. But non-greedy groups are better. I'll edit to add.
Steve Jessop
A: 

Keep in mind regexp are greedy, so if you are just searching within the version number string and not within a bigger text, use ^ and $ to mark start and end of your string. The regexp from Greg seems to work fine (just gave it a quick try in my editor), but depending on your library/language the first part can still match the "*" within the wrong version numbers. Maybe I am missing something, as I haven't used Regexp for a year or so.

This should make sure you can only find correct version numbers:

^(\*|\d+(\.\d+)*(\.\*)?)$

edit: actually greg added them already and even improved his solution, I am too slow :)

FrankS
Also, I think you've hit the same problem I did, which is that backslash characters are treated specially by the site. I think your regexp needs more of them - look carefully at the preview, and keep adding them until it's correct :-)
Steve Jessop
ouch yeah, didn't notice that - thanks :)
FrankS
+1  A: 

I tend to agree with split suggestion.

Ive created a "tester" for your problem in perl

#!/usr/bin/perl -w


@strings = ( "1.2.3", "1.2.*", "1.*","*" );

%regexp = ( svrist => qr/(?:(\d+)\.(\d+)\.(\d+)|(\d+)\.(\d+)|(\d+))?(?:\.\*)?/,
            onebyone => qr/^(\d+\.)?(\d+\.)?(\*|\d+)$/,
            greg => qr/^(\*|\d+(\.\d+){0,2}(\.\*)?)$/,
            vonc => qr/^((?:\d+(?!\.\*)\.)+)(\d+)?(\.\*)?$|^(\d+)\.\*$|^(\*|\d+)$/,
            ajb => qr/^(?:(\d+)\.)?(?:(\d+)\.)?(\*|\d+)$/,
            jrudolph => qr/^(((\d+)\.)?(\d+)\.)?(\d+|\*)$/
          );

  foreach my $r (keys %regexp){
    my $reg = $regexp{$r};
    print "Using $r regexp\n";
foreach my $s (@strings){
  print "$s : ";

    if ($s =~m/$reg/){
    my ($main, $maj, $min,$rev,$ex1,$ex2,$ex3) = ("any","any","any","any","any","any","any");
    $main = $1 if ($1 && $1 ne "*") ;
    $maj = $2 if ($2 && $2 ne "*") ;
    $min = $3 if ($3 && $3 ne "*") ;
    $rev = $4 if ($4 && $4 ne "*") ;
    $ex1 = $5 if ($5 && $5 ne "*") ;
    $ex2 = $6 if ($6 && $6 ne "*") ;
    $ex3 = $7 if ($7 && $7 ne "*") ;
    print "$main $maj $min $rev $ex1 $ex2 $ex3\n";

  }else{
  print " nomatch\n";
  }
  }
print "------------------------\n";
}

Current output:

> perl regex.pl
Using onebyone regexp
1.2.3 : 1. 2. 3 any any any any
1.2.* : 1. 2. any any any any any
1.* : 1. any any any any any any
* : any any any any any any any
------------------------
Using svrist regexp
1.2.3 : 1 2 3 any any any any
1.2.* : any any any 1 2 any any
1.* : any any any any any 1 any
* : any any any any any any any
------------------------
Using vonc regexp
1.2.3 : 1.2. 3 any any any any any
1.2.* : 1. 2 .* any any any any
1.* : any any any 1 any any any
* : any any any any any any any
------------------------
Using ajb regexp
1.2.3 : 1 2 3 any any any any
1.2.* : 1 2 any any any any any
1.* : 1 any any any any any any
* : any any any any any any any
------------------------
Using jrudolph regexp
1.2.3 : 1.2. 1. 1 2 3 any any
1.2.* : 1.2. 1. 1 2 any any any
1.* : 1. any any 1 any any any
* : any any any any any any any
------------------------
Using greg regexp
1.2.3 : 1.2.3 .3 any any any any any
1.2.* : 1.2.* .2 .* any any any any
1.* : 1.* any .* any any any any
* : any any any any any any any
------------------------
svrist
That would be nice, since OneByOne's looks like the most straightforward one.
jrudolph
You should test the wrong ones, too. You missed to quote OneByOne's dots.
jrudolph
Updated with the dots, and more regexps
svrist
Thanks for the updates - very useful feedback!
ajborley
A: 
(?ms)^((?:\d+(?!\.\*)\.)+)(\d+)?(\.\*)?$|^(\d+)\.\*$|^(\*|\d+)$

Does exactly match your 6 first examples, and rejects the 4 others

  • group 1: major or major.minor or '*'
  • group 2 if exists: minor or *
  • group 3 if exists: *

You can remove '(?ms)'
I used it to indicate to this regexp to be applied on multi-lines through QuickRex

VonC
A: 

Don't know what platform you're on but in .NET there's the System.Version class that will parse "n.n.n.n" version numbers for you.

Duncan Smart
Only in .Net Framework 4 ... :\
SoMoS
No, it's been there since version 1.0
Duncan Smart
A: 

This matches 1.2.3.* too

^(*|\d+(.\d+){0,2}(.*)?)$

I would propose the less elegant:

(*|\d+(.\d+)?(.*)?)|\d+.\d+.\d+)

Victor
+2  A: 

Thanks for all the responses! This is ace :)

Based on OneByOne's answer (which looked the simplest to me), I added some non-capturing groups (the '(?:' parts - thanks to VonC for introducing me to non-capturing groups!), so the groups that do capture only contain the digits or * character.

^(?:(\d+)\.)?(?:(\d+)\.)?(\*|\d+)$

Many thanks everyone!

ajborley
Could you add this as an edit to your question instead? That way the right answers is close to the top
svrist
A: 

Another try:

^(((\d+)\.)?(\d+)\.)?(\d+|\*)$

This gives the three parts in groups 4,5,6 BUT: They are aligned to the right. So the first non-null one of 4,5 or 6 gives the version field.

  • 1.2.3 gives 1,2,3
  • 1.2.* gives 1,2,*
  • 1.2 gives null,1,2
  • *** gives null,null,*
  • 1.* gives null,1,*
jrudolph
A: 

It seems pretty hard to have a regex that does exactly what you want (i.e. accept only the cases that you need and reject all others and return some groups for the three components). I've give it a try and come up with this:

^(\*|(\d+(\.(\d+(\.(\d+|\*))?|\*))?))$

IMO (I've not tested extensively) this should work fine as a validator for the input, but the problem is that this regex doesn't offer a way of retrieving the components. For that you still have to do a split on period.

This solution is not all-in-one, but most times in programming it doesn't need to. Of course this depends on other restrictions that you might have in your code.

rslite
+1  A: 
^(?:(\d+)\.)?(?:(\d+)\.)?(\*|\d+)$

Perhaps a more concise one could be :

^(?:(\d+)\.){0,2}(\*|\d+)$

This can then be enhanced to 1.2.3.4.5.* or restricted exactly to X.Y.Z using * or {2} instead of {0,2}

ofaurax
A: 

This should work for what you stipulated. It hinges on the wild card position and is a nested regex:

^((\*)|([0-9]+(\.((\*)|([0-9]+(\.((\*)|([0-9]+)))?)))?))$

http://imgur.com/3E492.png