tags:

views:

124

answers:

7

I made a regex for port numbers (before you say this is stupid, its going into a bigger regex for URL's which is much harder than it sounds).

My coworker said this is really bad and isnt going to catch everything. I told him no way! This thing catches everything from 0 to 65535 and NOTHING ELSE. Someone confirm this for me or my coworker? BTW: Never search for regex's online. I found so many that would get me 'every port from 0 to 65536' PORT 65536!!! i'ma go run my server on 65536 guys just connect later

BAD FORMAT - Made for computers (one line) :

/(^[0-9]$)|(^[0-9][0-9]$)|(^[0-9][0-9][0-9]$)|(^[0-9][0-9][0-9][0-9]$)|((^[0-5][0-9][0-9][0-9][0-9]$)|(^6[0-4][0-9][0-9][0-9]$)|(^65[0-4][0-9][0-9]$)|(^655[0-2][0-9]$)|(^6553[0-5]$))/

HUMAN READABLE:

/(^[0-9]$)|                           #single digit
 (^[0-9][0-9]$)|                      #two digit
 (^[0-9][0-9][0-9]$)|                 #three digit
 (^[0-9][0-9][0-9][0-9]$)|            #four digit
 ((^[0-5][0-9][0-9][0-9][0-9]$)|      #five digit (up to 59999)
  (^6[0-4][0-9][0-9][0-9]$)|          #           (up to 64999)
  (^65[0-4][0-9][0-9]$)|              #           (up to 65499)
  (^655[0-2][0-9]$)|                  #           (up to 65529)
  (^6553[0-5]$))/                     #           (up to 65535)

who can deny my power?

+5  A: 

Well, it's easy to prove that it will validate any correct port: just generate each valid string and test that it passes. Making sure it doesn't allow anything that it shouldn't is harder though - obviously you can't test absolutely every invalid string. You should definitely test simple cases and anything which you think might pass incorrectly (or which would pass incorrectly with a lesser regex - "65536" being an example).

It will allow some slightly odd port specifications though - such as "0000". Do you want to allow leading zeroes?

You might also want to consider whether you actually need to specify ^ and $ separately for each case, or whether you could use ^(case 1)|(case 2)|...$. Oh, and quantifiers could simplify the "1 to 4 digits" case too: ([0-9]{1,4}) will find between 1 and 4 digits.

(You might want to work on sounding a little less arrogant, by the way. If you're working with other people, communicating in a less aggressive way is likely to do more to improve everyone's day than just proving your regex is correct...)

Jon Skeet
+1 but besides validating with each valid string, OP should also test for invalid strings, especially boundary values. While he's at it, keeping the tone down wouldn't hurt either.
Lieven
@Lieven: Oh absolutely, invalid strings should definitely be tested. That was part of my point - he needs to work out whether "0000" should *be* invalid. Will edit to make that clearer.
Jon Skeet
+4  A: 

A style note:

Repeating [0-9] over and over again is silly - something like [0-9][0-9][0-9] is much better written as \d{3}.

Amber
...although sometimes `[0-9]` is better than `\d`, for example in .NET where `d` will also match digits like `۳` (indo-arabic 3) which you probably don't want matched.
Tim Pietzcker
+5  A: 

You could shorten it considerably:

^0*(?:6553[0-5]|655[0-2][0-9]|65[0-4][0-9]{2}|6[0-4][0-9]{3}|[1-5][0-9]{4}|[1-9][0-9]{1,3}|[0-9])$
  • no need to repeat the anchors every single time
  • no need for lots of capturing groups
  • no need to spell out repetitions.

Drop the leading 0* if you don't want to allow leading zeroes.

This regex is also better because it matches the special cases (65535, 65001 etc.) first and thus avoids some backtracking.

Oh, and since you said you want to use this as part of a larger regex for URLs, you should then replace both ^ and $ with \b (word boundary anchors).

Tim Pietzcker
That's certainly more readable so, even though I doubt the sanity of the question :-), +1 for a good answer.
paxdiablo
+4  A: 

Personally I would match just a number and then I would check with code that the number is in range.

gpeche
A: 

regex has many implement ,what the paltform. try below , remove blanks

^[1-5]?\d{1,4}|6([0-4]\d{3}|5([0-4]\d{2}|5([0-2]\d|3[0-5]))$

readable

^
[1-5]?\d{1,4}|
6(
 [0-4]\d{3}|
 5(
  [0-4]\d{2}|
  5(
   [0-2]\d|
   3[0-5]
  )
 )
$
guilin 桂林
Not bad, but you need another set of parentheses at the top level to make the anchors work right: `^(foo|bar)$`, not `^foo|bar$`
Alan Moore
test in python, `^foo|bar$` works
guilin 桂林
+1  A: 
/^(6553[0-5])|(655[0-2]\d)|(65[0-4]\d{2})|(6[0-4]\d{3})|([1-5]\d{4})|([1-9]\d{1,3})|(\d)$/
Andrew Cooper
+2  A: 

What's wrong with parsing it into a number and work with integer comparisons? (regardless of whether or not this will be part of a "larger" regex).

If I were to use regex, I would just use:

\d{1,5}

Nope, it doesn't check for "valid" port numbers (neither does yours). But it's much more legible and for practical purposes I'd say it's "good enough."

PS: I'd work on being more humble.

NullUserException