One of the HTML input fields in an app I'm working on is being validated with the following regex pattern:

.{5,}+

What is this checking for?

Other fields are being checked with this pattern which I also don't understand:

.+
+3  A: 

Any character, 5 or more times.

  • "." means any character except a line break.
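  • {m,n} defines a bounded interval: "m" is the minimum and "n" is the maximum. If n is omitted, as it is here, there is no upper limit.
  • "+" after another quantifier makes it possessive: it matches as much as it can and does not give characters back.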
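
A minimal sketch of what the check amounts to for a form field, assuming the validator tests the whole value. The function name `is_valid` is made up for illustration. The trailing possessive "+" is left out because older Python versions reject it, and for a whole-value check it does not change the outcome.

```python
import re

# Any non-newline character, 5 or more times.
MIN_FIVE = re.compile(r".{5,}")

def is_valid(value: str) -> bool:
    """Return True if value is 5 or more characters with no line break."""
    # fullmatch() requires the entire value to match the pattern.
    return MIN_FIVE.fullmatch(value) is not None

print(is_valid("abcd"))      # False: only 4 characters
print(is_valid("abcde"))     # True
print(is_valid("ab\ncdef"))  # False: the dot does not match the line break
```

In other words, the pattern is effectively a minimum-length rule for single-line input.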
Mike
`+` is not greedy. Regular expressions are greedy by default. You're likely thinking of `?`, which makes a pattern **lazy**. `+` is possessive.
Daniel Vandersluis
You're right :) Thanks for the correction!
Mike
+3  A: 

.{5,}+ means

  1. Match any single character that is not a line break character
    1. Between 5 and unlimited times; as many times as possible, without giving back (possessive)

.+ is the same thing but it matches between 1 and unlimited times, giving back as needed (greedy).

As I've mentioned many times before, I'm a huge fan of RegexBuddy. Its "Create" mode is excellent for deconstructing regular expressions.

Mark Biek
+4  A: 
KennyTM
+16  A: 

We can break your pattern down into three parts:

The dot is a wildcard: it matches any character except newlines by default (with the /s modifier set, it matches newlines too).
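
To illustrate the newline behaviour, here is a small sketch in Python, where the equivalent of the /s modifier is the `re.DOTALL` flag. The sample text is made up.

```python
import re

text = "ab\ncd"

# By default the dot stops at a line break, so only "ab" is matched.
default_match = re.match(r".+", text).group()

# With re.DOTALL the dot also matches "\n", so the whole text is matched.
dotall_match = re.match(r".+", text, re.DOTALL).group()

print(repr(default_match))  # 'ab'
print(repr(dotall_match))   # 'ab\ncd'
```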

{5,} specifies repetition on the dot. It says that the dot must match at least 5 times. If there were a number after the comma, the dot would have to match between 5 and that number of times, but since there's no number, there is no upper limit.

In your first pattern, the + is a possessive quantifier (see below for how + can mean different things in different situations). It tells the regular expression engine that once it has satisfied the previous condition (i.e. .{5,}), it should not backtrack into it to give characters back.
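
The difference only shows up when something follows the quantifier. The sketch below adds a literal "x" after it, which is not in your pattern and is purely for illustration. Because not every Python version accepts the `.{5,}+` syntax, the possessive behaviour is emulated here with a lookahead plus backreference, a common workaround rather than the real syntax.

```python
import re

subject = "abcdefx"

# Greedy: .{5,} first grabs all 7 characters, then gives one back
# so that the trailing "x" can match.
greedy = re.compile(r".{5,}x")

# Possessive behaviour, emulated: the lookahead captures the longest run,
# and \1 then consumes exactly that text, which cannot be shortened.
possessive_like = re.compile(r"(?=(.{5,}))\1x")

print(greedy.fullmatch(subject) is not None)           # True
print(possessive_like.fullmatch(subject) is not None)  # False
```

The possessive version fails because the dot has already swallowed the "x" and refuses to hand it back. With nothing after the quantifier, as in your pattern, both forms accept the same inputs.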


Your second pattern is simpler. The dot still means the same thing as above (works as a wildcard). However, here the + has a different meaning, and is a repetition operator, meaning that the dot must match 1 or more times (that could also be expressed as .{1,}, as we saw above).

As you can see, + has a different meaning depending on context. When used on its own, it is a repetition operator. However, when it directly follows another repetition operator (*, ?, + or {...}), it becomes a possessive quantifier, at least in regex flavors that support possessive quantifiers.
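
Since support varies by engine, it can be worth checking whether yours accepts the syntax at all. A rough sketch for Python, where the `re` module accepts possessive quantifiers from version 3.11 onward. The helper name `supports_possessive` is made up for this example.

```python
import re

def supports_possessive() -> bool:
    """Report whether this interpreter's re module parses '.{5,}+'."""
    try:
        re.compile(r".{5,}+")
    except re.error:
        # Engines without possessive quantifiers reject a "+" that
        # directly follows another quantifier.
        return False
    return True

print(supports_possessive())
```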

Daniel Vandersluis
+1 for the clearest explanation of regular expressions I've seen to date.
Nico Burns