Hi all, I am wondering if there is a better way to represent a fixed number of repeats in a regular expression. For example, if I want to match exactly 14 letters/digits, I am using ^\w\w\w\w\w\w\w\w\w\w\w\w\w\w$, which matches a word like UNL075BE499135 but does not match UNL075BE499135AAA. Is there a handy way to do this? I am currently doing it in Java, but I guess this may apply to other languages as well. Thanks in advance.

+10  A: 

^\w{14}$ in Perl and any Perl-style regex.

If you want to learn more about regular expressions - or just need a handy reference - the Wikipedia Entry on Regular Expressions is actually pretty good.

eldarerathis
+3  A: 

For Java:

http://download.oracle.com/docs/cd/E17409_01/javase/tutorial/essential/regex/quant.html

X, exactly n times: X{n}
X, at least n times: X{n,}
X, at least n but not more than m times: X{n,m}
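
To make the three forms above concrete, here is a minimal sketch. The class and method names are invented for the example; only `java.util.regex.Pattern` from the JDK is used, and `Pattern.matches` is chosen because it requires the whole input to match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class QuantifierForms {
    // X{n}: exactly three digits.
    static boolean exactlyThree(String s) {
        return Pattern.matches("\\d{3}", s);
    }

    // X{n,}: three or more digits.
    static boolean atLeastThree(String s) {
        return Pattern.matches("\\d{3,}", s);
    }

    // X{n,m}: between three and five digits, inclusive.
    static boolean threeToFive(String s) {
        return Pattern.matches("\\d{3,5}", s);
    }

    public static void main(String[] args) {
        System.out.println(exactlyThree("123"));    // true
        System.out.println(exactlyThree("1234"));   // false
        System.out.println(atLeastThree("123456")); // true
        System.out.println(threeToFive("123456"));  // false
    }
}
```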

shookster
That holds for any Perl-compatible regular expression.
BipedalShark
@BipedalShark the 'bound' is defined in POSIX regexp standard. See `man 7 regex` on most *nix systems. Most common regex languages including Perl's derive at some point from POSIX's.
Ven'Tatsu
+1  A: 

In Java, create the pattern with Pattern p = Pattern.compile("^\\w{14}$"); for further information, see the Javadoc.
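
For a fuller picture, here is a small sketch that compiles the pattern once and checks the two strings from the question. The class name `FixedLengthCheck` and the method `isValid` are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper class for the example.
class FixedLengthCheck {
    // Compiled once and reused; \w{14} means exactly 14 word characters.
    private static final Pattern FOURTEEN = Pattern.compile("^\\w{14}$");

    static boolean isValid(String s) {
        // matches() succeeds only if the entire input matches the pattern.
        return FOURTEEN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("UNL075BE499135"));    // true
        System.out.println(isValid("UNL075BE499135AAA")); // false
    }
}
```

Keep in mind that \w also accepts the underscore. If only letters and digits should pass, [A-Za-z0-9]{14} is a stricter alternative.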

janogonzalez
Or use the short-hand: `"UNL075BE499135".matches("^\\w{14}$");`
Bart Kiers
A: 

The finite repetition syntax uses {n,m} in place of the star, plus, or question mark.

From java.util.regex.Pattern:

X{n}      X, exactly n times
X{n,}     X, at least n times
X{n,m}    X, at least n but not more than m times

All repetition metacharacters have the same precedence, so just as you may need grouping for *, +, and ?, you may also need it for {n,m}.

  • ha* matches e.g. "haaaaaaaa"
  • ha{3} matches only "haaa"
  • (ha)* matches e.g. "hahahahaha"
  • (ha){3} matches only "hahaha"
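
The difference between the ungrouped and grouped forms in the list above can be checked with a short sketch like this one (class and method names are invented for the example):

```java
// Hypothetical demo class; uses only String.matches from the JDK.
class GroupingDemo {
    // Without a group, {3} applies only to the preceding 'a'.
    static boolean lastCharRepeated(String s) {
        return s.matches("ha{3}");
    }

    // With a group, {3} applies to the whole "ha" sequence.
    static boolean groupRepeated(String s) {
        return s.matches("(ha){3}");
    }

    public static void main(String[] args) {
        System.out.println(lastCharRepeated("haaa"));   // true
        System.out.println(lastCharRepeated("hahaha")); // false
        System.out.println(groupRepeated("hahaha"));    // true
        System.out.println(groupRepeated("haaa"));      // false
    }
}
```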

Also, just like with *, +, and ?, you can append ? or + to make the repetition reluctant or possessive, respectively.

    System.out.println(
        "xxxxx".replaceAll("x{2,3}", "[x]")
    ); // prints "[x][x]"

    System.out.println(
        "xxxxx".replaceAll("x{2,3}?", "[x]")
    ); // prints "[x][x]x"
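
The two snippets above cover the greedy and reluctant forms. For the possessive form, `replaceAll` on "xxxxx" gives the same result as greedy, so the difference only shows up when the match would need to backtrack. A rough sketch, with made-up class and method names:

```java
// Hypothetical demo class; uses only String methods from the JDK.
class PossessiveDemo {
    // Greedy: x{2,3} can give back a character so the trailing x can match.
    static boolean greedy(String s) {
        return s.matches("x{2,3}x");
    }

    // Possessive: x{2,3}+ keeps everything it consumed and never backtracks.
    static boolean possessive(String s) {
        return s.matches("x{2,3}+x");
    }

    public static void main(String[] args) {
        System.out.println(greedy("xxx"));      // true
        System.out.println(possessive("xxx"));  // false
        System.out.println(possessive("xxxx")); // true
        System.out.println(
            "xxxxx".replaceAll("x{2,3}+", "[x]")
        ); // prints "[x][x]"
    }
}
```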

Essentially, anywhere * can be used as the "zero-or-more" repetition metacharacter, you can use a {...} repetition construct instead. Note that the reverse is not true: you can use finite repetition in a lookbehind, but you can't use * because Java doesn't officially support infinite-length lookbehind.
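
As an illustration of finite repetition inside a lookbehind, here is a small sketch. The pattern, class name, and sample strings are invented for the example; the unbounded `*` variant is deliberately left out since, as noted above, it is not officially supported.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for bounded repetition in a lookbehind.
class LookbehindDemo {
    // Matches "px" only when it is preceded by 1 to 4 digits.
    // The bounded {1,4} gives the lookbehind a known maximum length.
    private static final Pattern UNIT = Pattern.compile("(?<=\\d{1,4})px");

    static boolean hasDigitPx(String s) {
        // find() looks for the pattern anywhere in the input.
        return UNIT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDigitPx("width: 120px")); // true
        System.out.println(hasDigitPx("apx"));          // false
    }
}
```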

polygenelubricants