Hello All,

I need to match strings of the form aaaa_bb_cc, and I have written this regex pattern:

\\w{4}+\\_\\w{2}\\_\\w{2}

It works. Is there a simpler regex that does the same thing?

+3  A: 

You don't need to escape the underscores:

\w{4}+_\w{2}_\w{2}

And you can collapse the last two parts, if you don't capture them anyway:

\w{4}+(?:_\w{2}){2}

Doesn't get shorter, though.

(Note: Re-add the needed backslashes for Java's strings, if you like; I prefer to omit them while talking about regular expressions :))
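As a rough sketch of what re-adding the backslashes looks like, the class below compiles the question's pattern and the two variants above as Java string literals and checks them against the sample input. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class UnderscorePatternDemo {
    // Inside a Java string literal every regex backslash is doubled.
    // The underscore itself needs no escape, although escaping it is harmless.
    static final Pattern ESCAPED   = Pattern.compile("\\w{4}+\\_\\w{2}\\_\\w{2}");
    static final Pattern PLAIN     = Pattern.compile("\\w{4}+_\\w{2}_\\w{2}");
    static final Pattern COLLAPSED = Pattern.compile("\\w{4}+(?:_\\w{2}){2}");

    // True only if all three variants accept the whole input.
    static boolean matchesAll(String input) {
        return ESCAPED.matcher(input).matches()
            && PLAIN.matcher(input).matches()
            && COLLAPSED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("aaaa_bb_cc")); // true
        System.out.println(matchesAll("aaa_bb_cc"));  // false: first group is too short
    }
}
```

Note that `matches()` requires the whole input to match, so no `^` or `$` anchors are needed here.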

Joey
Yes. I can understand that ;)
TuxGeek
+2  A: 

Yes, you can use just \\w{4}_\\w{2}_\\w{2} or maybe \\w{4}(_\\w{2}){2}.

splix
+2  A: 

It looks like your \w does not need to match an underscore, so you can use [a-zA-Z0-9] instead:

[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}
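To make the difference concrete, here is a small sketch comparing the two character classes. As far as I know, Java's \w defaults to [a-zA-Z_0-9], so it accepts underscores but not non-ASCII letters unless Pattern.UNICODE_CHARACTER_CLASS is set. The class name and sample inputs are invented for the demonstration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; sample inputs are made up.
class WordClassDemo {
    static final Pattern WORD =
        Pattern.compile("\\w{4}_\\w{2}_\\w{2}");
    static final Pattern ALNUM =
        Pattern.compile("[a-zA-Z0-9]{4}_[a-zA-Z0-9]{2}_[a-zA-Z0-9]{2}");
    // Same pattern as WORD, but with Unicode-aware predefined classes.
    static final Pattern WORD_UNICODE =
        Pattern.compile("\\w{4}_\\w{2}_\\w{2}", Pattern.UNICODE_CHARACTER_CLASS);

    public static void main(String[] args) {
        // An underscore inside the first group: \w accepts it, [a-zA-Z0-9] does not.
        String sneaky = "ab_c_dd_ee";
        System.out.println(WORD.matcher(sneaky).matches());  // true
        System.out.println(ALNUM.matcher(sneaky).matches()); // false

        // A non-ASCII letter (e with acute accent) in the first group.
        String accented = "caf\u00e9_bb_cc";
        System.out.println(WORD.matcher(accented).matches());         // false
        System.out.println(WORD_UNICODE.matcher(accented).matches()); // true
    }
}
```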
S.Mark
Missed that one. However, is `\w` in Java really only `[a-zA-Z0-9]`? In .NET at least both `\d` and `\w` match pretty much anything counting as decimal number or letter.
Joey
+2  A: 

I sometimes do what I call "meta-regexing" as follows:

    String pattern = "x{4}_x{2}_x{2}".replace("x", "[a-z]");
    System.out.println(pattern); // prints "[a-z]{4}_[a-z]{2}_[a-z]{2}"

Note that this doesn't use \w, which can match an underscore. That is, your original pattern would match "__________".

If x really needs to be replaced with [a-zA-Z0-9], then you change it in just one place (instead of three).
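A slightly fuller sketch of the same idea, wrapped in a helper so the character class is supplied once. The `build` helper and the class name are hypothetical, and the approach assumes the template contains no literal x other than the placeholder.

```java
import java.util.regex.Pattern;

// Hypothetical demo class illustrating the "meta-regexing" idea.
class MetaRegexDemo {
    // Expands every placeholder 'x' in the template into the given character class.
    // Assumes 'x' appears in the template only as a placeholder.
    static Pattern build(String template, String charClass) {
        return Pattern.compile(template.replace("x", charClass));
    }

    public static void main(String[] args) {
        String template = "x{4}_x{2}_x{2}";
        Pattern lower = build(template, "[a-z]");
        Pattern alnum = build(template, "[a-zA-Z0-9]");

        System.out.println(lower.pattern()); // [a-z]{4}_[a-z]{2}_[a-z]{2}
        System.out.println(lower.matcher("aaaa_bb_cc").matches()); // true
        System.out.println(lower.matcher("__________").matches()); // false
        System.out.println(alnum.matcher("ab12_c3_d4").matches()); // true
    }
}
```

String.replace does a literal, single-pass substitution, so the replacement text is not rescanned for further x characters.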


polygenelubricants
+1. Nice way to construct a regex :) I will check this one.
TuxGeek
@UK: Essentially the idea is that you don't need to have the actual regex explicitly written out. If it makes it more readable/maintainable to derive the regex programmatically, then go ahead.
polygenelubricants
@polygenelubricants, true, and it's easy to understand. Accepting your solution.
TuxGeek