I have a group of lines like the following:

tb-set-node-recipe $vpn1   W2K3_SP2_VPN_SRV
tb-set-node-os     $vpn2   I_W2K3_SP2_VPN_SRV
tb-set-node-os     $xpcli1 I_XP_SP3_VPN_CLI
tb-set-node-os     $xpcli2 I_XP_SP2_VPN_CLI
tb-set-node-os     $xpcli3 I_XP_SP1_VPN_CLI
tb-set-node-recipe $ftp1   FC8_KS_FTP_SRV
tb-set-node-os     $smb1   XP_SP3-STD
tb-set-node-recipe $web1   FC8_KS_WEB_SRV

I am using the following regular expression in the Java language to parse out the tb-set-node-os statements:

(tb\-set\-node\-os)\s+[\$\w]+\s+\w+

It works perfectly except for the second-to-last line, the one that contains $smb1.

Does anyone have any idea why this might be? I can't seem to figure this one out. Thanks in advance!

+8  A: 

\w does not match hyphen (-) so you will need to adapt it to this:

(tb\-set\-node\-os)\s+[\$\w]+\s+[\w-]+

Note that the - doesn't need to be escaped (but can be) if it is first or last in the character class, but it must be escaped if it is in the middle of the class, where it would otherwise be read as a range operator.
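
As a rough illustration of the fix in Java, here is a minimal sketch. The class and method names (NodeOsMatcher, matchingLines) are made up for this example, and it assumes each line is tested with Matcher.matches(), which requires the whole line to match. Inside a Java string literal every backslash of the regex has to be doubled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NodeOsMatcher {
    // The adapted pattern; [\w-] also accepts the hyphen in XP_SP3-STD.
    static final Pattern NODE_OS =
        Pattern.compile("(tb\\-set\\-node\\-os)\\s+[\\$\\w]+\\s+[\\w-]+");

    // Hypothetical helper: returns every line that the pattern matches in full.
    static List<String> matchingLines(String input) {
        List<String> result = new ArrayList<>();
        for (String line : input.split("\n")) {
            Matcher m = NODE_OS.matcher(line);
            if (m.matches()) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = String.join("\n",
            "tb-set-node-recipe $ftp1   FC8_KS_FTP_SRV",
            "tb-set-node-os     $smb1   XP_SP3-STD",
            "tb-set-node-recipe $web1   FC8_KS_WEB_SRV");
        for (String line : matchingLines(input)) {
            System.out.println(line);
        }
    }
}
```

With the original \w+ at the end, matches() would reject the $smb1 line because the trailing -STD is left unconsumed.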


Also worth noting, you can potentially improve performance with possessive quantifiers when you have sequential mutually-exclusive items:

(tb\-set\-node\-os)\s++[\$\w]++\s++[\w-]++

Since \s can never match \w (and vice versa), possessive quantifiers (*+ and ++) can be used instead of the usual greedy ones (* and +), which stops the regex engine from backtracking into those tokens.
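
A small sketch of the possessive form in Java, assuming the hyphen-aware character class for the last column. The class and method names are invented for illustration, and the code only checks that the greedy and possessive patterns agree on a few lines; it makes no claim about measured speed.

```java
import java.util.regex.Pattern;

public class PossessiveNodeOs {
    // Greedy version: quantifiers may give characters back when a later token fails.
    static final Pattern GREEDY =
        Pattern.compile("(tb\\-set\\-node\\-os)\\s+[\\$\\w]+\\s+[\\w-]+");

    // Possessive version: ++ never gives back what it has consumed.
    static final Pattern POSSESSIVE =
        Pattern.compile("(tb\\-set\\-node\\-os)\\s++[\\$\\w]++\\s++[\\w-]++");

    // Hypothetical helper: true if the whole line matches the possessive pattern.
    static boolean matchesPossessive(String line) {
        return POSSESSIVE.matcher(line).matches();
    }

    // Hypothetical helper: true if both patterns give the same verdict for the line.
    static boolean sameResult(String line) {
        return GREEDY.matcher(line).matches() == POSSESSIVE.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "tb-set-node-os     $smb1   XP_SP3-STD",
            "tb-set-node-recipe $web1   FC8_KS_WEB_SRV",
            "tb-set-node-os     $smb1   "   // last column missing
        };
        for (String line : lines) {
            System.out.println(matchesPossessive(line) + " " + sameResult(line));
        }
    }
}
```

The two patterns accept the same lines here because neighbouring tokens cannot match the same character; the possessive one simply fails sooner on lines such as the third, where the last column is missing.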

Peter Boughton
+6  A: 

Probably this is because the dash - is not a word character (does not match \w), so something like this might work:

(tb\-set\-node\-os)\s+[\$\w]+\s+[\w\-]+
Zef Hemel
+1  A: 

The only problem I see is that the $smb1 line has a hyphen in the last column which doesn't seem to be matched by \w. You might try .+ at the end of your expression.
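
A quick sketch of that .+ variant in Java, with made-up class and method names and a capturing group added around .+ so the last column can be inspected. It is more permissive than a character class: whatever follows the node name is accepted, including trailing text you may not want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LooseNodeOs {
    // ".+" accepts any non-empty remainder of the line, hyphens included.
    static final Pattern LOOSE =
        Pattern.compile("(tb\\-set\\-node\\-os)\\s+[\\$\\w]+\\s+(.+)");

    // Hypothetical helper: returns the captured last column, or null if the line does not match.
    static String lastColumn(String line) {
        Matcher m = LOOSE.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(lastColumn("tb-set-node-os     $smb1   XP_SP3-STD"));
        // Trailing text is swallowed as part of the last column.
        System.out.println(lastColumn("tb-set-node-os     $smb1   XP_SP3-STD  # note"));
    }
}
```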

steamer25