Hi,

As I understand regular expressions:

--> * matches 0 or more occurrences of the preceding atom
--> + matches 1 or more occurrences of the preceding atom

Now let's take a look at the following examples:

FIRST:-

% regexp {:+} "DHCP:Enabled" first
1
% puts $first
:                     --> ":" is stored in variable first
%

SECOND:-

% regexp {:*} "DHCP:Enabled" sec
1
% puts $sec
                     --> Nothing (the empty string) is stored in variable sec
%

Why is ":" stored for the FIRST one and not the SECOND?

+7  A: 

The second regexp {:*} matches the empty string because the empty string is 0 occurrences of :. If you use the -indices option for regexp, you'll see that it matches at position 0.

 % regexp -indices :* "DHCP:Enabled" indices
 1
 % puts $indices
 0 -1

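In other words, the regexp succeeds immediately with an empty match before the first character. The indices 0 -1 mean the match starts at index 0 and has zero length (the end index is one less than the start), so the engine never gets as far as the ":" at index 4.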
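To see the difference between the two quantifiers side by side, here is a small sketch that runs both patterns from the question with -indices. The variable names `s`, `star`, and `plus` are only illustrative.

```tcl
# Compare where each quantifier first matches in the same string.
set s "DHCP:Enabled"

regexp -indices {:*} $s star   ;# zero colons are enough, so it matches at index 0
regexp -indices {:+} $s plus   ;# needs at least one colon, so it skips ahead

puts "star: $star"   ;# star: 0 -1  (empty match before "D")
puts "plus: $plus"   ;# plus: 4 4   (the ":" at index 4)
```

With {:+} the engine cannot succeed at index 0, so it keeps scanning until it reaches the colon.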

Trey Jackson
A: 

{:*} is allowed to match the empty string, so it matches the empty string at the very start of “DHCP:Enabled”. The regular expression engine likes to match as early in the string as possible. To show this, here's an interactive session:

% regexp -inline {:*} "DHCP:Enabled"
{}
% regexp -inline -all {:*} "DHCP:Enabled"
{} {} {} {} : {} {} {} {} {} {} {}
% regexp -inline -indices -all {:*} "DHCP:Enabled"
{0 -1} {1 0} {2 1} {3 2} {4 4} {5 4} {6 5} {7 6} {8 7} {9 8} {10 9} {11 10}

The -inline option returns the match data as a list instead of storing it in variables, which is useful for simple testing; -all matches at every matchable location instead of just the first; and -indices returns locations rather than the matched strings.

Note that only one match, {4 4}, has an end index at least as large as its start index; that is the ":" at index 4. In all other cases an empty string matches, shown as an end index one less than the start (and it's legal; you said that matching nothing was OK).
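As a rough illustration of how to read those index pairs, the following sketch walks the -indices -all result and labels each match. It assumes Tcl 8.5 or later (for `lassign`); the variable names are made up for the example.

```tcl
# Walk every match of {:*} and separate empty matches from real ones.
set s "DHCP:Enabled"
set nonEmpty {}
foreach pair [regexp -inline -indices -all {:*} $s] {
    lassign $pair start end
    if {$end < $start} {
        # An end index below the start index means a zero-length match.
        puts "index $start: empty match"
    } else {
        puts "index $start: matched \"[string range $s $start $end]\""
        lappend nonEmpty $pair
    }
}
```

Only the pair for the colon ends up in `nonEmpty`; everything else is a zero-length match.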

In general, it's a really good idea to make sure that your overall RE cannot match the empty string, or you'll be surprised by the results.
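One quick way to catch this is to try the pattern against an empty string before using it. The helper below, `matchesEmpty`, is a hypothetical name and not a built-in command. It is only a rough check: a pattern that uses lookahead or other constraints might still produce zero-length matches inside a non-empty string.

```tcl
# Hypothetical helper (not part of Tcl): returns 1 if the pattern
# matches the empty string, 0 otherwise.
proc matchesEmpty {re} {
    # "--" stops a pattern beginning with "-" being read as an option.
    return [regexp -- $re ""]
}

puts [matchesEmpty {:*}]   ;# 1, zero colons is an acceptable match
puts [matchesEmpty {:+}]   ;# 0, at least one colon is required
```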

Donal Fellows