As I understand it, * means "repeat the previous item zero or more times". On its own, grep '*' outputs nothing. But grep '**' prints every line: how does matching nothing zero or more times give something? The same goes for ?*: nothing precedes the ?, so that should be nothing too, right? How does repeating nothing zero or more times produce output? Could you explain that as well, please?

mugbear:~# grep '*' emptyspace
mugbear:~# grep '**' emptyspace
line1
line2

line4
line5

line7
mugbear:~# grep '?' emptyspace
mugbear:~# grep '?*' emptyspace
line1
line2

line4
line5

line7
A: 

It's greedy on the first character and then greedy on all the remaining characters. I believe backtracking is taking over as well.

drachenstern
What does "greedy" mean in that sense?
Doug
http://en.wikipedia.org/wiki/Regular_expression "The standard quantifiers in regular expressions are greedy, meaning they match as much as they can, only giving back as necessary to match the remainder of the regex."
drachenstern
It is not "greedy on the first character". These things don't stack, at least not that way. DigitalRoss is correct.
tchrist
@tchrist ~ so did you downvote me? I only offered what I thought might be the answer, and in reading @DigitalRoss's answer, I may still be right, on account of "depending on specific RE implementations" but it's still backtracking. Anyways, I see that my answer was wrong, and want to delete it, but would like to understand where I get a downvote when I at least tried to answer.
drachenstern
+1  A: 

Every string contains 0 or more repetitions of every other string.
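A minimal sketch of that point, assuming a POSIX-style grep and some made-up sample lines: a starred item is allowed to match zero times, so it matches any line at all, while the "one or more" form needs a real occurrence.

```shell
# Hypothetical sample input: three text lines and one empty line.
sample=$(printf 'line1\nline2\n\nline4\n')

# 'x*' means "zero or more x". No line contains an x, yet every line
# (including the empty one) contains zero of them, so all four match.
count_star=$(printf '%s\n' "$sample" | grep -c 'x*' || true)

# 'xx*' means "one x followed by zero or more x", which needs a real x,
# so no line matches. grep exits non-zero then, hence the '|| true'.
count_plus=$(printf '%s\n' "$sample" | grep -c 'xx*' || true)

echo "x*  matched $count_star lines"
echo "xx* matched $count_plus lines"
```

The same reasoning applies to any pattern that consists only of a starred item: the empty match always succeeds.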

tchrist
A: 

? or * by themselves do nothing, as they have nothing to act on. ** and ?* are bad form and should not be used. Anything that compiles regex strings properly should error out when presented with either. Strict compilers will reject ? or * alone as well.

Sold Out Activist
+2  A: 

A leading * is generally not magic because of its context

You are asking about behavior that is not fully specified, so the answers are almost certain to depend on the specific RE implementation.

For that matter, there isn't even anything close to a single standard RE, and the variations are not slightly different interpretations but dramatically different pattern definitions.

At first, there was classic grep / sed / ed / awk. A considerably expanded set of patterns eventually appeared and was made popular by Perl and other languages.

Some of these implementations attempt to notice when a character could not be magic due to its position.

So a plain * might search for an actual * character, and ** would then search for zero or more * characters. (And every string contains zero or more of them.)
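This can be checked with a short sketch. It assumes a grep whose basic regular expressions treat a leading * as an ordinary character (GNU grep behaves this way); the sample text is hypothetical and, unlike the file in the question, includes lines with a literal * and a literal ?.

```shell
# Hypothetical sample input: note the literal * and ? characters.
sample=$(printf 'line1\na*b\n\nwhat?\n')

# A leading * has nothing to repeat, so it is taken literally:
# only the line containing an asterisk is selected.
star=$(printf '%s\n' "$sample" | grep '*' || true)

# In '**' the first * is literal and the second repeats it zero or more
# times. Zero asterisks occur in every line, so all four lines match.
star_star=$(printf '%s\n' "$sample" | grep -c '**' || true)

# In a basic RE an unescaped ? is an ordinary character, so '?' finds
# the literal question mark and '?*' again matches every line.
qmark=$(printf '%s\n' "$sample" | grep '?' || true)
qmark_star=$(printf '%s\n' "$sample" | grep -c '?*' || true)

printf '%s\n' "*: $star" "**: $star_star lines" "?: $qmark" "?*: $qmark_star lines"
```

With the file from the question, no line contains a * or a ?, which is why grep '*' and grep '?' print nothing there while the starred forms print everything. Other tools or regex flavors may behave differently.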


Note: Yes, there is a POSIX standard, but it has so little influence that it can be disregarded.

DigitalRoss