Okay, so ^[A-Z] means beginning with a capital letter. So what does ^[A-Z]* mean? Does it not mean zero or more occurrences of a capital letter? It is really confusing me, because the output includes the empty lines, which do not begin with a capital letter. Also, could you explain ^[A-Z]*$?

mugbear:~# clear
mugbear:~# cat emptyspace 
line1
line2

line4
line5

line7
mugbear:~# grep '^[A-Z]*' emptyspace
line1
line2

line4
line5

line7
mugbear:~# grep '^[A-Z]*$' emptyspace


mugbear:~# 
+2  A: 

An empty line contains zero capital letters, which satisfies "zero or more occurrences of a capital letter." The latter expression is ambiguous: it starts and ends with zero or more occurrences of a capital letter, also known as "anything."

Update: Please refer to Tanzelax's answer as the answer that should have been accepted.

greg
That was dumb, I even wrote that out and it didn't even process through my brain correctly. Thanks.
Doug
This late on a Friday, mistakes are bound to happen ;) Feel free to accept a correct answer.
greg
Updated my answer to answer your additional questions.
greg
You've checked a wrong answer. Greg's description is wrong.
tchrist
Updated my answer to point to Tanzelax's answer.
greg
+1  A: 

Zero or more occurrences can include zero occurrences.

Thus, ^[A-Z]* can match just 'start of line' followed by zero capital letters, and that is true of every line.

$ is end of line, so ^[A-Z]*$ means 'start of line, followed by any number (including zero) of capital letters, followed by end of line'. In your file that is only the blank lines (which are 'start of line, zero capital letters, end of line'); a line made up entirely of capital letters would match too.
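
To make the difference concrete, here is a small sketch that runs both patterns against a throwaway file. The file name sample.txt and its contents are made up for illustration, and LC_ALL=C is set on the assumption that you want [A-Z] to mean exactly the ASCII capitals (in some locales the range can also cover lowercase letters).

```shell
# Hypothetical sample file: the name and the contents are illustrative only.
tmpdir=$(mktemp -d)
printf 'line1\nABC\n\nAbc\nXYZ\n' > "$tmpdir/sample.txt"

# ^[A-Z]* can match zero capitals at the start of any line,
# so every one of the 5 lines is selected.
all_count=$(LC_ALL=C grep -c '^[A-Z]*' "$tmpdir/sample.txt")

# ^[A-Z]*$ requires the whole line to consist of capitals (possibly none),
# so only "ABC", the blank line, and "XYZ" are selected.
whole_lines=$(LC_ALL=C grep -n '^[A-Z]*$' "$tmpdir/sample.txt")

echo "$all_count"      # 5
echo "$whole_lines"    # 2:ABC, 3: (the blank line), 5:XYZ

rm -rf "$tmpdir"
```

The -n flag prefixes each selected line with its line number, which makes the otherwise invisible blank line show up as "3:".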

Tanzelax
That's not true, `^[A-Z]*$` will match anything, not only newlines, zero capital letters, and EOL. "AaA" starts and ends with zero or more occurrences of a capital letter, as well as "aaa."
greg
@greg, I suggest you put your theory to the test. Your statement is incorrect.
tchrist
@tchrist thanks. What I commented is in fact completely incorrect. The "blank" lines are captured because the * is applied to the character set (**zero** or more). This regex *will* capture something in all caps though. My examples are not matched because they contain lowercase letters, so `[A-Z]*` cannot cover the whole line between `^` and `$`.
greg
+1  A: 

If you're asking what the addition of the asterisk to the end of the expression does, it means to match the preceding element zero or more times. In the expression that you provided, it means to match as many consecutive capital letters as possible, including none at all.
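
One way to see what the pattern actually consumes is to wrap the matched text in brackets with sed. This is only a sketch: the two input lines are made up, and LC_ALL=C is an assumption so that [A-Z] means just the ASCII capitals.

```shell
# Wrap whatever ^[A-Z]* matched in [ ] so the match becomes visible.
# The & in the replacement stands for the matched text.
marked=$(printf 'ABCdef\nline1\n' | LC_ALL=C sed 's/^[A-Z]*/[&]/')

echo "$marked"
# [ABC]def   <- the star took all three leading capitals
# []line1    <- zero capitals: the match is empty, but it is still a match
```

The empty brackets on the second line are the reason grep prints lines such as line1: the pattern succeeds there by matching nothing.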

http://www.regular-expressions.info/ might also be of some help to you.

KayakJim