I'm trying to figure out the reason behind some regex comparison results I'm getting in Vim. I'm trying to match strings that begin a line with one or more asterisks. Here's how various regexes match the strings:

echo '* text is here' =~ '\^*\*\s'  prints 1 (i.e., MATCH)
echo '* text is here' =~ '^*\*\s'   prints 0 (NO MATCH)

echo '** text is here' =~ '\^*\*\s' (MATCH)
echo '** text is here' =~ '^*\*\s'  (MATCH)

echo '*** text is here' =~ '\^*\*\s' (MATCH)
echo '*** text is here' =~ '^*\*\s'  (NO MATCH)

echo 'text is here' =~ '\^*\*\s' (NO MATCH)
echo 'text is here' =~ '^*\*\s'  (NO MATCH)

echo '*text is here' =~ '\^*\*\s' (NO MATCH)
echo '*text is here' =~ '^*\*\s'  (NO MATCH)

From these results I gather that when the beginning-of-line character (^) is not preceded by a backslash, the following * is read as a literal and the backslash-* is also read as a literal. So the version without the initial backslash matches only strings with exactly two asterisks followed by a whitespace.

When the ^ character is preceded by a backslash, the first asterisk is a literal asterisk and the backslash-* stands for 'zero or more of the preceding character'.

The version with the initial backslash gives me the answers I want; i.e., it matches all-and-only lines beginning with one or more asterisks followed by a whitespace. Why is this? When I look at the Vim documentation it says that \^ stands for a literal ^, not the beginning of a line. I'm sure there's a simple explanation but I can't see it. Thanks for any clarification.

I also notice some similar behavior when typing in this question. That is, the following string has a backslash before the second asterisk that doesn't show up in the text: '^**\s' .

UPDATE: Okay, I think I've grokked Ross' answer and see that the de-anchoring was giving me the result I wanted. The de-anchoring is also giving me a result I don't want, namely:

echo 'text* is here' =~ '\^*\*\s' (MATCH)

SO MY QUESTION NOW IS: what regex will match all-and-only lines that begin with one or more asterisks followed by a whitespace? The regex below gets close but fails on the final example:

echo '*** text is here' =~ '^**\s' (MATCH)
echo '* text is here' =~ '^**\s' (MATCH)
echo 'text* is here' =~ '^**\s' (NO MATCH)
echo ' * text is here' =~ '^**\s' (MATCH) -- want a no match here

The version with backslash-asterisk as the first asterisk doesn't work either (i.e., '^\**\s' ).

FINAL UPDATE: Okay, I think I found the version that works. I don't understand exactly why it works, though. It looks like what I would expect except for the asterisk after the ^ character, but having a repeater after the ^ seems nonsensical:

echo '*** text is here' =~ '^*\**\s' (MATCH)
echo '* text is here' =~ '^*\**\s'   (MATCH)
echo 'text* is here' =~ '^*\**\s'   (NO MATCH)
echo ' * text is here' =~ '^*\**\s' (NO MATCH)
+4  A: 

Ahh, interesting explanation, but not quite right.

The \^ indeed refers to a literal circumflex.

But * doesn't mean "one or more", it means "zero or more", so \^* simply matches nothing if it needs to in order to make the rest of the expression succeed, and in addition it obviously will "deanchor" the rest of the search making it easier to succeed.

I imagine that with this piece of the puzzle filled in you will have no trouble understanding the rest...

Update: I think the final piece of the puzzle is that vi does something a bit different with out-of-context regex magic characters. If you use one in a context where it can't be magic, you won't get an error like you might with Perl or Ruby, the character simply becomes non-magic. And * doesn't repeat the ^ anchor, so a search like /*/ or /^*/ will look for any actual * or a line beginning with an actual *, respectively.
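
The literal-asterisk behavior can be checked directly with =~. This is only a minimal sketch; it assumes the default 'magic' interpretation, which =~ uses regardless of how the 'magic' option is set:

```vim
" A bare * at the start of a pattern, or right after ^, is a literal asterisk.
" Each echo prints 1 for a match and 0 for no match.
echo 'a*c' =~ '*'
echo 'abc' =~ '*'
echo '*abc' =~ '^*'
echo 'a*c' =~ '^*'

" '^*\**\s' therefore reads: start of line, one literal *, zero or more
" further literal asterisks (\* followed by the * repeater), then whitespace.
echo '*** text is here' =~ '^*\**\s'
echo ' * text is here' =~ '^*\**\s'
```

Run in order, these should print 1, 0, 1, 0, 1 and 0, which is consistent with the results in the question's final update.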

DigitalRoss
Ross -- Thanks, but I still don't quite get it. My mention of 'one or more' was just a typo; I knew the \* meant 'zero or more' (but actually in default vim regex the * repeater should have no initial backslash). But I still can't figure out your explanation, because the version with a beginning backslash does not match strings beginning with something other than an asterisk. I've added additional examples to show this.
Herbert Sitz
+1  A: 

'\^*\*\s' matches because the first asterisk denotes zero or more ^ (in this case, zero), and then the next literal * matches the first occurrence.
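
One way to see this is to ask matchstr() which part of the string the pattern consumed. A small sketch follows; the sample string containing carets is made up for illustration and is not from the question:

```vim
" \^* is zero or more literal carets, so the match is free to start anywhere.
" Expected result: '* ' (zero carets, then the asterisk and the space).
echo string(matchstr('text* is here', '\^*\*\s'))
" Expected result: '^^* ' (both carets are consumed before the asterisk).
echo string(matchstr('^^* text', '\^*\*\s'))
" Expected result: '' (no asterisk is followed by whitespace).
echo string(matchstr('*text is here', '\^*\*\s'))
```

The first case is why 'text* is here' also matches: nothing in the pattern ties it to the start of the line.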

Jay
+2  A: 

Why not simply use: '^\*\+' ? This will match one or more asterisks at the beginning of the line in Vim.
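
To compare this against the test strings from the question, here is a short sketch. It appends \s to the pattern because the question also requires whitespace after the asterisks, and the variable names are only illustrative:

```vim
let samples = ['* text is here', '*** text is here', 'text is here', '*text is here', 'text* is here', ' * text is here']
for sample in samples
  " ^ anchors at the start, \*\+ is one or more literal asterisks, \s is whitespace.
  echo printf('%-20s %d', sample, sample =~ '^\*\+\s')
endfor
```

Only the first two strings should report 1; the other four should report 0, including ' * text is here', which begins with a space.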

Nathan Ernst
Thank you. That was exactly what I wanted. I got started in the wrong direction with the * repeater and didn't think to switch approaches.
Herbert Sitz