Sorry if this is a simple question, but I could not figure it out. I need to remove all the <a href...> and </a> strings from an HTML text. What regular expression should I use? I tried the following search without any luck:

/<\shref^(>)>

What I mean is: search for any string starting with "< href", followed by any characters other than '>', and finally '>'. My search is not working. What is the correct pattern?

+1  A: 

If I understand what you're looking for, it should be <a\shref[^>]*>.

Shaun
* is OK, but \+ is better.
David.Chu.ca
A: 

I think I got it:

/<a\shref[^>]\+>

where [] is a character set and a leading ^ negates it, so [^>] matches any character except '>'.

David.Chu.ca
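
For anyone who wants to check the negated-set approach outside Vim, here is a minimal sketch in Python. It is not Vim's regex engine: Python's `re` writes the quantifier as `+` where Vim's default syntax needs `\+`. The function name `strip_anchor_tags` and the sample string are made up for illustration. The pattern also covers the closing `</a>` that the question asks about.

```python
import re

# Rough Python counterpart of the Vim pattern <a\shref[^>]\+>, with the
# closing tag added as an alternative. [^>]+ means "one or more characters
# that are not '>'".
ANCHOR_TAGS = re.compile(r"<a\shref[^>]+>|</a>")


def strip_anchor_tags(html: str) -> str:
    """Remove opening <a href...> and closing </a> tags, keeping the link text."""
    return ANCHOR_TAGS.sub("", html)


sample = 'See <a href="http://example.com">this page</a> for details.'
print(strip_anchor_tags(sample))  # prints: See this page for details.
```

Inside Vim, the corresponding substitution should be along the lines of `:%s/<a\shref[^>]\+>\|<\/a>//g`, where `\|` separates the two alternatives and `\/` escapes the slash.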
+1  A: 

Another way would be to use non-greedy matching:

/<a\shref.\{-}>
Alok
I am confused by your pattern. Can you explain "\{-}"? "\{" looks like an escaped '{', followed by '-', and then the '}' has no matching left brace.
David.Chu.ca
`\{-}` (or `\{-\}`) has a special meaning in Vim: it's like `*`, but non-greedy. It matches as little as possible while still letting the whole pattern match. In Vim, type `:h pattern` and then search for `{-}` for more information.
Alok
Very cool! A very nice alternative for searching. I like it. For the multi-line case, "<a\_.\{-}>" will match an <a tag that spans lines. I found that \_. matches any character, including a newline.
David.Chu.ca
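
To see the greedy versus non-greedy difference discussed above, here is a small sketch in Python rather than Vim. It assumes Python's `.*?` is a fair stand-in for Vim's `.\{-}`, and that the `re.DOTALL` flag plays the role of `\_.` by letting `.` match newlines. The sample strings are made up.

```python
import re

line = '<a href="one">first</a> and <a href="two">second</a>'

# Greedy: .* runs to the LAST '>' on the line (Vim: <a\shref.*>).
greedy = re.findall(r"<a\shref.*>", line)

# Non-greedy: .*? stops at the FIRST '>' after each match start
# (Vim: <a\shref.\{-}>).
lazy = re.findall(r"<a\shref.*?>", line)

# Across lines: with re.DOTALL, '.' also matches newlines
# (Vim: <a\_.\{-}>).
multi = '<a\n  href="three">third</a>'
across = re.findall(r"<a.*?>", multi, flags=re.DOTALL)

print(greedy)  # one match covering the whole line
print(lazy)    # two matches, one per opening tag
print(across)  # one match spanning the line break
```

The greedy form swallows everything between the first `<a href` and the last `>`, which is why the non-greedy or negated-set form is usually the one wanted for stripping tags.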