I've got an HTML file, and I'd like to grab all the links in it and save them to another file using Vim.

I know that the regex would be something like:

:g/href="\v([a-z_/]+)"/

but I don't know where to go from here.

A: 

Have you tried this?

:g/href="\v([a-z_/]+)"/w >> outfile

Jeff Meatball Yang
This doesn't work. The pattern is matched correctly, but the command then simply writes the entire contents of the file to outfile.
Sasha
A: 

The challenge here is extracting all of the links when there may be several on one line; otherwise you could simply do:

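" Append each line containing href= to a new file ('.' limits the write to the
" current line; '!' lets >> create the file if it does not exist yet)
:g/href="[^"]\+"/.w! >> list_of_links.txt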
" Open the new file
:e list_of_links.txt
" Extract the bit inside the quotation marks
:%s/.*href="\([^"]\+\)".*/\1/

The simplest approach would probably be to do this:

" Save as a new file name
:saveas list_of_links.txt
" Break up the lines wherever there is a 'href='
:%s/href=/\rhref=/g
" Get rid of any lines without href= (including the fragments left in front of each link)
:g!/href="\([^"]\+\)"/d
" Tidy up by removing everything but the bit we want
:%s/^.*href="\([^"]\+\)".*$/\1/

Alternatively (following a similar theme),

:g/href="[^"]\+"/.w! >> list_of_links.txt
:e list_of_links.txt
:%s/href=/\rhref=/g
:g!/href="[^"]\+"/d
:%s/^.*href="\([^"]\+\)".*$/\1/

(see :help saveas, :help :vglobal, :help :s)

However, if you really wanted to do it in a more direct way, you could do something like this:

" Initialise register 'h'
:let @h = ""
" For each line containing href=..., get the line, and carry out a global search
" and replace that extracts just the URLs and a double quote (as a delimiter)
:g/href="[^"]\+"/let @h .= substitute(getline('.'), '.\{-}href="\([^"]\+\)".\{-}\ze\(href=\|$\)', '\1"', 'g')
" Create a new file
:new
" Paste the contents of register h (entered in normal mode)
"hp
" Replace all double quotes with new-lines
:s/"/\r/g
" Save (the new buffer has no name yet, so give it one)
:w list_of_links.txt

Finally, you could do it in a function with a for loop, but I'll leave that for someone else to write!
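
For completeness, here is a minimal sketch of that function-and-loop version, assuming every link is written as href="..." with double quotes. The function name ExtractLinks and the output file name are made up for illustration; the rest uses only the built-in getline(), matchstr(), matchend() and writefile().

```vim
" Hypothetical helper: collect every href="..." value in the current buffer
" and write the values, one per line, to a:outfile.
function! ExtractLinks(outfile) abort
  " \zs and \ze limit the reported match to the text between the quotes
  let l:pattern = 'href="\zs[^"]\+\ze"'
  let l:links = []
  for l:line in getline(1, '$')
    let l:start = 0
    while 1
      " Find the next URL at or after l:start; stop when the line has no more
      let l:url = matchstr(l:line, l:pattern, l:start)
      if l:url ==# ''
        break
      endif
      call add(l:links, l:url)
      " Continue searching just past the URL that was found
      let l:start = matchend(l:line, l:pattern, l:start)
    endwhile
  endfor
  call writefile(l:links, a:outfile)
  return l:links
endfunction
```

With the HTML file in the current buffer, source the function and run :call ExtractLinks('list_of_links.txt'). The inner while loop is what copes with several links on one line.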

Al
+1  A: 

Put your cursor in the first row/column and try this:

:redir > output.txt|while search('href="', "We")|exe 'normal yi"'|echo @"|endwhile|redir END
Brian Carper