There are 46 links on this page: http://www.math.hmc.edu/~ajb/PCMI/problem_solve.html. I want the 7th through the 33rd link, saved to a separate file. Is there a way to copy the outgoing links?

A: 

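Grep the page source for

<a href

then load the result into vim and run the following substitution, which reduces each line to the value of its href attribute. The [^"]* stops the capture at the closing quote, so attributes that follow the href are not swallowed:

%s/.*<a href="\([^"]*\)".*/\1/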

pcmi08_b.pdf
pcmi07_b.pdf
pcmi06_b.pdf
pcmi05_a.pdf
pcmi05_b.pdf
pcmi04_b.pdf
pcmi03_b.pdf
http://www.math.hmc.edu/putnam/
pcmi_classic.pdf
pcmi_classic.tex
http://www.math.hmc.edu/putnam/seminar.shtml
pcmi_tng.tex
pss_solution.pdf
pss_solution.tex
http://www.maa.org/mathhorizons/
http://www.maa.org/pubs/mathmag.html
http://www.maa.org/pubs/cmj.html
http://www.maa.org/pubs/monthly.html
http://www.math.hmc.edu/funfacts/
http://mathforum.org/students/
http://mathworld.wolfram.com
http://www.cecm.sfu.ca/projects/ISC/
http://www.research.att.com/%7Enjas/sequences/index.html
http://ams.rice.edu/mathscinet/
http://mathforum.org/wagon/
http://math.scu.edu/putnam/index.html
http://www.unl.edu/amc/a-activities/a7-problems/putnam/
http://www.unl.edu/amc/a-activities/a7-problems/problemarchive.html
http://www.amazon.com/exec/obidos/ASIN/0387982191/ref=pd_sxp_elt_l1/t/002-7200940-4079202
http://www.amazon.com/exec/obidos/tg/detail/-/038790803X/ref=pd_sim_books_1/t/002-7200940-4079202?v=glance&amp;s=books
http://www.amazon.com/exec/obidos/tg/detail/-/0471135712/ref=pd_sim_books_2/t/002-7200940-4079202?v=glance&amp;s=books
http://www.amazon.com/exec/obidos/tg/detail/-/0817641556/ref=pd_bxgy_text_1/t/002-7200940-4079202?v=glance&amp;s=books&amp;st=*
http://www.amazon.com/exec/obidos/ASIN/0387947434/ref=pd_pym_rvi_1/t/002-7200940-4079202
http://www.amazon.com/exec/obidos/ASIN/0883855194/ref=pd_sxp_elt_l1/t/002-7200940-4079202
http://www.amazon.com/exec/obidos/tg/detail/-/0883853256/qid=1057672535/sr=8-1/ref=sr_8_1/t/002-7200940-4079202?v=glance&amp;s=books&amp;n=507846
http://www.amazon.com/exec/obidos/ASIN/088385807X/ref=pd_sxp_elt_l1/t/002-7200940-4079202
http://www.amazon.com/exec/obidos/ASIN/0486694151/ref=pd_sxp_elt_l1/t/002-7200940-4079202
http://www.amazon.com/exec/obidos/ASIN/0486695735/ref=pd_sxp_elt_l1/t/002-7200940-4079202
http://www.amazon.com/exec/obidos/ASIN/0691023565/ref=pd_sxp_elt_l1/t/002-7200940-4079202
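
For anyone without vim at hand, the same grep-and-substitute idea can be sketched in Python. This is only an illustration, not part of the original answer: the names HREF_RE, extract_hrefs and select_range are made up here, and the pattern assumes every href value is wrapped in double quotes, as it is in the list above.

```python
import re

# [^"]* cannot run past the closing quote, so attributes that follow
# the href (style=..., for example) never end up in the captured value.
HREF_RE = re.compile(r'<a\s+[^>]*?href="([^"]*)"', re.IGNORECASE)


def extract_hrefs(html):
    """Return every double-quoted href value of an <a> tag, in document order."""
    return HREF_RE.findall(html)


def select_range(links, first, last):
    """Return links first..last, counted from 1 and inclusive."""
    return links[first - 1:last]
```

With the page source in a string called html, select_range(extract_hrefs(html), 7, 33) would give the 7th through the 33rd link. Unlike the line-based vim substitution, this also picks up several links that share one line.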
Andrew Clark
A: 

It is fairly easy to parse the HTML, extract the <a> tags, and read their href attributes. Once you have the list of hrefs, you can choose which to keep and which to drop.
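
A minimal sketch of that approach, using Python's standard html.parser module. The names LinkCollector, links_in_range and save_links are hypothetical, chosen for this example, and resolving relative links against a base URL is an added assumption rather than something the question asked for.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names before calling this.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative links such as pcmi08_b.pdf become absolute
                # when a base URL is supplied; otherwise they stay as-is.
                self.links.append(urljoin(self.base_url, value))


def links_in_range(html, first, last, base_url=""):
    """Return links first..last of the page, counted from 1 and inclusive."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links[first - 1:last]


def save_links(links, path):
    """Write one link per line to the file at path."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(links) + "\n")
```

Given the page source in html, save_links(links_in_range(html, 7, 33, base_url), "links.txt") would write the 7th through the 33rd link to links.txt, where base_url is the address of the page. A real parser also decodes entities, so &amp;amp; in an href comes out as a plain &amp; character in the saved URL.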

Gamecat