Hello,
I've been trying to write this spider for weeks without success. What is the best way to code the following steps in Python?
1) Initial URL: http://www.whitecase.com/Attorneys/List.aspx?LastName=A
2) From the initial URL, pick up these URLs with this XPath and regex:
hxs.select('//td[@class="altRow"][1]/a/@href').re('/.a\w+')
[u'/cabel', u'/jacevedo', u'/jacuna', u'/aadler', u'/zahmedani', u'/tairisto',
u'/zalbert', u'/salberts', u'/aaleksandrova', u'/malhadeff', u'/nalivojvodic', u'
....
3) Go to each of these URLs and scrape the school info with this XPath and regex:
hxs.select('//td[@class="mainColumnTDa"]').re('(?<=(JD,\s))(.*?)(\d+)')
[u'JD, ', u'University of Florida Levin College of Law, <em>magna cum laude</em>
, Order of the Coif, Symposium Editor, Florida Law Review, Awards for highest
grades in Comparative Constitutional History, Legal Drafting, Real Property and
Sales, ', u'2007']
4) Write the scraped school info to a schools.csv file.
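
To make the four steps concrete, here is a rough offline sketch of the parts that do not depend on Scrapy: building absolute profile URLs from the hrefs in step 2, pulling the school and year out of a cell's text in step 3, and writing CSV rows in step 4. The function names and the CSV column names are placeholders of mine, the lookbehind drops the inner capture group so only the school and the year are captured, and fetching the pages is left out entirely. It is written for Python 3 and is not a working spider.

```python
import csv
import io
import re
from urllib.parse import urljoin

START_URL = "http://www.whitecase.com/Attorneys/List.aspx?LastName=A"

# Same idea as the step 3 pattern, without the capture group in the lookbehind.
SCHOOL_RE = re.compile(r"(?<=JD,\s)(.*?)(\d+)", re.DOTALL)


def profile_urls(hrefs, base=START_URL):
    """Step 2: turn relative hrefs such as '/cabel' into absolute URLs."""
    return [urljoin(base, href) for href in hrefs]


def parse_school(cell_text):
    """Step 3: return (school_info, year), or None if no 'JD, ' entry is found."""
    match = SCHOOL_RE.search(cell_text)
    if match is None:
        return None
    school, year = match.groups()
    # Trim the trailing ", " left between the school text and the year.
    return school.strip(" ,"), year


def write_schools(rows, out):
    """Step 4: write (url, school, year) rows as CSV to an open text file."""
    writer = csv.writer(out)
    writer.writerow(["url", "school", "year"])  # hypothetical column names
    writer.writerows(rows)


if __name__ == "__main__":
    # Offline demo: no pages are fetched, and pairing this href with this
    # school text is arbitrary, for illustration only.
    sample = "JD, University of Florida Levin College of Law, 2007"
    rows = [(url,) + parse_school(sample) for url in profile_urls(["/cabel"])]
    buf = io.StringIO()
    write_schools(rows, buf)
    print(buf.getvalue())
```

In a real run, write_schools would be given a file opened with open("schools.csv", "w", newline="") instead of the in-memory buffer.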
Can you help me write this spider in Python? I've been trying to write it in Scrapy, without success. See my previous question.
Thank you.