Hi there.
I am using lxml.html to parse some HTML to get links. However, when it hits a link that contains an image, it just returns a blank. What I'd really like is to detect whether the link contains an image and, if so, return the image's alt text instead.
So it looks like this...
from lxml.html import fromstring

# Note: cssselect() needs the separate "cssselect" package installed.
doc = fromstring('<a href="Link One">Anchor Link One</a><br />'
                 '<a href="Link Two"><img src="Image Link Two" alt="Alt Image" /></a><br />'
                 '<a href="Link Three">Anchor Link Three</a><br />')
for link in doc.cssselect('a'):
    print('%s: %s' % (link.text_content(), link.get('href')))
Result:
Anchor Link One: Link One
: Link Two
Anchor Link Three: Link Three
So I tried using .html_content() to get the raw HTML and then check whether it contained an image.
Hmm... How can I detect whether a link wraps an image, and/or pull out the HTML inside it?
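To detect an image and fall back to its alt text, you can ask the link element for an `img` descendant instead of inspecting raw HTML. A minimal sketch, assuming you want the link's visible text when it has any and the first image's `alt` attribute otherwise; `link_label` is a hypothetical helper, not part of lxml:

```python
import lxml.html

HTML = (
    '<a href="Link One">Anchor Link One</a><br />'
    '<a href="Link Two"><img src="Image Link Two" alt="Alt Image" /></a><br />'
    '<a href="Link Three">Anchor Link Three</a><br />'
)


def link_label(link):
    """Visible text of the link, else alt text of an <img> inside it.

    link_label is a hypothetical helper name, not an lxml API.
    """
    text = link.text_content().strip()
    if text:
        return text
    # './/img' searches all descendants, not just direct children.
    img = link.find('.//img')
    if img is not None:
        return img.get('alt', '')
    return ''


doc = lxml.html.fromstring(HTML)
for link in doc.iter('a'):
    print('%s: %s' % (link_label(link), link.get('href')))
```

With the sample markup this prints `Alt Image: Link Two` for the second link. If you only need to know whether a link wraps an image, `link.find('.//img') is not None` is enough; `link.xpath('.//img/@alt')` returns every alt value as a list if a link can contain several images.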