I hope I asked that correctly. I am trying to figure out what element.sourceline does and whether there is some way I can use it. I have tried building my elements from the HTML in a number of ways, but every time I iterate through my elements and ask for sourceline I get None. When I tried the built-in help, I didn't get anything either.

I have Googled for an example but have not found one yet.

I know it is an attribute of elements, not trees, but that is the best I have been able to come up with.

In response to Jim Garrison's request for an example:

from lxml import html

theTree = html.parse(open(r'c:\temp\testlxml.htm'))
check_source = []  # elements that report a line number
the_elements = [(e, e.sourceline) for e in theTree.iter()]  # trying to get the sourceline
for each in the_elements:
    if each[1] is not None:
        check_source.append(each)

When I run this, len(check_source) == 0.

My htm file has 19,379 lines, so I am not sure you want to see it.

I tried one solution:

>>> myroot=html.fromstring(xml)
>>> elementlines=[(e,e.sourceline) for e in myroot.iter()]
>>> elementlines
[(<Element doc at 12bb730>, None), (<Element foo at 12bb650>, None)]

When I do the same thing with etree, I get what was demonstrated:

>>> myroot=etree.fromstring(xml)
>>> elementlines=[(e,e.sourceline) for e in myroot.iter()]
>>> elementlines
[(<Element doc at 36a6b70>, 1), (<Element foo at 277b4e0>, 2)]

But my source htm is so messy that I can't use etree to explore the tree; I get an error.

+1  A: 

sourceline returns the line number that was recorded when the document was parsed, so it is not set on an Element that was added through the API. For example:

from lxml import etree

xml = '<doc>\n<foo>rain in spain</foo>\n</doc>'
root = etree.fromstring(xml)

print(root.find('foo').sourceline)  # 2

root.append(etree.Element('bar'))
print(etree.tostring(root))
print(root.find('bar').sourceline)  # None

I'm pretty sure the same applies to lxml.html.
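
A quick way to check that against whatever lxml version is installed is to run the same string through both parsers and compare. This is only a sketch: tag_lines is a hypothetical helper name, not part of lxml, and I am not assuming any particular result for the lxml.html case, since the question reports None there.

```python
from lxml import etree, html

xml = '<doc>\n<foo>rain in spain</foo>\n</doc>'


def tag_lines(root):
    """Return (tag, sourceline) pairs for root and all its descendants.

    tag_lines is a hypothetical helper for this sketch, not an lxml API.
    """
    return [(e.tag, e.sourceline) for e in root.iter()]


# XML parser: line numbers are recorded at parse time.
xml_lines = tag_lines(etree.fromstring(xml))
print(xml_lines)  # [('doc', 1), ('foo', 2)]

# Same text through lxml.html. Each sourceline may be an int or None
# depending on the installed version, so no expected output is given.
html_lines = tag_lines(html.fromstring(xml))
print(html_lines)
```

If the second list comes back with None everywhere, the difference is in the HTML parsing path rather than in how the elements are iterated.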

ars
I appreciate the effort, but it does not seem to, even though sourceline is shown as a method/attribute of elements.
PyNEwbie