I am trying to remove comments from a list of elements that were obtained by using lxml
The best I have been able to do is:
no_comments=[element for element in element_list if 'HtmlComment' not in str(type(each))]
I am wondering if there is a more direct way?
I am going to add something based on Matthew's answer - he got me almost there the problem is that when the element are taken from the tree the comments lose some identity (I don't know how to describe it) so that it cannot be determined whether they are HtmlComment class objects using the isinstance() method
However, that method can be used when the elements are being iterated through on the tree
from lxml.html import HtmlComment
no_comments=[element for element in root.iter() if not isinstance(element,HtmlComment)
For those novices like me root is the base html element that holds all of the other elements in the tree there are a number of ways to get it. One is to open the file and iterate through it so instead of root.iter() in the above
html.fromstring(open(r'c:\temp\testlxml.htm').read()).iter()