Given the following Python code:
>>> from lxml import etree
>>> root = etree.XML("<a><b></b></a>")
>>> etree.tostring(root)
'<a><b/></a>'
How can I force lxml to use the "long" version?
Like this:
>>> etree.tostring(root)
'<a><b></b></a>'
...
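One commonly cited workaround, sketched here on the assumption that the only goal is to avoid self-closing tags: lxml writes separate open and close tags for an element whose text is the empty string rather than None. The helper name expand_empty_elements is hypothetical, and on Python 3 etree.tostring returns bytes rather than the str shown above.

```python
from lxml import etree


def expand_empty_elements(root):
    """Give every childless, textless element an empty-string text node."""
    # Passing etree.Element as the tag restricts iteration to real elements,
    # so comments and processing instructions are left alone.
    for el in root.iter(etree.Element):
        if len(el) == 0 and el.text is None:
            el.text = ""
    return root


root = etree.XML("<a><b></b></a>")
serialized = etree.tostring(expand_empty_elements(root))
print(serialized)
```

Serializing with etree.tostring(root, method="html") is another option people reach for, but it applies HTML rules to the whole document, so it may not suit arbitrary XML.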
from lxml.html.clean import clean_html, Cleaner
def clean(text):
    try:
        cleaner = Cleaner(scripts=True, embedded=True, meta=True, page_structure=True, links=True, style=True,
                          remove_tags=['a', 'li', 'td'])
        print(len(cleaner.clean_html(text)) - len(text))
        return...
As per the official lxml documentation, to validate an XML document against an XML schema document, one has to:
1. construct the XMLSchema object (basically, parse the schema document)
2. construct the XMLParser, passing the XMLSchema object as its schema argument
3. parse the actual XML document (instance document) using the const...
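The three steps listed above can be sketched as follows. This is a minimal illustration rather than the asker's code: the one-element schema and both sample documents are made up for the example.

```python
from io import BytesIO

from lxml import etree

# Hypothetical schema: a single <a> element that must contain an integer.
SCHEMA = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a" type="xsd:integer"/>
</xsd:schema>"""

# Step 1: parse the schema document and build the XMLSchema object.
schema = etree.XMLSchema(etree.parse(BytesIO(SCHEMA)))

# Step 2: build a parser that validates against that schema while parsing.
parser = etree.XMLParser(schema=schema)

# Step 3: parse the instance document with the validating parser.
valid_root = etree.fromstring(b"<a>5</a>", parser)

# A document that violates the schema is rejected at parse time.
try:
    etree.fromstring(b"<a>not an int</a>", parser)
    rejected = False
except etree.XMLSyntaxError:
    rejected = True
```

If the document is already parsed, schema.validate(tree) returns True or False without raising, which can be easier to work with when the errors need to be collected rather than stopping the parse.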
I'm trying to parse an xml file using lxml. xml.etree allowed me to simply pass the file name as a parameter to the parse function, so I attempted to do the same with lxml.
My code:
from lxml import etree
from lxml import objectify
file = "C:\Projects\python\cb.xml"
tree = etree.parse(file)
but I get the error:
Traceback (most rece...
I have xml of the format:
<channel>
  <games>
    <game slot='1'>
      <id>Bric A Bloc</id>
      <title-text>BricABloc Hoorah</title-text>
      <link>Fruit Splat</link>
    </game>
  </games>
</channel>
I've parsed this xml using lxml.objectify, via:
tree = objectify.parse(file)
There will potential...
Hello.
I need to use an ElementTree object in various functions in my program.
More specifically, I am doing this:
tree = etree.parse('somefile.xml')
I am passing this tree around in my program.
I was wondering whether this is a good approach, or can I do this:
Create a global tree (I come from a C++ backg...
Hi Guys,
I am trying to build lxml for Python 2.7 on a 64-bit Windows machine. I couldn't find an lxml egg for Python 2.7, so I am compiling it from source. I am following the instructions on this site
http://codespeak.net/lxml/build.html
under the static linking section. I am getting this error:
C:\Documents and Settings\Administrator\Desktop\l...
Does anyone know of a memory efficient way to generate very large xml files (e.g. 100-500 MiB) in Python?
I've been utilizing lxml, but memory usage is through the roof.
...
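One way to keep memory flat is to stream the output instead of building a tree first. The sketch below swaps lxml for the standard library's xml.sax.saxutils.XMLGenerator, which writes each event straight to the output stream. The element names (records, record) and the write_records helper are hypothetical.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_records(stream, records):
    """Stream records to `stream` as XML without building a tree in memory."""
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("records", {})
    for i, value in enumerate(records):
        # Each record is written and forgotten; nothing accumulates.
        gen.startElement("record", {"id": str(i)})
        gen.characters(value)  # special characters are escaped for us
        gen.endElement("record")
    gen.endElement("records")
    gen.endDocument()


# A generator feeds the writer, so the input is never materialized either.
buf = io.StringIO()
write_records(buf, ("value %d" % n for n in range(3)))
xml_text = buf.getvalue()
```

In real use the stream would be an open file rather than a StringIO. Later lxml releases also provide an incremental writer, etree.xmlfile, built on the same idea.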
As the question says, what would be the difference between:
x.getiterator() and x.iter(), where x is an ElementTree or an Element? Both seem to work; I have tried them.
If I am wrong somewhere, correct me please.
...
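For context, getiterator() is the older name: the ElementTree documentation marks it as deprecated in favour of iter(), and the standard library removed it in Python 3.9. A minimal sketch of iter(), with a made-up document and an import fallback so it runs with either lxml or the standard library:

```python
try:
    from lxml import etree
except ImportError:  # fall back to the standard library implementation
    import xml.etree.ElementTree as etree

root = etree.fromstring("<x><y>1</y><z><y>2</y></z></x>")

# iter() walks the element and all of its descendants in document order.
all_tags = [el.tag for el in root.iter()]

# Passing a tag name restricts the walk to matching elements.
y_texts = [el.text for el in root.iter("y")]

print(all_tags)
print(y_texts)
```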
Hi all.
This follows my previous questions on using lxml and Python.
My question: when I have a choice between the methods provided by lxml.etree and XPath, which should I use?
For example, to get a list of all the X tags in an XML document, I could either iterate through it using the getit...
So I gotta deal with some xml that looks like this:
<ns2:foobarResponse xmlns:ns2="http://api.example.com">
  <duration>206</duration>
  <artist>
    <tracks>...</tracks>
  </artist>
</ns2:foobarResponse>
I found lxml and its objectify module, which lets you traverse an XML document in a Pythonic way, like a dictionary.
Problem is: ...
I am poking at XBRL documents trying to get my head around how to effectively extract and use the data. One thing I have been struggling with is making sure I use the context information correctly. Below is a snippet from one of the documents I am playing with (this is from Mattel's latest 10-K)
I want to be able to efficiently colle...
Hi.
I have the following XML file:
<book>
  <bookname child="test">
    <text> Works </text>
    <text> Doesn't work </text>
  </bookname>
</book>
This is just one block; there is more than one <bookname> tag. I need to iterate through the whole document and remove specific <text> tags. How do I do that?
My approach is to create an El...
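A minimal sketch of one way to do the removal, assuming the elements to drop can be picked out by their text. The should_remove predicate is hypothetical, since the question does not say which <text> tags count as "specific". The import fallback lets it run with either lxml or the standard library.

```python
try:
    from lxml import etree
except ImportError:  # fall back to the standard library implementation
    import xml.etree.ElementTree as etree

XML = """<book>
<bookname child="test">
<text> Works </text>
<text> Doesn't work </text>
</bookname>
</book>"""


def should_remove(el):
    """Hypothetical rule: drop <text> elements whose text mentions "Doesn't"."""
    return "Doesn't" in (el.text or "")


root = etree.fromstring(XML)
for parent in root.iter("bookname"):
    # Iterate over a copy of the children, because removing from the
    # parent while walking it directly would skip elements.
    for child in list(parent):
        if child.tag == "text" and should_remove(child):
            parent.remove(child)

remaining = [el.text.strip() for el in root.iter("text")]
print(remaining)
```

Removal goes through the parent because an element cannot delete itself from the tree. Note that remove() also discards any tail text attached to the removed element.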
Hi all!
I have installed lxml 2.2.2 on Windows (I'm using Python 2.6.5). I tried this simple command:
from lxml.html import parse
p = parse('http://www.google.com').getroot()
but I am getting the following error:
Traceback (most recent call last): File "", line 1, in p=parse('http://www.google.com').getroot() File "C:...
I have an HTML file (from Newegg) and their HTML is organized like below. All of the data in their specifications table is 'desc' while the titles of each section are in 'name.' Below are two examples of data from Newegg pages.
<tr>
<td class="name">Brand</td>
<td class="desc">Intel</td>
</tr>
<tr>
<td class="name">Series</...
I'm trying to test some Python code that uses urllib2 and lxml.
I've seen several blog posts and Stack Overflow posts where people want to test exceptions being thrown with urllib2. I haven't seen examples that test successful calls.
Am I going down the correct path?
Does anyone have a suggestion for getting this to work?
Here is what...
Hello,
I'm working with a schema that was built by a third party and I'd like to validate it with lxml. The problem is that the schema is split over several xsd files, which reference one another.
For example, a file called "extension.xsd" (which builds upon the "master" schema) has a line like:
<redefine schemaLocation="master.x...
Hi everyone. I have an XML file with the structure shown below:
<x>
<y/>
<y/>
.
.
</x>
The number of <y> tags is arbitrary.
I want to get the text of the <y> tags, and for this I decided to use XPath. I have figured out the syntax, say for the first y (assume root is x):
textFirst = root.xpath('y[1]/text()')
This wor...
I'm trying to use the lxml library to parse an XML file. What I want is to use XML as the data source but still keep the normal Django way of interacting with the resulting objects. From the docs, I can see that lxml.objectify is what I'm supposed to use, but I don't know how to proceed after: list = objectify.parse('myfile.xml')
...
Hi folks,
I'd like to use the lxml library to validate XML Schemas in Python 3.1.2.
Since Mac OS X Snow Leopard comes with Python 2.6.1 installed, I first downloaded the Python 3.1.2 automated installer at http://www.python.org/ftp/python/3.1.2/python-3.1.2-macosx10.3-2010-03-24.dmg and installed it.
Secondly, I downloaded lxml 2...