lxml

How to prevent lxml from compacting elements?

Given the following Python code: >>> from lxml import etree >>> root = etree.XML("<a><b></b></a>") >>> etree.tostring(root) '<a><b/></a>' How can I force lxml to use the "long" version? For example: >>> etree.tostring(root) '<a><b></b></a>' ...

python [lxml] - cleaning out html tags

from lxml.html.clean import clean_html, Cleaner def clean(text): try: cleaner = Cleaner(scripts=True, embedded=True, meta=True, page_structure=True, links=True, style=True, remove_tags = ['a', 'li', 'td']) print (len(cleaner.clean_html(text))- len(text)) return...

In-document schema declarations and lxml

As per the official lxml documentation, to validate an XML document against an XML schema document, one has to: construct the XMLSchema object (basically, parse the schema document); construct the XMLParser, passing the XMLSchema object as its schema argument; parse the actual XML document (instance document) using the const...

How to pass an XML file to lxml to parse?

I'm trying to parse an XML file using lxml. xml.etree allowed me to simply pass the file name as a parameter to the parse function, so I attempted to do the same with lxml. My code: from lxml import etree from lxml import objectify file = "C:\Projects\python\cb.xml" tree = etree.parse(file) but I get the error: Traceback (most rece...

How to print an Objectified Element?

I have XML in the following format: <channel> <games> <game slot='1'> <id>Bric A Bloc</id> <title-text>BricABloc Hoorah</title-text> <link>Fruit Splat</link> </game> </games> </channel> I've parsed this XML using lxml.objectify, via: tree = objectify.parse(file) There will potential...

Passing around an ElementTree

Hello. I need to use an ElementTree object in various functions of my program. More specifically, I am doing this: tree = etree.parse('somefile.xml') and then passing this tree around. I was wondering whether this is a good approach, or whether I can do this instead: create a global tree (I come from a C++ backg...

Building lxml for Python 2.7

Hi guys, I am trying to build lxml for Python 2.7 on a 64-bit Windows machine. I couldn't find an lxml egg for Python 2.7, so I am compiling it from source. I am following the instructions at http://codespeak.net/lxml/build.html under the static linking section. I am getting the error C:\Documents and Settings\Administrator\Desktop\l...

Generating very large XML files in Python?

Does anyone know of a memory efficient way to generate very large xml files (e.g. 100-500 MiB) in Python? I've been utilizing lxml, but memory usage is through the roof. ...
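
One way to keep memory flat is to write elements to the output as they are produced instead of building the whole tree first. The sketch below uses the standard library's xml.sax.saxutils.XMLGenerator so that it runs without third-party packages; lxml also documents an incremental writer (etree.xmlfile) that follows the same idea. The element names, the write_records helper, and the sample rows are assumptions made up for illustration.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_records(stream, records):
    """Write <records><record id="...">text</record>...</records> to `stream`.

    `records` can be any iterable (for example a generator), so only one
    record needs to exist in memory at a time.
    """
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("records", {})
    for record_id, text in records:
        gen.startElement("record", {"id": str(record_id)})
        gen.characters(text)  # escapes &, < and > in the text
        gen.endElement("record")
    gen.endElement("records")
    gen.endDocument()


# Hypothetical usage: an in-memory buffer stands in for a real output file.
buffer = io.StringIO()
write_records(buffer, ((i, "row %d" % i) for i in range(3)))
xml_text = buffer.getvalue()
```

For a real export, the stream would be a file opened for writing and the generator expression would be replaced by whatever produces the data.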

What is the difference between getiterator() and iter() with respect to lxml?

As the question says, what would be the difference between x.getiterator() and x.iter(), where x is an ElementTree or an Element? Because it seems to work for both; I have tried it. If I am wrong somewhere, please correct me. ...
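
As far as the lxml documentation describes it, getiterator() is the older, deprecated spelling and iter() is the one to prefer; in the standard library's xml.etree.ElementTree, getiterator() was removed in Python 3.9. A minimal sketch of iter(), written against xml.etree.ElementTree so that it runs without lxml installed (lxml.etree exposes an iter() method with the same name). The sample document is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
root = ET.fromstring("<x><y/><z><y/></z></x>")

# iter() walks the element itself and all descendants in document order.
all_tags = [el.tag for el in root.iter()]

# Passing a tag name restricts the walk to matching elements.
y_count = sum(1 for _ in root.iter("y"))
```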

Confused about using XPath or not.

Hi all. This follows my previous questions on using lxml and Python. When I have a choice between the methods provided by lxml.etree and XPath, which should I use? For example, to get a list of all the X tags in an XML document, I could either iterate through it using the getit...

Making lxml.objectify ignore xml namespaces?

So I gotta deal with some XML that looks like this: <ns2:foobarResponse xmlns:ns2="http://api.example.com"> <duration>206</duration> <artist> <tracks>...</tracks> </artist> </ns2:foobarResponse> I found lxml and its objectify module, which lets you traverse an XML document in a pythonic way, like a dictionary. Problem is: ...

Is there a more pythonic way to access the child elements of parents using lxml

I am poking at XBRL documents, trying to get my head around how to effectively extract and use the data. One thing I have been struggling with is making sure I use the context information correctly. Below is a snippet from one of the documents I am playing with (this is from Mattel's latest 10-K). I want to be able to efficiently colle...

How to delete specific tags

Hi. I have the following XML file: <book> <bookname child="test"> <text> Works </text> <text> Doesn't work </text> </bookname> </book> This is just one block; there is more than one <bookname> tag. I need to iterate through the whole document and remove specific <text> tags. How do I do that? My approach is to create an El...
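
A minimal sketch of one way to do this, assuming the <text> elements to drop can be identified by their stripped text content (the excerpt does not say how they are chosen, so that criterion and the remove_text_elements helper are hypothetical). It uses the standard library's xml.etree.ElementTree so that it runs without lxml installed; lxml.etree provides iter(), findall() and remove() under the same names.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<book>
  <bookname child="test">
    <text> Works </text>
    <text> Doesn't work </text>
  </bookname>
</book>"""


def remove_text_elements(root, unwanted):
    """Remove each <text> child of a <bookname> whose stripped text equals `unwanted`."""
    # Collect the parents up front so the tree is not modified mid-traversal.
    for bookname in list(root.iter("bookname")):
        # findall() returns a list, so removing children inside this loop
        # does not disturb the iteration.
        for child in bookname.findall("text"):
            if (child.text or "").strip() == unwanted:
                # Removal has to go through the parent element.
                bookname.remove(child)
    return root


root = remove_text_elements(ET.fromstring(SAMPLE), "Doesn't work")
remaining = [t.text.strip() for t in root.iter("text")]
```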

Error with parse function in lxml

Hi all! I have installed lxml 2.2.2 on Windows (I'm using Python 2.6.5). I tried this simple command: from lxml.html import parse p = parse('http://www.google.com').getroot() but I am getting the following error: Traceback (most recent call last): File "", line 1, in p=parse('http://www.google.com').getroot() File "C:...

How to select following sibling/xml tag using xpath

I have an HTML file (from Newegg) and their HTML is organized like below. All of the data in their specifications table is 'desc' while the titles of each section are in 'name.' Below are two examples of data from Newegg pages. <tr> <td class="name">Brand</td> <td class="desc">Intel</td> </tr> <tr> <td class="name">Series</...

Mocking urllib2.urlopen and lxml.etree.parse using pymox

I'm trying to test some Python code that uses urllib2 and lxml. I've seen several blog posts and Stack Overflow posts where people want to test exceptions being thrown with urllib2. I haven't seen examples testing successful calls. Am I going down the correct path? Does anyone have a suggestion for getting this to work? Here is what...

Python - Validation with multiple schemas using lxml

Hello, I'm working with a schema that was built by a third party and I'd like to validate it with lxml. The problem is that the schema is split over different xsd files, which reference one another. For example, a file called "extension.xsd" (which builds upon the "master" schema) has a line like: <redefine schemaLocation="master.x...

XPath and lxml syntax

Hi everyone. I have an XML file with the structure shown below: <x> <y/> <y/> . . </x> The number of <y> tags is arbitrary. I want to get the text of the <y> tags and for this I decided to use XPath. I have figured out the syntax, say for the first y: (Assume root as x) textFirst = root.xpath('y[1]/text()') This wor...
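
A minimal sketch of collecting the text of every <y> child in one pass rather than indexing them one at a time. It uses the standard library's xml.etree.ElementTree so that it runs without lxml installed; with lxml, root.xpath('y/text()') should return the same strings in a single call. The sample contents of the <y> elements are an assumption, since the excerpt shows them empty.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <y> elements are given text for illustration.
root = ET.fromstring("<x><y>first</y><y>second</y><y>third</y></x>")

# All direct <y> children, in document order.
texts = [y.text for y in root.findall("y")]

# Positional predicates are 1-based, as in XPath.
first = root.find("y[1]").text
```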

Convert XML to python objects using lxml

I'm trying to use the lxml library to parse an XML file. What I want is to use XML as the data source but still maintain the normal Django way of interacting with the resulting objects. From the docs, I can see that lxml.objectify is what I'm supposed to use, but I don't know how to proceed after: list = objectify.parse('myfile.xml') ...

Python 3.1.2 + Snow Leopard + lxml + XMLSchema

Hi folks, I'd like to use the lxml library to validate XML Schemas in Python 3.1.2. Since Mac OS X Snow Leopard comes with Python 2.6.1 installed, I first downloaded the Python 3.1.2 automated installer at http://www.python.org/ftp/python/3.1.2/python-3.1.2-macosx10.3-2010-03-24.dmg and installed it. Secondly, I downloaded lxml 2...