lxml

Installing python2.6 and assorted libraries on DreamHost

Hi everyone, I managed to install python2.6 on DreamHost following this guide. I also tried to easy_install "lxml" but it fails horribly. Has anyone ever accomplished this? TIA ...

How to remove an attribute of an etree Element?

I have an etree Element with some attributes. How can I delete an attribute of a particular etree Element? ...
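
A minimal sketch of one way to do this: an element's attributes are exposed through its dict-like `attrib` mapping, so an attribute can be deleted from there. The sample element and the attribute names `id` and `color` are made up for illustration, and the stdlib fallback import is only there so the snippet runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical element with two attributes.
elem = etree.fromstring('<item id="1" color="red"/>')

# Delete an attribute that is known to exist (raises KeyError otherwise).
del elem.attrib["color"]

# Remove an attribute that may be absent, without raising.
elem.attrib.pop("missing", None)

remaining = dict(elem.attrib)
```

`del` suits the case where a missing attribute is a bug; `pop` with a default suits cleanup code where the attribute may or may not be there.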

Python: examine XSD XML schema

Hello, I would like to examine an XSD schema in Python. Currently I'm using lxml, which does its job very well when it only has to validate a document against the schema. But I want to know what's inside the schema and access its elements the lxml way. The schema: <?xml version="1.0"?> <xsd:schema xmlns:xsd="http:/...

How to find a tag recursively in XML using lxml?

Using lxml, is it possible to search recursively for the tag "f1"? I tried the findall method, but it only works for immediate children. I think I should go for BeautifulSoup for this! ...
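
A sketch of recursive search without leaving lxml: `iter()` walks the element and all its descendants, and `findall()` searches the whole subtree when the path starts with `.//`. The sample document is invented; only the tag name `f1` comes from the question. The stdlib fallback import is only there so the snippet runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Hypothetical document with <f1> at several depths.
root = etree.fromstring(
    "<root><a><f1>x</f1></a><f1>y</f1><b><c><f1>z</f1></c></b></root>"
)

# Option 1: iterate over every <f1> in the tree, in document order.
via_iter = [el.text for el in root.iter("f1")]

# Option 2: findall with a './/' path also descends into the whole subtree.
via_findall = [el.text for el in root.findall(".//f1")]

# Plain findall("f1") only looks at direct children of root.
direct_only = [el.text for el in root.findall("f1")]
```

The last line shows why a bare tag name appears to "only work for immediate children": the path is relative to the element and one level deep unless it starts with `.//`.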

XML to Python data structure using lxml

How can I convert XML to a Python data structure using lxml? I have searched high and low but can't find anything. Input example <ApplicationPack> <name>Mozilla Firefox</name> <shortname>firefox</shortname> <description>Leading Open Source internet browser.</description> <version>3.6.3-1</version> <license name="Firefox EULA">...
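
One possible approach, as a rough sketch only: walk the tree recursively and build nested dicts. The helper name `element_to_dict` is hypothetical (it is not part of lxml), and the input below uses only the elements visible in the excerpt, since the rest of the example is cut off. This version ignores attributes and lets repeated sibling tags overwrite each other, so it would need extending for real data. The stdlib fallback import is only there so the snippet runs without lxml installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree


def element_to_dict(elem):
    """Hypothetical helper: leaf elements become their text,
    elements with children become {tag: converted child}."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}


xml = """
<ApplicationPack>
  <name>Mozilla Firefox</name>
  <shortname>firefox</shortname>
  <description>Leading Open Source internet browser.</description>
  <version>3.6.3-1</version>
</ApplicationPack>
"""

data = element_to_dict(etree.fromstring(xml))
```

If attribute access and typed values matter, `lxml.objectify` may be worth a look instead of a hand-rolled converter.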

How to use regular expression in lxml xpath?

I'm using a construction like this: doc = parse(url).getroot() links = doc.xpath("//a[text()='some text']") But I need to select all links whose text begins with "some text", so I'm wondering whether there is any way to use a regexp here. I didn't find anything in the lxml documentation ...
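
Two options that may fit, sketched under assumptions: for a plain prefix match, XPath 1.0 already has `starts-with()`; for real regular expressions, lxml's `xpath()` accepts the EXSLT regular-expressions namespace. The sample markup and the variable names are invented for illustration.

```python
from lxml import etree

# Hypothetical markup standing in for the parsed page.
doc = etree.fromstring(
    '<body>'
    '<a href="/1">some text here</a>'
    '<a href="/2">other text</a>'
    '<a href="/3">some text</a>'
    '</body>'
)

# Option 1: plain XPath 1.0 prefix match, no regexp needed.
prefix_links = doc.xpath("//a[starts-with(text(), 'some text')]")

# Option 2: EXSLT regular expressions, bound to the 're' prefix.
RE_NS = {"re": "http://exslt.org/regular-expressions"}
regex_links = doc.xpath("//a[re:test(text(), '^some text')]", namespaces=RE_NS)

hrefs_prefix = [a.get("href") for a in prefix_links]
hrefs_regex = [a.get("href") for a in regex_links]
```

For "begins with", `starts-with()` is the simpler choice; the EXSLT form is the one to reach for when the pattern is more than a fixed prefix.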

Setting timeouts to parse webpages using python lxml

I am using the Python lxml library to parse HTML pages: import lxml.html # this might run indefinitely page = lxml.html.parse('http://stackoverflow.com/') Is there any way to set a timeout for parsing? ...

Close a tag with no text in lxml

I am trying to output an XML file using Python and lxml. However, I notice that if a tag has no text, it does not close itself. An example of this would be: root = etree.Element('document') rootTree = etree.ElementTree(root) firstChild = etree.SubElement(root, 'test') The output of this is: <document> <test/> </document I ...

Should Python 2.6 on OS X deal with multiple easy-install.pth files in $PYTHONPATH?

I am running ipython from sage and also am using some packages that aren't in sage (lxml, argparse) which are installed in my home directory. I have therefore ended up with a $PYTHONPATH of $HOME/sage/local/lib/python:$HOME/lib/python Python is reading and processing the first easy-install.pth it finds ($HOME/sage/local/lib/python...

Regular expression works normally, but fails when placed in an XML schema

I have a simple doc.xml file which contains a single root element with a Timestamp attribute: <?xml version="1.0" encoding="utf-8"?> <root Timestamp="04-21-2010 16:00:19.000" /> I'd like to validate this document against my simple schema.xsd to make sure that the Timestamp is in the correct format: <?xml version="1.0" encoding="utf...

How do I require that an element has either one set of attributes or another in an XSD schema?

I'm working with an XML document where a tag must either have one set of attributes or another. For example, it needs to either look like <tag foo="hello" bar="kitty" /> or <tag spam="goodbye" eggs="world" /> e.g. <root> <tag foo="hello" bar="kitty" /> <tag spam="goodbye" eggs="world" /> </root> So I have an XSD schema where ...

How to get HTML elements with Python lxml

Hello! I have this html code: <table> <tr> <td class="test"><b><a href="">aaa</a></b></td> <td class="test">bbb</td> <td class="test">ccc</td> <td class="test"><small>ddd</small></td> </tr> <tr> <td class="test"><b><a href="">eee</a></b></td> <td class="test">fff</td> <td class="test">ggg</td> <td class="test"><small...

Creating a document tree before or after adding the subelements.

Hello everyone. I am using lxml and Python for writing XML files. I was wondering what is the accepted practice: creating a document tree first and then adding the sub elements OR adding the sub elements and creating the tree later? I know this hardly makes any difference as to the output, but I was interested in knowing what is the acc...

Write xml file using lxml library in Python

I'm using lxml to create an XML file from scratch, with code like this: from lxml import etree root = etree.Element("root") root.set("interesting", "somewhat") child1 = etree.SubElement(root, "test") How do I write the root Element object to an XML file using the write() method of the ElementTree class? ...
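
A minimal sketch of what this could look like: wrap the root element in an `ElementTree` and call its `write()` method. The output file name and the temporary directory are assumptions made for the example, and the stdlib fallback import is only there so the snippet runs without lxml installed.

```python
import os
import tempfile

try:
    from lxml import etree
except ImportError:  # fallback so the sketch runs without lxml
    import xml.etree.ElementTree as etree

# Build the tree from the question.
root = etree.Element("root")
root.set("interesting", "somewhat")
child1 = etree.SubElement(root, "test")

# Wrap the root element so the document can be serialized to a file.
tree = etree.ElementTree(root)

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "output.xml")
tree.write(path, xml_declaration=True, encoding="utf-8")

# Read the file back to confirm it round-trips.
reloaded = etree.parse(path).getroot()
```

With lxml, `write()` also accepts `pretty_print=True` if indented output is wanted.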

lxml unicode entity parse problems

I'm using lxml as follows to parse an exported XML file from another system: xmldoc = open(filename) etree.parse(xmldoc) But I'm getting: lxml.etree.XMLSyntaxError: Entity 'eacute' not defined, line 4495, column 46 Obviously it's having problems with Unicode entity names - but how would I get around this? Via open() or parse...

Help with parsing lxml

Hi. To implement a college project, I need to handle XML files. For this I chose lxml after doing some research. However, I can't seem to find a good tutorial to help me get started. In particular, I can't decide which type of parsing I need to use. My XML files don't have that much data, but speed is the main concern, not memory. Can...

Multiple XML Namespaces in tag with LXML

I am trying to use Python's lxml library to create a GPX file that can be read by Garmin's MapSource product. The header on their GPX files looks like this <?xml version="1.0" encoding="UTF-8" standalone="no" ?> <gpx xmlns="http://www.topografix.com/GPX/1/1" creator="MapSource 6.15.5" version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLS...

Confused as to use a class or a function: Writing XML files using lxml and Python

Hi. I need to write XML files using lxml and Python. However, I can't figure out whether to use a class or a function to do this. The point being, this is the first time I am developing proper software, and deciding where and why to use a class still seems mysterious. I will illustrate my point. For example, consider the following f...

Getting rid of the encoding in lxml

Hi everyone. I am trying to print an XML file using lxml and Python. Here is the code: >>> from lxml import etree >>> root = etree.Element('root') >>> child = etree.SubElement(root, 'child') >>> print etree.tostring(root, pretty_print = True, xml_declaration = True, encoding = None) Output: <?xml version='1.0' encoding='ASCII'?> <r...

Which Python XML library should I use?

Hello. I am going to handle XML files for a project. I had earlier decided to use lxml but after reading the requirements, I think ElementTree would be better for my purpose. The XML files that have to be processed are: Small in size. Typically < 10 KB. No namespaces. Simple XML structure. Given the small XML size, memory is not...