Still learning lxml. I discovered that sometimes I cannot get to the text of an item from a tree using item.text, but if I use item.text_content() I am good to go. I am not sure I see why yet. Any hints would be appreciated.
Okay, I am not sure exactly how to provide an example without making you handle a file.
Here is some code I wrote ...
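Here is a small sketch of the difference, using a made-up fragment. In lxml, `.text` holds only the text that comes before an element's first child, so it is `None` when the element opens with a child tag. `text_content()` (available on `lxml.html` elements) gathers all text in the subtree, including children and their tails. On plain `lxml.etree` elements, joining `itertext()` gives the same result.

```python
import lxml.html

# Hypothetical fragment: the <p> opens with a child element, so it has no leading text.
p = lxml.html.fragment_fromstring("<p><b>Hello</b> world</p>")

print(repr(p.text))            # None: .text is only the text before the first child
print(p.text_content())        # Hello world: all text in the subtree, tails included
print("".join(p.itertext()))   # Hello world: same idea, also works on plain etree elements
```

Here " world" is stored as the `.tail` of `<b>`, not as text of `<p>`, which is why `.text` alone misses it.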
I'm parsing a non-compliant XML file (Sphinx's[1] xmlpipe2 format) and would like the lxml parser to ignore the fact that there are unresolved namespace prefixes.
An example of the Sphinx XML:
<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/...
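One option is a sketch like the following, assuming the input looks like the fragment above (closed off here so it parses). `etree.XMLParser(recover=True)` tells libxml2 to keep going past errors such as the undeclared `sphinx` prefix; the exact tag string you get back for the prefixed elements is up to libxml2's recovery, so the code only relies on attributes. A stricter alternative is to wrap the input in an element that declares the prefix; the namespace URI used for that is a hypothetical placeholder.

```python
from lxml import etree

# Sample in the style of the question; the "sphinx" prefix is never declared.
data = b"""<sphinx:schema>
<sphinx:field name="subject"/>
<sphinx:field name="content"/>
<sphinx:attr name="published" type="timestamp"/>
</sphinx:schema>"""

# Option 1: let libxml2 recover from the undefined-prefix errors.
parser = etree.XMLParser(recover=True)
sphinx_schema = etree.fromstring(data, parser)
for child in sphinx_schema:
    print(child.tag, child.get("name"))

# Option 2: declare the prefix on a wrapper element so the input becomes namespace-well-formed.
# "urn:hypothetical:sphinx" is an arbitrary placeholder URI, not anything Sphinx defines.
wrapped = b'<wrapper xmlns:sphinx="urn:hypothetical:sphinx">' + data + b"</wrapper>"
strict_schema = etree.fromstring(wrapped)[0]
print(strict_schema.tag)  # {urn:hypothetical:sphinx}schema
```

Recovery silences real errors too, so the wrapper approach may be safer if the rest of the file should still be checked strictly.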
Is there a way to match multiple tag names with lxml.etree.iterparse? I have a file-like object with an expensive read operation and many tags, so getting all tags or doing two passes is suboptimal.
Edit: It would be something like Beautiful Soup's find(['tag-1', 'tag-2']), except as an argument to iterparse. Imagine parsing an HTML...
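A minimal sketch that stays within a single pass: ask `iterparse` for every `end` event and filter on a set of wanted tags yourself. The helper name `iter_matching` and the sample document are hypothetical.

```python
import io
from lxml import etree

def iter_matching(source, wanted):
    """Single pass over `source`, yielding only elements whose tag is in `wanted`."""
    wanted = set(wanted)
    for _event, elem in etree.iterparse(source, events=("end",)):
        if elem.tag in wanted:
            yield elem

data = io.BytesIO(b"<root><tag-1>a</tag-1><other>x</other><tag-2>b</tag-2></root>")
found = [(el.tag, el.text) for el in iter_matching(data, ["tag-1", "tag-2"])]
print(found)  # [('tag-1', 'a'), ('tag-2', 'b')]
```

For large inputs you would usually call `elem.clear()` once you are done with an element to keep memory flat. I believe newer lxml releases also accept a sequence of names for iterparse's own `tag` argument, but that is worth checking against the documentation for your installed version.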
Hello
I'm using lxml to parse a big table and now have a problem:
>>> winvps[0].getnext().xpath("descendant::*")
118: [<Element td at 3a30180>,
<Element a at 3a301b0>,
<Element font at 3a301e0>,
<Element b at 3a30210>,
<Element td at 3a30240>,
<Element td at 3a30270>,
<Element font at 3a302a0>,
<Element td at 3a302d0>,
<Element td ...
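The listing above shows why the result mixes `td`, `a`, `font`, and `b`: `descendant::*` matches every element below the row at any depth. If only the row's own cells are wanted (an assumption, since the question is cut off), the child axis narrows it. The table below is a made-up stand-in for one row.

```python
from lxml import etree

# Hypothetical stand-in for one table row from the question.
table = etree.fromstring(
    "<table><tr><td><a><font><b>x</b></font></a></td><td>y</td></tr></table>"
)
row = table[0]

# descendant::* walks every element below the row, at any depth, in document order
print([e.tag for e in row.xpath("descendant::*")])  # ['td', 'a', 'font', 'b', 'td']

# "td" is short for child::td, so it stops at direct children
print([e.tag for e in row.xpath("td")])  # ['td', 'td']
```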
I have been having fun manipulating HTML with lxml. Now I want to do some manipulation of the actual file: after finding a particular element that meets my needs, I want to know if it is possible to retrieve the source of that element.
I jumped up and down in my chair after seeing sourceline as an attribute of my element, but that did not giv...
I hope I asked that correctly. I am trying to figure out what element.sourceline does and if there is some way I can use its features. I have tried building my elements from the HTML a number of ways, but every time I iterate through my elements and ask for sourceline I always get None. When I tried to use the built-in help I don't ge...
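A sketch of how `sourceline` behaves: it is an attribute (not a method) that the parser fills in with the 1-based line of the element's start tag. One common reason for getting `None` is that the element never went through a parser, for example when it was created with `Element` or `SubElement`; whether that applies to your case is an assumption.

```python
from lxml import etree

# Parsed elements carry the line number of their start tag
root = etree.fromstring("<root>\n  <item>a</item>\n  <item>b</item>\n</root>")
print([item.sourceline for item in root.iter("item")])  # [2, 3]

# An element built in code was never parsed, so it has no line number
built = etree.Element("item")
print(built.sourceline)  # None
```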
Codespeak.net is down and something, somewhere in my buildout wants to easy_install lxml from it, despite my bootstrapping with pip, having it installed already, and removing it from my buildout files.
How else can I get round this?
...
I'm using jQuery to load arbitrary XML strings (fragments of a larger document) into the browser DOM and manipulate them, then using XMLSerializer to turn them back into strings and send them back to the server, where they are processed (by Python and lxml) and re-integrated into a full XML document.
The XML starts and ends in a git repos...
I need help parsing out some text from a page with lxml. I tried BeautifulSoup, but the HTML of the page I am parsing is so broken that it wouldn't work. So I have moved on to lxml, but the docs are a little confusing and I was hoping someone here could help me.
Here is the page I am trying to parse: http://bit.ly/bf1T12. I need to get ...
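Since the linked page isn't reproduced here, this is only a sketch on deliberately sloppy, made-up markup: `lxml.html.fromstring` uses libxml2's forgiving HTML parser, which repairs things like unclosed `<p>` tags, and XPath then works on the repaired tree. The `id='main'` selector is hypothetical.

```python
import lxml.html

# Deliberately sloppy markup: unclosed <p> tags and no closing </body></html>.
broken = "<html><body><div id='main'><p>First<p>Second</div>"
doc = lxml.html.fromstring(broken)

# The HTML parser closes each <p> when the next one starts,
# so XPath still sees two sibling paragraphs.
texts = [p.text_content() for p in doc.xpath("//div[@id='main']/p")]
print(texts)  # ['First', 'Second']
```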
I have a few different XML documents that I'm trying to combine into one using lxml. The problem is that I need the result to preserve the namespaces on each of the sub-documents' root nodes. lxml seems to want to push any namespace declarations used more than once to the root of the new document, which breaks in my application (it is ...
With lxml.html, how do I access single elements without using a for loop?
This is the HTML:
<tr class="headlineRow">
<td>
<span class="headline">This is some awesome text</span>
</td>
</tr>
For example, this will fail with IndexError:
for row in doc.cssselect('tr.headlineRow'):
headline = row.cssselect('td span.headlin...
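A sketch that avoids both the loop and the `IndexError`: query for the span directly and take the first match only if there is one. XPath is used here so the example doesn't depend on the separate `cssselect` package; the `first_or_none` helper is hypothetical, and `element.find(...)` with an ElementPath expression is another way to get "first match or None".

```python
import lxml.html

doc = lxml.html.fromstring(
    '<table><tr class="headlineRow"><td>'
    '<span class="headline">This is some awesome text</span>'
    "</td></tr></table>"
)

def first_or_none(results):
    """Return the first match, or None when the query matched nothing (no IndexError)."""
    return results[0] if results else None

span = first_or_none(doc.xpath('//tr[@class="headlineRow"]//span[@class="headline"]'))
print(span.text if span is not None else "no headline")  # This is some awesome text

missing = first_or_none(doc.xpath('//span[@class="byline"]'))
print(missing)  # None
```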
I'm trying to parse a secondary page with a form. I used the example code from this link:
http://blog.ianbicking.org/2007/09/24/lxmlhtml/
In my test I use this URL: http://www.infofer.ro/
As in the example, I use these values:
>>> pprint(form.form_values())
[('cboData', '8/30/2010'),
('txtPlecare', 'Bucuresti Nord'),
('txtSosire', 'Const...
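A sketch of filling in a form with `lxml.html` offline, using a made-up form that only borrows the field names from the output above; the real page's markup may well differ. Values are set through `form.fields`, and `form_values()` then reports what would be submitted (submit buttons are left out).

```python
import lxml.html

# Hypothetical form mirroring two field names from the question.
page = lxml.html.fromstring("""<html><body>
<form action="/search" method="post">
  <input type="text" name="cboData" value="">
  <input type="text" name="txtPlecare" value="">
  <input type="submit" name="go" value="Search">
</form>
</body></html>""")

form = page.forms[0]
form.fields["cboData"] = "8/30/2010"
form.fields["txtPlecare"] = "Bucuresti Nord"

print(form.form_values())  # [('cboData', '8/30/2010'), ('txtPlecare', 'Bucuresti Nord')]
```

`lxml.html.submit_form(form)` would actually send it over the network, which is not done here.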
Hi,
Has anyone found a class for lxml in PHP? I have no idea about Python.
If anyone finds a class, library, or tutorial, please share it with me.
Thanks,
Nithish
...
The code
from lxml import etree
produces the error
ImportError: No module named lxml
Running
sudo easy_install lxml
results in
lxml 2.2.7 is already the active version in easy-install.pth
Removing lxml-2.2.7-py2.5-macosx-10.3-i386.egg from site-packages and rerunning sudo easy_install lxml results in
Adding lxml 2.2.7 to ea...
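One thing worth checking, as an assumption since the question is cut off: easy_install may have installed lxml for a different Python than the one running the script. This Python 3 sketch prints which interpreter is running and where, if anywhere, it would import lxml from; `lxml_origin` is a hypothetical helper name.

```python
import importlib.util
import sys

def lxml_origin():
    """Return the path lxml would be imported from by *this* interpreter, or None."""
    spec = importlib.util.find_spec("lxml")
    return spec.origin if spec is not None else None

print("interpreter:", sys.executable)
print("lxml found at:", lxml_origin())
```

If the interpreter path doesn't match the site-packages easy_install wrote to, that mismatch would explain the ImportError.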
Hi, I have an XML file which I want to parse; it looks something like this:
<?xml version="1.0" encoding="utf-8"?>
<SHOP xmlns="http://www.w3.org/1999/xhtml" xmlns:php="http://php.net/xsl">
<SHOPITEM>
<ID>2332</ID>
...
</SHOPITEM>
<SHOPITEM>
<ID>4433</ID>
...
</SHOPITEM>
</SHOP>
my parsin...
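The question is cut off, but one thing in the sample commonly trips up lookups: the root declares a default namespace (`xmlns="http://www.w3.org/1999/xhtml"`), so `SHOPITEM` and `ID` are namespaced and plain `find("SHOPITEM")` won't match. A sketch on a trimmed copy of the sample, binding that namespace to an arbitrary prefix (`x` here) for XPath, and spelling it out in `{uri}tag` form for ElementPath:

```python
from lxml import etree

xml = b"""<?xml version="1.0" encoding="utf-8"?>
<SHOP xmlns="http://www.w3.org/1999/xhtml" xmlns:php="http://php.net/xsl">
  <SHOPITEM><ID>2332</ID></SHOPITEM>
  <SHOPITEM><ID>4433</ID></SHOPITEM>
</SHOP>"""
root = etree.fromstring(xml)  # bytes, because the input carries an encoding declaration

# The prefix name is our choice; only the URI has to match the document.
ns = {"x": "http://www.w3.org/1999/xhtml"}
ids = root.xpath("//x:SHOPITEM/x:ID/text()", namespaces=ns)
print(ids)  # ['2332', '4433']

# Same lookup with ElementPath and the {uri}tag spelling
XHTML = "{http://www.w3.org/1999/xhtml}"
ids_again = [item.findtext(XHTML + "ID") for item in root.iter(XHTML + "SHOPITEM")]
```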
I am working with a large set of HTML documents. One of my tasks is to extract all text from the documents. I have gotten pretty far, but now I am stumped because of the use of tables as containers/formatting structures for information that is not numeric in nature.
My goal is to ignore (leave behind, not extract) the 'table' if it i...
I am trying to remove comments from a list of elements that were obtained by using lxml.
The best I have been able to do is:
no_comments = [element for element in element_list if 'HtmlComment' not in str(type(element))]
I am wondering if there is a more direct way.
I am going to add something based on Matthew's answer - he got me almost t...
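A sketch of a more direct test: comment nodes in lxml report the factory function `etree.Comment` as their `.tag`, which holds for plain `etree` comments and for `lxml.html`'s `HtmlComment` alike, so no string matching on the type name is needed. The sample fragment is made up.

```python
import lxml.html
from lxml import etree

div = lxml.html.fragment_fromstring("<div><p>a</p><!-- note --><p>b</p></div>")
element_list = list(div)  # stand-in for the question's list: <p>, <!-- note -->, <p>

# Comment nodes have etree.Comment (the factory function itself) as their tag
no_comments = [element for element in element_list if element.tag is not etree.Comment]
print([element.tag for element in no_comments])  # ['p', 'p']
```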
I am new to Python/lxml. After reading the lxml site and Dive Into Python, I could not find the solution to my n00b troubles. I have the XML sample below:
---------------
<addressbook>
<person>
<name>Eric Idle</name>
<phone type='fix'>999-999-999</phone>
<phone type='mobile'>555-555-555</phone>
<address...
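The question is cut off before the actual problem, so this sketch only assumes a typical first goal: reading each person's name and phone numbers keyed by their `type` attribute. The sample is trimmed (the `<address>` part is missing in the question), and `phones_by_type` is a hypothetical helper.

```python
from lxml import etree

# Trimmed copy of the sample; the <address> element is cut off in the question.
book = etree.fromstring("""<addressbook>
  <person>
    <name>Eric Idle</name>
    <phone type='fix'>999-999-999</phone>
    <phone type='mobile'>555-555-555</phone>
  </person>
</addressbook>""")

def phones_by_type(person):
    """Map each <phone>'s type attribute to its number."""
    return {ph.get("type"): ph.text for ph in person.findall("phone")}

for person in book.iter("person"):
    print(person.findtext("name"), phones_by_type(person))
```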
Suppose I have this sort of HTML from which I need to select "text2" using lxml / ElementTree:
<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>
If I already have the div element as mydiv, then mydiv.text returns just "text1".
Using itertext() seems problematic or cumbersome at best since it walks the entire tr...
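In lxml's (and ElementTree's) model, text that follows an element's closing tag belongs to that element's `.tail`. So "text2" is the tail of the first `<span>`, and no walk over the whole subtree is needed. An XPath `text()` step on the div returns only the div's own text nodes.

```python
from lxml import etree

mydiv = etree.fromstring(
    "<div>text1<span>childtext1</span>text2<span>childtext2</span>text3</div>"
)

# Text after </span> is stored on that span as .tail
print(mydiv[0].tail)  # text2

# text() gives only the div's own text nodes: its .text plus each child's tail
print(mydiv.xpath("text()"))  # ['text1', 'text2', 'text3']
```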
I'm trying to create an XML Schema using lxml. To begin with, something like this:
<xs:schema xmlns="http://www.goo.com" xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" targetNamespace="http://www.goo.com">
<xs:element type="xs:string" name="name"/>
<xs:element type="xs:positiveInteger" name="age...
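A sketch of building the start of that schema with `lxml.etree`: declare both namespaces through `nsmap` on the root (`None` for the default namespace, `"xs"` for XML Schema), and create the schema elements with `etree.QName` so they land in the XML Schema namespace and serialize with the `xs:` prefix. The two `xs:element` declarations mirror the snippet above; anything past "age" is cut off in the question and not guessed here.

```python
from lxml import etree

XS = "http://www.w3.org/2001/XMLSchema"
TARGET = "http://www.goo.com"

# nsmap declares both namespaces on the root: None = default namespace, "xs" = XML Schema
schema = etree.Element(
    etree.QName(XS, "schema"),
    nsmap={None: TARGET, "xs": XS},
    elementFormDefault="qualified",
    targetNamespace=TARGET,
)

# Declarations live in the XML Schema namespace, so they serialize as xs:element
etree.SubElement(schema, etree.QName(XS, "element"), type="xs:string", name="name")
etree.SubElement(schema, etree.QName(XS, "element"), type="xs:positiveInteger", name="age")

print(etree.tostring(schema, pretty_print=True).decode())
```

Passing the result to `etree.XMLSchema(schema)` is a quick way to check that it compiles as a schema.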