My XML file looks like this:
<strings>
<string>Bla <b>One &amp; Two</b> Foo</string>
</strings>
I want to extract the content of each <string> while maintaining the inner tags. That is, I would like to get the following Python string: u"Bla <b>One & Two</b> Foo". Alternatively, I guess I could settle for u"Bla <b>One &amp; Two</b> ...
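A minimal sketch of one way to do this with lxml, assuming the snippet above as input: serialize the element's text followed by each child (lxml includes each child's tail text by default).

from lxml import etree

doc = etree.fromstring(
    '<strings><string>Bla <b>One &amp; Two</b> Foo</string></strings>')

for s in doc.iter('string'):
    # The element's leading text, plus each child serialized with its tail.
    inner = (s.text or '') + ''.join(
        etree.tostring(child, encoding='unicode') for child in s)
    print(inner)  # Bla <b>One &amp; Two</b> Foo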
I am using lxml to read through an XML file and change a few details. However, even if I just use lxml to read the file and then write it out again, as below:
from lxml import etree

fil = 'iTunes Music Library.XML'
tre = etree.parse(fil)
tre.write('temp.xml')
I find Queensrÿche converted to Queensr&#255;che. Anyone know how to fix th...
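Not the asker's own fix, but the usual cause is that write() defaults to plain ASCII serialization, so non-ASCII characters come out as numeric character references; a sketch of writing with an explicit encoding instead:

from lxml import etree

tre = etree.parse('iTunes Music Library.XML')
# With an explicit output encoding the characters stay as UTF-8 text
# instead of being escaped as numeric character references.
tre.write('temp.xml', encoding='utf-8', xml_declaration=True)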
Can I use Python lxml on Google App Engine? (Or do I have to use Beautiful Soup?)
I have started using Beautiful Soup but it seems slow. I am just starting to play with the idea of "screen scraping" data from other websites to create some sort of "mash-up".
...
I'm receiving data packets in XML format, each with a specific documentRoot tag, and I'd like to delegate specialized methods to take care of those packets, based on the root tag name. This worked with xml.dom.minidom, something like this:
from xml.dom import minidom

dom = minidom.parseString(the_data)
root = dom.documentElement
deleg = getattr(self,'elem_' + str(...
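The same dispatch pattern carries over to lxml; a minimal self-contained sketch (the handler names and the sample packet here are hypothetical):

from lxml import etree

class PacketDispatcher:
    def handle(self, the_data):
        root = etree.fromstring(the_data)
        # Dispatch on the root tag name, e.g. <login> -> self.elem_login(root)
        handler = getattr(self, 'elem_' + root.tag, self.elem_unknown)
        return handler(root)

    def elem_login(self, root):
        return 'login packet'

    def elem_unknown(self, root):
        return 'unhandled: ' + root.tag

print(PacketDispatcher().handle('<login><userid>admin</userid></login>'))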
I'm cleaning up some gross XML, and so I've had pretty_print = True set in the call to etree.tostring() on my lxml output of the XSL transform. However, that left me with a few junk whitespace nodes from the original input, so I added
<xsl:strip-space elements="*"/>
...but that completely collapses all whitespace, ignoring pretty prin...
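One detail worth knowing: pretty_print only re-indents elements that contain no other text, so an alternative to strip-space is to drop the whitespace-only text nodes at parse time and let pretty_print do the layout. A sketch, with hypothetical file names:

from lxml import etree

# remove_blank_text=True drops whitespace-only text nodes while parsing,
# which is what lets pretty_print re-indent the output.
parser = etree.XMLParser(remove_blank_text=True)
doc = etree.parse('input.xml', parser)
transform = etree.XSLT(etree.parse('transform.xsl'))
result = transform(doc)

print(etree.tostring(result, pretty_print=True, encoding='unicode'))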
From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml and I've heard that lxml is f...
I am using the Python ElementTree module to manipulate HTML.
I want to emphasize certain words, and my current solution is:
for e in tree.getiterator():
    for attr in 'text', 'tail':
        words = (getattr(e, attr) or '').split()
        change = False
        for i, word in enumerate(words):
            word = clean_word.sub('', wo...
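The snippet above is cut off; as a rough sketch of the wrapping step itself (handling only an element's text, not tails, with a hypothetical word set), emphasizing a word means splitting the text and inserting a new child:

import xml.etree.ElementTree as ET

EMPHASIZE = {'certain'}  # hypothetical set of words to emphasize

def emphasize_in_text(elem):
    # Wrap the first emphasized word found in elem.text in a new <em> child.
    words = (elem.text or '').split()
    for i, word in enumerate(words):
        if word in EMPHASIZE:
            em = ET.Element('em')
            em.text = word
            em.tail = ' ' + ' '.join(words[i + 1:])   # text after the word
            elem.text = ' '.join(words[:i]) + ' '     # text before the word
            elem.insert(0, em)
            break

root = ET.fromstring('<p>emphasize certain words here</p>')
emphasize_in_text(root)
print(ET.tostring(root, encoding='unicode'))
# <p>emphasize <em>certain</em> words here</p>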
Is there an equivalent of Beautiful Soup's tag.renderContents() method in lxml?
I've tried using element.text, but that doesn't render child tags, and ''.join(etree.tostring(child) for child in element), but that drops the element's own leading text. The closest I've been able to find is etree.tostring(element), but that renders the opening...
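A sketch of a helper that combines the two attempts above (not an official lxml API, just the usual workaround):

from lxml import etree

def render_contents(element):
    # The element's own text plus each child serialized with its tail,
    # but without the element's opening and closing tags.
    return (element.text or '') + ''.join(
        etree.tostring(child, encoding='unicode') for child in element)

el = etree.fromstring('<div>Hello <b>bold</b> world</div>')
print(render_contents(el))  # Hello <b>bold</b> world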
Please help me to resolve my problem with lxml
(I'm a newbie to lxml).
How can I get "Comment 1" from the following file:
<?xml version="1.0" encoding="windows-1251" standalone="yes" ?>
<!--Comment 1-->
<a>
    <!--Comment 2-->
</a>
...
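Not from the original thread, but one approach: in lxml, comments that appear before the root element are its preceding siblings, so you can walk back from the root (the file name here is hypothetical):

from lxml import etree

tree = etree.parse('commented.xml')
root = tree.getroot()

# Walk the siblings that precede the root element; top-level comments
# such as <!--Comment 1--> live there.
node = root.getprevious()
while node is not None:
    if node.tag is etree.Comment:
        print(node.text)  # -> Comment 1
    node = node.getprevious()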
I've got this xpath query:
/html/body//tbody/tr[*]/td[*]/a[@title]/@href
It extracts all the links with a title attribute and gives their href values in Firefox's XPath Checker add-on.
However, I cannot seem to use it with lxml.
from lxml import etree
parsedPage = etree.HTML(page) # Create parse tree from valid page.
hyperlinks = parsedP...
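The query itself does work in lxml (attribute steps come back as plain strings); a self-contained sketch with a hypothetical page is below. One common difference from Firefox: browsers insert <tbody> automatically, while lxml's HTML parser only keeps a <tbody> that is actually present in the source, so the //tbody step can silently match nothing.

from lxml import etree

page = '''<html><body><table><tbody><tr><td>
          <a title="t" href="http://example.com/">x</a>
          </td></tr></tbody></table></body></html>'''

parsedPage = etree.HTML(page)
# An attribute step in xpath() returns the attribute values as strings.
hyperlinks = parsedPage.xpath('/html/body//tbody/tr[*]/td[*]/a[@title]/@href')
print(hyperlinks)  # ['http://example.com/']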
I have been using lxml to create the XML of an RSS feed, but I am having trouble with the tags and can't really figure out how to add a dynamic number of elements. Given that lxml seems to just have functions as parameters of functions, I can't seem to figure out how to loop for a dynamic number of items without remaking the entire pa...
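The nested-function style sounds like lxml.builder's E factory; a sketch of feeding it a dynamic number of <item> elements by building them in an ordinary loop and unpacking the list (the feed fields are hypothetical):

from lxml import etree
from lxml.builder import E

items = [('First post', 'http://example.com/1'),
         ('Second post', 'http://example.com/2')]

# Build the <item> elements in a loop, then unpack the list with *
# so the channel accepts however many items there happen to be.
rss = E.rss(
    E.channel(
        E.title('My feed'),
        E.link('http://example.com/'),
        *[E.item(E.title(t), E.link(u)) for t, u in items]
    ),
    version='2.0',
)

print(etree.tostring(rss, pretty_print=True, encoding='unicode'))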
<example>
    <login>
        <id>1</id>
        <username>kites</username>
        <password>kites</password>
    </login>
</example>
How can I update the password using lxml?
And can I add one more record to the same file?
Please provide me some sample code.
...
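Not the original answer, just a sketch of both steps, assuming the snippet above is stored in a file called example.xml:

from lxml import etree

tree = etree.parse('example.xml')
root = tree.getroot()

# Update the existing password.
root.find('login/password').text = 'newpassword'

# Append one more <login> record.
new = etree.SubElement(root, 'login')
etree.SubElement(new, 'id').text = '2'
etree.SubElement(new, 'username').text = 'otheruser'
etree.SubElement(new, 'password').text = 'otherpass'

tree.write('example.xml', pretty_print=True, xml_declaration=True,
           encoding='utf-8')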
Hi, I'm using an XML file to store configuration for some software.
One of these configuration values would be a system path like
> set_value = "c:\\test\\3 tests\\test"
I can store it by using:
> setting = etree.SubElement(settings, "setting",
>     name=tmp_set_name, type=set_type, value=set_value)
If I use
doc.write(output_file, m...
Is it possible to get all the context nodes used to evaluate an XPath result?
In the code below:
test_xml = """
<r>
    <a/>
    <a>
        <b/>
    </a>
    <a>
        <b/>
    </a>
</r>
"""
test_root = lxml.etree.fromstring(test_xml)
res = test_root.xpath("//following-sibling::*[1]/b")
for node in res:
print test_root.getroottree().getpath(no...
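xpath() only returns the final node set, so lxml does not report the context nodes directly. One workaround (a sketch, splitting the expression by hand) is to evaluate the context part first and then run the last step relative to each candidate:

import lxml.etree

test_root = lxml.etree.fromstring(test_xml)
tree = test_root.getroottree()

# First collect the candidate context nodes, then evaluate the final step
# relative to each one, so we know which context produced which result.
for ctx in test_root.xpath("//following-sibling::*[1]"):
    for node in ctx.xpath("b"):
        print(tree.getpath(ctx) + ' -> ' + tree.getpath(node))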
Using lxml.objectify like so:
from lxml import objectify
o = objectify.fromstring("<a><b atr='someatr'>oldtext</b></a>")
o.b = 'newtext'
results in <a><b>newtext</b></a>, losing the node attribute. It seems to be directly replacing the element with a newly created one, rather than simply replacing the text of the element.
If I try ...
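One workaround (a sketch, not an official objectify feature) is to copy the attributes off the old element and put them back after the assignment replaces it:

from lxml import objectify, etree

o = objectify.fromstring("<a><b atr='someatr'>oldtext</b></a>")

# Remember the attributes, let objectify replace the element, then restore
# them on the newly created element.
old_attrib = dict(o.b.attrib)
o.b = 'newtext'
o.b.attrib.update(old_attrib)

# Strip objectify's own type annotations before serializing.
objectify.deannotate(o, cleanup_namespaces=True)
print(etree.tostring(o, encoding='unicode'))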
The lxml package for Python seems to be absolutely broken on my system. I am not sure what the problem is, as all of the files seem to be in place. My suspicion is that the problem is in __init__.py, but I don't have enough experience with the system to make an accurate diagnosis or fix the problem.
Here is some code that I think will help dia...
Generally I use lxml for my HTML parsing needs, but that isn't available on Google App Engine. The obvious alternative is BeautifulSoup, but I find it chokes too easily on malformed HTML. Currently I am testing libxml2dom and have been getting better results.
Which pure Python HTML parser have you found performs best? My priority is th...
Hello friends,
<login>
    <user>
        <userid>admin</userid>
    </user>
    .
    .
    .
    .
    <user>
        <userid>admin</userid>
    </user>
</login>
This is my XML file.
When I use the clear() or del method, it clears all the children but leaves an empty node behind:
<user/>
How can I avoid creating this blank node?
It will cause problems when I use...
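Not from the original thread, but the usual fix: clear() empties an element yet leaves it in the tree, whereas removing it from its parent deletes the node itself. A sketch:

from lxml import etree

root = etree.fromstring(
    '<login><user><userid>admin</userid></user>'
    '<user><userid>admin</userid></user></login>')

user = root.find('user')
# Remove the whole element instead of clearing it, so no empty <user/> stays.
user.getparent().remove(user)

print(etree.tostring(root, encoding='unicode'))
# <login><user><userid>admin</userid></user></login>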
I have a snippet of HTML that contains paragraphs. (I mean p tags.) I want to split the string into the different paragraphs. For instance:
'''
<p class="my_class">Hello!</p>
<p>What's up?</p>
<p style="whatever: whatever;">Goodbye!</p>
'''
Should become:
['<p class="my_class">Hello!</p>',
'<p>What's up?</p>'
'<p style="whatever: w...
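A sketch of one way to do this with lxml.html: fragments_fromstring() parses the snippet without wrapping it in a full document, so each top-level <p> comes back as its own element that can be serialized separately.

import lxml.html

snippet = '''
<p class="my_class">Hello!</p>
<p>What's up?</p>
<p style="whatever: whatever;">Goodbye!</p>
'''

paragraphs = [
    # with_tail=False keeps the newline between paragraphs out of the output.
    lxml.html.tostring(p, encoding='unicode', with_tail=False)
    for p in lxml.html.fragments_fromstring(snippet.strip())
]
print(paragraphs)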
Hi,
I'm trying to parse a huge XML file with lxml in a memory-efficient manner (i.e. streaming lazily from disk instead of loading the whole file into memory). Unfortunately, the file contains some bad ASCII characters that break the default parser. The parser works if I set recover=True, but the iterparse method doesn't take the recover ...
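One workaround (a sketch, not the asker's solution): iterparse() accepts any file-like object, so the bad bytes can be stripped on the fly by a small wrapper before the parser ever sees them. The file name and tag below are hypothetical.

import re
from lxml import etree

class CleanReader:
    # File-like wrapper that strips XML-illegal control bytes as it reads.
    _bad = re.compile(rb'[\x00-\x08\x0b\x0c\x0e-\x1f]')

    def __init__(self, fileobj):
        self._f = fileobj

    def read(self, size=-1):
        return self._bad.sub(b'', self._f.read(size))

with open('huge.xml', 'rb') as f:
    for event, elem in etree.iterparse(CleanReader(f), tag='record'):
        # ... process elem here ...
        elem.clear()  # free the element once it has been handled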