I'm trying to create an XML entry that looks like this using Python and lxml:
<resource href="Unit 4.html" adlcp:scormtype="sco">
I'm having trouble with the adlcp:scormtype attribute. I'm new to XML, so please correct me if I'm wrong: adlcp is a namespace and scormtype is an attribute that is defined in t...
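A minimal sketch of how a prefixed attribute like this can be written with lxml. The namespace URI is an assumption: it is the one usually associated with the adlcp prefix in SCORM 1.2 manifests, but it does not appear in the question, so substitute whatever your manifest declares.

```python
from lxml import etree

# Assumption: URI commonly bound to the "adlcp" prefix in SCORM 1.2 manifests.
ADLCP_NS = "http://www.adlnet.org/xsd/adlcp_rootv1p2"

# nsmap binds the prefix to the URI on this element.
resource = etree.Element("resource", nsmap={"adlcp": ADLCP_NS})
resource.set("href", "Unit 4.html")

# Namespaced attributes are addressed as "{uri}localname".
resource.set("{%s}scormtype" % ADLCP_NS, "sco")

xml = etree.tostring(resource, encoding="unicode")
print(xml)
```

The serialized element also carries an xmlns:adlcp declaration, which is what makes the prefix legal. In a full manifest that declaration would normally sit on the root element instead.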
I'm converting some HTML parsing code from BeautifulSoup to lxml. I'm trying to figure out the lxml equivalent syntax for the following BeautifulSoup statement:
soup.find('a', {'class': ['current zzt', 'zzt']})
Basically I want to find all of the "a" tags in the document that have a class attribute of either "current zzt" or "zzt". ...
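One possible lxml version, as a sketch: an XPath predicate that compares the whole class attribute against each of the two strings. The sample markup is invented for illustration. Note that this is an exact string match on the attribute value, which may or may not line up with how BeautifulSoup treats the list in every version.

```python
import lxml.html

# Hypothetical sample markup, not taken from the question.
html = """<div>
  <a class="current zzt" href="/a">A</a>
  <a class="zzt" href="/b">B</a>
  <a class="other" href="/c">C</a>
</div>"""

doc = lxml.html.fromstring(html)

# Match <a> elements whose class attribute is exactly one of the two values.
links = doc.xpath('//a[@class="current zzt" or @class="zzt"]')
hrefs = [a.get("href") for a in links]
print(hrefs)
```

xpath() returns every match; take links[0] if only the first one is wanted, as with soup.find.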
I'm trying to make a web scraper that will parse a web-page of publications and extract the authors. The skeletal structure of the web-page is the following:
<html>
<body>
<div id="container">
<div id="contents">
<table>
<tbody>
<tr>
<td class="author">####I want whatever is located here ###</td>
</tr>
</tbody>
</table>
</div>
</div>
</...
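A rough sketch of pulling the author cells out of a page shaped like the skeleton above. The page content and author names are made up; only the td class="author" structure comes from the question.

```python
import lxml.html

# Hypothetical page following the skeleton; the names are placeholders.
page = """<html><body><div id="container"><div id="contents"><table><tbody>
<tr><td class="author">Jane Doe</td></tr>
<tr><td class="author">John Roe</td></tr>
</tbody></table></div></div></body></html>"""

doc = lxml.html.fromstring(page)

# text_content() gathers all text inside the cell, including nested tags.
authors = [td.text_content().strip()
           for td in doc.xpath('//td[@class="author"]')]
print(authors)
```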
I am trying to learn lxml after having used BeautifulSoup. However, I am not a strong programmer in general.
I have the following code in some source html:
<p style="font-family:times;text-align:justify"><font size="2"><b><i> The reasons to eat pickles include: </i></b></font></p>
Because the text is bolded, I want to pull that tex...
I try to:
easy_install lxml
and I get this error:
File "build/bdist.macosx-10.3-fat/egg/setuptools/command/build_ext.py", line 85, in get_ext_filename
KeyError: 'etree'
any hints?
...
If I'm parsing an XML document using lxml, is it possible to view a text representation of an element?
I tried:
print(repr(node))
but this outputs
<Element obj at b743c0>
What can I use to see the node like it exists in the XML file? Is there some to_xml method or something?
...
I'm looking for the Clojure/Java equivalent to Python's lxml library.
I've used it a ton in the past for parsing all sorts of html (as a replacement for BeautifulSoup) and it's great to be able to use the same elementtree api for xml as well -- really a trusted friend! Can anyone recommend a similar Java/Clojure library?
About lxml
...
Hello, I'm searching in an HTML document using XPath from lxml in Python. How can I get the path to a certain element? Here's the example from Ruby's Nokogiri:
page.xpath('//text()').each do |textnode|
path = textnode.path
puts path
end
This prints, for example, '/html/body/div/div[1]/div[1]/p/text()[1]' and this is the string I want to ...
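A sketch of one way to get close to this in lxml: the element tree object has a getpath() method that returns an XPath for an element. It works on elements rather than text nodes, so the example asks for the path of each text node's parent. The sample document is invented.

```python
import lxml.html

# Hypothetical document for illustration.
doc = lxml.html.fromstring(
    "<html><body><div><p>one</p><p>two</p></div></body></html>"
)
tree = doc.getroottree()

paths = []
for text in doc.xpath("//text()"):
    # getparent() gives the element holding this text. For tail text
    # (text.is_tail is True) it is the element the text follows.
    parent = text.getparent()
    paths.append(tree.getpath(parent))

for path in paths:
    print(path)
```

Each returned path can be fed back into tree.xpath() to reach the same element again.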
Hello,
I am new to lxml and quite new to Python, and could not find a solution to the following:
I need to import a few tables with 3 columns and an undefined number of rows starting at row 3.
When the second column of any row is empty, this row is discarded and the processing of the table is aborted.
The following code prints the table...
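The rule described above could be sketched roughly as follows. The table content is invented, and the sketch assumes a single table whose first two rows are headers to skip.

```python
import lxml.html

# Hypothetical table: two header rows, then data rows with 3 columns.
page = """<table>
<tr><th>h1</th><th>h2</th><th>h3</th></tr>
<tr><td>sub</td><td>sub</td><td>sub</td></tr>
<tr><td>a</td><td>1</td><td>x</td></tr>
<tr><td>b</td><td>2</td><td>y</td></tr>
<tr><td>c</td><td></td><td>z</td></tr>
<tr><td>d</td><td>4</td><td>w</td></tr>
</table>"""

doc = lxml.html.fromstring(page)

rows = []
# [2:] skips the first two rows, so processing starts at row 3.
for tr in doc.xpath("//table//tr")[2:]:
    cells = [td.text_content().strip() for td in tr.xpath("./td")]
    # An empty second column discards this row and aborts the table.
    if len(cells) < 3 or not cells[1]:
        break
    rows.append(cells[:3])

print(rows)
```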
Hello. Consider the following snippet:
import lxml.html
html = '<div><br />Hello text</div>'
doc = lxml.html.fromstring(html)
text = doc.xpath('//text()')[0]
print(lxml.html.tostring(text.getparent()).decode())
# prints <br>Hello text
I was expecting to see '<div><br />Hello text</div>', because br can't have nested text and is "self-closed" (I...
Hello.
I want to extract some text from a certain website.
Here is the address of the page I want to scrape:
http://news.search.naver.com/search.naver?sm=tab%5Fhty&where=news&query=times&x=0&y=0
On this page, I want to extract the subject and content fields separately.
For example, if you open tha...
Hello,
I'm writing a web scraper.
I have received a lot of help here on Stack Overflow.
My scraper is now almost finished, except for several remaining problems :)
I uploaded my script source to http://elca.pastebin.com/m52e7d8e0
The current problem is at line 74 of my script source,
where you can see this line: "thepage = urllib.urlopen(the...
At exam.com there is a note about the weather:
Tokyo: 25°C
I want to use Django 1.1 and lxml to get information from the website. I only want to get the value "25".
The HTML structure of exam.com is as follows:
<p id="resultWeather">
<b>Weather</b>
Tokyo:
<b>25</b>°C
</p>
I'm a student. I'm doing a small project with my friend...
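A small sketch of extracting just the number with lxml, assuming the page really has the structure shown above. The markup is inlined here as a string; fetching the page and the Django side are left out.

```python
import lxml.html

# The snippet from the question, inlined for illustration.
page = '<p id="resultWeather"><b>Weather</b> Tokyo: <b>25</b>°C</p>'

doc = lxml.html.fromstring(page)

# b[2] selects the second <b> child of the paragraph.
temperature = doc.xpath('//p[@id="resultWeather"]/b[2]/text()')[0]
print(temperature)
```

The result is a string, so wrap it in int() if a number is needed.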
For example, if I have
<form name="blah">
<input name="1"/>
<input name="2"/>
<table>
<tr>
<td>
<unknown number of levels more>
<input name="3"/>
</td>
</tr>
</table>
</form>
How can I put together a query that will return input 1,2 and 3?
Edit: I should note I'm not interested i...
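One way such a query might look, as a sketch: the XPath descendant axis (//) matches at any depth, so it does not matter how many levels sit between the form and the third input. The nested div and span stand in for the unknown levels and are invented.

```python
import lxml.html

# The form from the question, with hypothetical nesting around input 3.
page = """<form name="blah">
<input name="1"/>
<input name="2"/>
<table><tr><td><div><span><input name="3"/></span></div></td></tr></table>
</form>"""

doc = lxml.html.fromstring(page)

# "//input" after the form step means: any input below the form.
names = doc.xpath('//form[@name="blah"]//input/@name')
print(names)
```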
Currently I have 2 varieties, LXML and libXML2, that both seem to work. I have tried benchmarking both, specifically for parsing in-memory strings and files into XML and importing XSLT stylesheets and applying them. While pure performance based tests indicate that LXML comes out on top (applying stylesheets specifically) libxml2 seems to have bee...
Hello, I'm using lxml to parse an HTML file and I'd like to know how I can set the context of an XPath search. What I mean is that I have a node element and want to make the XPath search only inside this node, as if it were the root one. For example, I have a form node and xpath search //input return only inputs of the given form as opposed to all ...
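A sketch of the usual XPath approach: an expression starting with // is evaluated from the document root even when called on an element, while one starting with .// is relative to the element it is called on. The two forms here are invented.

```python
import lxml.html

# Hypothetical page with two forms.
page = """<html><body>
<form id="a"><input name="x"/></form>
<form id="b"><input name="y"/><input name="z"/></form>
</body></html>"""

doc = lxml.html.fromstring(page)
form = doc.xpath('//form[@id="b"]')[0]

# Leading "." keeps the search inside this form.
scoped = form.xpath('.//input/@name')

# Without it, the search covers the whole document.
everything = form.xpath('//input/@name')

print(scoped, everything)
```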
I'm trying to scrape META keywords and description tags from arbitrary websites. I obviously have no control over said websites, so I have to take what I'm given. They have a variety of casings for the tag and attributes, which means I need to work case-insensitively. I can't believe that the lxml authors are as stubborn as to insist on full...
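A possible workaround, sketched under one assumption worth checking against your own pages: the HTML parser lxml uses lowercases tag and attribute names while parsing, so only attribute values still vary in case. XPath 1.0 has no lower-case function, but translate() can stand in for one. The helper name meta_content and the sample page are hypothetical.

```python
import lxml.html

UPPER = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWER = UPPER.lower()

def meta_content(doc, name):
    """Return content of <meta> tags whose name matches, ignoring case."""
    # translate() maps A-Z to a-z in the attribute value before comparing.
    expr = '//meta[translate(@name, $upper, $lower) = $wanted]/@content'
    return doc.xpath(expr, upper=UPPER, lower=LOWER, wanted=name.lower())

# Hypothetical page with mixed casing.
page = """<html><head>
<META NAME="Keywords" CONTENT="lxml, parsing">
<meta name="DESCRIPTION" content="A demo page">
</head><body><p>hi</p></body></html>"""

doc = lxml.html.fromstring(page)
keywords = meta_content(doc, "keywords")
description = meta_content(doc, "description")
print(keywords, description)
```

The translate() call only covers ASCII letters, which is enough for the names keywords and description.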
I'm starting to use lxml in Python for processing XML/XSL documents, and in general it seems very straightforward. However, I'm not able to find a way to pass an XML fragment as a stylesheet parameter when doing a transformation.
For example, in PHP it is possible to pass DOMDocument XML fragments as stylesheet parameters, so that one can...
I have an application where I've been using html5lib to liberally parse html. I use the minidom interface, because I need a real DOM API and ElementTree is not appropriate for what I'm doing.
Here's how I do this:
parser = html5lib.XHTMLParser(tree=html5lib.treebuilders.getTreeBuilder('dom'))
parser.parse(html)
However, parsing huge ...
Hello,
I'm fairly new to lxml and HTML Parsers as a whole.
I was wondering if there is a way to replace an element within a tree with another element...
For example I have:
body = """<code> def function(arg): print arg </code> Blah blah blah <code> int main() { return 0; } </code> """
doc = lxml.html.fromstring(body)
codeblocks = do...