Using tDOM, I would like to cycle through a list of objects in the following format:
<object>
<type>Hardware</type>
<name>System Name</name>
<description>Basic Description of System.</description>
<attributes>
<vendor>Dell</vendor>
<contract>MM/DD/YY</contract>
<supportExpiration...
I want to use an HTML parser that does the following in a nice, elegant way:
Extract text (this is most important)
Extract links, meta keywords
Reconstruct original doc (optional but nice feature to have)
From my investigation so far, Jericho seems to fit. Any other open source libraries you guys would recommend?
...
I want to capture the base path of a given URL:
http://mydomain.com/mypath/coolpath/favoritepath/file.htm
So basically I just want this:
/mypath/coolpath/favoritepath/
Any ideas how to do this without using straight Java?
...
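For illustration only, here is a minimal Python sketch of the path extraction described above. The question itself is about Java, so this is not the asker's environment; the helper name `base_path` is hypothetical, and the sketch assumes the base path is everything in the URL path up to and including the last slash.

```python
from urllib.parse import urlsplit


def base_path(url: str) -> str:
    """Return the directory part of the URL path, with a trailing slash."""
    path = urlsplit(url).path
    # Keep everything up to and including the last "/".
    return path[: path.rfind("/") + 1]


print(base_path("http://mydomain.com/mypath/coolpath/favoritepath/file.htm"))
```

The same idea (split the URL, then cut the path at the last slash) should carry over to most URL libraries.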
I'm trying to parse this XML string:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<response type="success">
<lots>
<lot>32342</lot>
<lot>52644</lot>
</lots>
</response>
When I get the root node, which is "response", I use the method getChildNodes() which returns a NodeList of length 3. However what I'm con...
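As a hedged illustration in Python's `xml.dom.minidom`, which follows the same DOM model: assuming the XML contains the line breaks shown above, the whitespace between tags is kept as text nodes, which would account for a child list of length 3 (text, `lots`, text). Filtering on node type leaves only the element children.

```python
from xml.dom import minidom

XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<response type="success">
<lots>
<lot>32342</lot>
<lot>52644</lot>
</lots>
</response>"""

root = minidom.parseString(XML).documentElement

# childNodes includes the whitespace-only text nodes around <lots>.
all_children = root.childNodes

# Keep only real elements, skipping text nodes.
element_children = [n for n in all_children if n.nodeType == n.ELEMENT_NODE]

# Collect the text of every <lot> element, wherever it sits under the root.
lots = [lot.firstChild.data for lot in root.getElementsByTagName("lot")]

print(len(all_children), [n.tagName for n in element_children], lots)
```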
This is a noob question from someone who hasn't written a parser/lexer ever before.
I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft).
It's a list of 21 possible tokens, that all (but two) cannot be represented a...
This is going to sound crazy, but does anyone have techniques that would allow me to parse boolean logic strings in SQL Server 2005 without extraordinary/ridiculous effort?
Here is an example: (SOMEVAR=4 OR SOMEVAR=5) AND (NOT OTHERVAR=Y)
I feel like recursion would help a lot if that were possible in SQL but I'm not really sure how to ...
In C++, how do I combine (note: not add) two integers into one big integer?
For example:
int1 = 123;
int2 = 456;
Is there a function to take the two numbers and turn intCombined into 123456?
EDIT:
My bad for not explaining clearly. If int2 is 0, then the answer should be 123, not 1230. In actuality though, int1 (the number on the...
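A rough sketch of the arithmetic, written in Python rather than C++ purely for illustration. It assumes both integers are non-negative and covers only the rule stated so far (a second value of 0 leaves the first unchanged); the function name `combine` is hypothetical, and the rest of the edit is cut off, so any further requirements are not handled.

```python
def combine(int1: int, int2: int) -> int:
    """Append the decimal digits of int2 to int1 (non-negative inputs assumed)."""
    if int2 == 0:
        # Stated requirement: combining with 0 yields int1, not int1 * 10.
        return int1
    # Find the smallest power of ten strictly greater than int2.
    shift = 1
    while shift <= int2:
        shift *= 10
    return int1 * shift + int2


print(combine(123, 456))
print(combine(123, 0))
```

The same loop translates directly to C++, where overflow of the result type would also need checking.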
I'm using the twitter API to return a list of status updates and the times they were created. It's returning the creation date in the following format:
Fri Apr 09 12:53:54 +0000 2010
What's the simplest way (with PHP or Javascript) to format this like 09-04-2010?
...
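The question asks for PHP or JavaScript; as a language-neutral illustration of the parse-then-format approach, here is a small Python sketch. The function name `reformat_created_at` is hypothetical, and the sketch assumes an English locale, since `%a` and `%b` match day and month names in the current locale.

```python
from datetime import datetime


def reformat_created_at(created_at: str) -> str:
    """Convert a 'Fri Apr 09 12:53:54 +0000 2010' style timestamp to DD-MM-YYYY."""
    # Parse weekday, month name, day, time, UTC offset and year in that order.
    parsed = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return parsed.strftime("%d-%m-%Y")


print(reformat_created_at("Fri Apr 09 12:53:54 +0000 2010"))
```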
I seek a tool that can be run on the command line like so:
tablescrape 'http://someURL.foo.com' [n]
If n is not specified and there's more than one HTML table on the page, it should summarize them (header row, total number of rows) in a numbered list.
If n is specified or if there's only one table, it should parse the table and spit i...
I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code:
import urllib
f = urllib.urlopen("http://58.68.130.147")
s = f.read()
f.close()
from BeautifulSoup import BeautifulStoneSoup
soup = BeautifulStoneSoup(s)
inputTag = soup.findAll(attrs={"name" : "stainfo"})...
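A possible alternative sketch using only Python 3's standard library `html.parser`, not the BeautifulSoup approach from the question. The class name `InputValueFinder` and the sample HTML are assumptions made for illustration; the real page's markup is not shown in the question.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Collect the value attribute of every <input> with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        attr_map = dict(attrs)
        if tag == "input" and attr_map.get("name") == self.name:
            self.values.append(attr_map.get("value"))


# Hypothetical markup standing in for the fetched page.
SAMPLE_HTML = '<form><input type="hidden" name="stainfo" value="example-value"></form>'

finder = InputValueFinder("stainfo")
finder.feed(SAMPLE_HTML)
print(finder.values)
```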
Hi all,
I am going to develop an application which will parse RSS feeds and display the items in my custom cell (a cell containing the image, label, description, etc.). The most popular way of parsing is using NSXMLParser, but this is a bit of a lengthy way. So is there any other way to do this? Or my question will be, which is the b...
I'm trying to scrape the entire iTunes App Store so that I can store it in a database for a project I'm working on. I'm having a hard time finding the best way to do this. I know there are ways to get specific information about price changes but I can't find anything that describes how to scrape the entire app store.
Any additional inf...
This is really just a syntax question.
I have a PHP script that parses my WordPress feed and returns the latest posts. I also want my script to parse the # of comments, but the WordPress feed XML object for number of comments has a colon in it (slash:comments). It causes the following error:
Parse error: syntax error, unexpected
...
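The colon in `slash:comments` marks an XML namespace prefix, which is why it cannot be used as a plain property name. The question concerns PHP; purely to illustrate how a namespaced element is addressed, here is a Python sketch. The sample feed and the function name `comment_counts` are assumptions, and the namespace URI is the one commonly declared for the RSS `slash` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed fragment; a real WordPress feed carries many more elements.
SAMPLE_FEED = """<rss version="2.0" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
<channel>
<item><title>First post</title><slash:comments>3</slash:comments></item>
<item><title>Second post</title><slash:comments>0</slash:comments></item>
</channel>
</rss>"""

# Map the prefix used in lookups to the namespace URI declared in the feed.
NS = {"slash": "http://purl.org/rss/1.0/modules/slash/"}


def comment_counts(feed_xml):
    """Return a {title: comment count} mapping for each item in the feed."""
    root = ET.fromstring(feed_xml)
    counts = {}
    for item in root.iter("item"):
        title = item.findtext("title")
        counts[title] = int(item.findtext("slash:comments", default="0", namespaces=NS))
    return counts


print(comment_counts(SAMPLE_FEED))
```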
Here is the page I want to parse
(the API link I gave is just a dev test, so it's OK to be public):
http://api.scribd.com/api?method=docs.getList&api_key=2apz5npsqin3cjlbj0s6m
The output I'm looking for is something like this (for now):
Doc_id: 29638658
access_key: key-11fg37gwmer54ssq56l3
secret_password: 1trinfqri6cnv3gf6rnl
titl...
I'm trying to parse a page that has different sections that are loaded with a Javascript __doPostBack() function.
An example of a link is: javascript:__doPostBack('ctl00$cphMain$ucOemSchPicker$dlSch$ctl03$btnSch','')
As soon as this is clicked, the browser doesn't fetch a new URL but a section of webpage is updated to reflect new info...
My goal is to parse HTML with lxml, which supports both XPath and CSS selectors.
I can tie my model properties either to CSS or XPath, but I'm not sure which one would be the best, e.g. less fuss when HTML layout is changed, simpler expressions, greater extraction speed.
What would you choose in such a situation?
...
I use this bit of code to feed some data I have parsed from a web page to a MySQL database:
c=db.cursor()
c.executemany(
"""INSERT INTO data (SID, Time, Value1, Level1, Value2, Level2, Value3, Level3, Value4, Level4, Value5, Level5, ObsDate)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""",
clean_data
)
The parsed data lo...
Hey,
Consider the following XML file:
<cookbook>
<recipe xml:id="MushroomSoup">
<title>Quick and Easy Mushroom Soup</title>
<ingredient name="Fresh mushrooms"
quantity="7"
unit="pieces"/>
<ingredient name="Garlic"
quantity="1"
unit="cloves"/>
</recipe>
<recipe...
A volunteer job requires us to convert a large number of LaTeX documents into ePub format. It's a series of open-source fiction books which has so far been produced only on paper via a print on demand service. We'd like to be able to offer the book to users of book-reader devices (such as Kindle) which require the ePub format for bes...
Hi. I need to learn how to use soapUI pretty quickly. I'm finding it pretty tedious to start, so I was hoping I might be able to get some help here. Here's what I need to do.
Let's say we have Company A and Company B which is a subset of Company B. Now Company A offers a webservice accessible by Company B such that Company B can ga...