parsing

TCL TDom: Looping through Objects

Using TDom, I would like to cycle through a list of objects in the following format: <object> <type>Hardware</type> <name>System Name</name> <description>Basic Description of System.</description> <attributes> <vendor>Dell</vendor> <contract>MM/DD/YY</contract> <supportExpiration...

Text extraction with java html parsers

I want to use an HTML parser that does the following in a nice, elegant way: extract text (this is most important); extract links and meta keywords; reconstruct the original doc (optional, but a nice feature to have). From my investigation so far, Jericho seems to fit. Are there any other open source libraries you would recommend? ...

How can I parse a URL using JSP/JSTL?

I want to capture the base path of a given URL, http://mydomain.com/mypath/coolpath/favoritepath/file.htm, so basically I just want this: /mypath/coolpath/favoritepath/ Any ideas how to do this without using straight Java? ...
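
The idea can be sketched in Python (not JSP/JSTL, so this only illustrates the logic): parse the URL, take its path component, and keep everything up to the last slash. The function name `base_path` is an assumption made for this example.

```python
from urllib.parse import urlparse


def base_path(url):
    """Return the directory part of the URL path, with a trailing slash."""
    path = urlparse(url).path
    # Keep everything up to and including the last "/".
    return path[: path.rfind("/") + 1]


print(base_path("http://mydomain.com/mypath/coolpath/favoritepath/file.htm"))
# prints /mypath/coolpath/favoritepath/
```

The same two steps (isolate the path, then cut at the last slash) should carry over to whatever string functions the target environment offers.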

XML DOM parsing with Java

I'm trying to parse this XML string: <?xml version="1.0" encoding="UTF-8" standalone="no"?> <response type="success"> <lots> <lot>32342</lot> <lot>52644</lot> </lots> </response> When I get the root node, which is "response", I use the method getChildNodes() which returns a NodeList of length 3. However what I'm con...
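
One possible explanation for a child count of 3, assuming the original string has line breaks between the elements (the excerpt above is flattened, so this is a guess): DOM parsers keep whitespace between tags as text nodes. The sketch below uses Python's `xml.dom.minidom` rather than Java, but the DOM model is the same.

```python
from xml.dom import minidom

# Assumed layout: one element per line, as the XML would usually be written.
XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<response type="success">
<lots>
<lot>32342</lot>
<lot>52644</lot>
</lots>
</response>"""

root = minidom.parseString(XML).documentElement

# Whitespace between tags shows up as "#text" children of the root.
kinds = [node.nodeName for node in root.childNodes]

# Filtering on node type leaves only the real element children.
elements = [n for n in root.childNodes if n.nodeType == n.ELEMENT_NODE]

# Asking for elements by tag name sidesteps the text nodes entirely.
lots = [lot.firstChild.data for lot in root.getElementsByTagName("lot")]

print(kinds)
print(lots)
```

Here `kinds` has three entries: a text node, the `lots` element, and another text node.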

Performance of tokenizing CSS in PHP

This is a noob question from someone who hasn't written a parser/lexer ever before. I'm writing a tokenizer/parser for CSS in PHP (please don't repeat with 'OMG, why in PHP?'). The syntax is written down by the W3C neatly here (CSS2.1) and here (CSS3, draft). It's a list of 21 possible tokens, that all (but two) cannot be represented a...

Boolean logic parser for SQL

This is going to sound crazy, but does anyone have techniques that would allow me to parse boolean logic strings in SQL Server 2005 without extraordinary/ridiculous effort? Here is an example: (SOMEVAR=4 OR SOMEVAR=5) AND (NOT OTHERVAR=Y) I feel like recursion would help a lot if that were possible in SQL, but I'm not really sure how to ...
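
To show how recursion fits expressions like the example, here is a minimal recursive-descent sketch in Python (not T-SQL). The grammar, the precedence order (NOT binds tighter than AND, which binds tighter than OR), and the names `tokenize` and `evaluate` are assumptions for this illustration; only `NAME=VALUE` comparisons are handled.

```python
import re

# A token is a parenthesis, an equals sign, or a run of word characters.
TOKEN = re.compile(r"\s*(\(|\)|=|[A-Za-z0-9_]+)")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text, variables):
    """Evaluate a boolean expression against a dict of variable values."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            take()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "NOT":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        if peek() == "(":
            take()
            value = parse_or()  # recursion handles nested parentheses
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        name = take()
        if take() != "=":
            raise ValueError("expected '='")
        literal = take()
        # Compare as strings so 4 and "4" are treated alike.
        return str(variables[name]) == literal

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


EXPR = "(SOMEVAR=4 OR SOMEVAR=5) AND (NOT OTHERVAR=Y)"
print(evaluate(EXPR, {"SOMEVAR": 4, "OTHERVAR": "N"}))
```

Each grammar level is one function that calls the next tighter level, and the parenthesis case calls back up to the loosest level.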

Combine two numbers into one. Example: 123 and 456 become 123456

In C++, how do I combine (note: not add) two integers into one big integer? For example: int1 = 123; int2 = 456; Is there a function to take the two numbers and turn intCombined into 123456? EDIT: My bad for not explaining clearly. If int2 is 0, then the answer should be 123, not 1230. In actuality though, int1 (the number on the...
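
A purely arithmetic approach, sketched here in Python rather than C++, is to shift the first number left by one decimal place for every digit of the second. The function name `combine` is hypothetical, and the sketch assumes both inputs are non-negative. Counting zero as having no digits gives the behaviour asked for in the edit: 123 and 0 yield 123.

```python
def combine(first, second):
    """Concatenate the decimal digits of two non-negative integers."""
    multiplier = 1
    remaining = second
    # Multiply by 10 once per decimal digit of `second`; zero has none here.
    while remaining > 0:
        multiplier *= 10
        remaining //= 10
    return first * multiplier + second


print(combine(123, 456))  # prints 123456
print(combine(123, 0))    # prints 123
```

In C++ the same loop works on `int`, but the result can overflow for large inputs, so a wider type may be needed there.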

Parsing Twitter API Datestamp

I'm using the Twitter API to return a list of status updates and the times they were created. It's returning the creation date in the following format: Fri Apr 09 12:53:54 +0000 2010 What's the simplest way (with PHP or JavaScript) to format this like 09-04-2010? ...
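
The conversion is a parse followed by a format. The sketch below is in Python rather than PHP or JavaScript, so it only illustrates that two-step idea; the function name `reformat_stamp` is an assumption. Note that `%a` and `%b` expect English day and month names, which holds in the default C locale.

```python
from datetime import datetime

# Layout of the datestamp in the question: weekday, month, day, time, UTC offset, year.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def reformat_stamp(stamp):
    """Turn a datestamp in the format above into DD-MM-YYYY."""
    parsed = datetime.strptime(stamp, TWITTER_FORMAT)
    return parsed.strftime("%d-%m-%Y")


print(reformat_stamp("Fri Apr 09 12:53:54 +0000 2010"))
# prints 09-04-2010
```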

Scrape HTML tables from a given URL into CSV

I seek a tool that can be run on the command line like so: tablescrape 'http://someURL.foo.com' [n] If n is not specified and there's more than one HTML table on the page, it should summarize them (header row, total number of rows) in a numbered list. If n is specified or if there's only one table, it should parse the table and spit i...

Extracting an attribute value with beautifulsoup

I am trying to extract the content of a single "value" attribute in a specific "input" tag on a webpage. I use the following code: import urllib f = urllib.urlopen("http://58.68.130.147") s = f.read() f.close() from BeautifulSoup import BeautifulStoneSoup soup = BeautifulStoneSoup(s) inputTag = soup.findAll(attrs={"name" : "stainfo"})...
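
For comparison, the same extraction can be sketched with Python 3's standard-library `html.parser` instead of BeautifulSoup. The class name `InputValueFinder` and the sample HTML are made up for illustration; the real page content is not shown in the excerpt.

```python
from html.parser import HTMLParser


class InputValueFinder(HTMLParser):
    """Collect the "value" attribute of every <input> with a given name."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute, value) pairs.
        if tag == "input":
            attr_map = dict(attrs)
            if attr_map.get("name") == self.name:
                self.values.append(attr_map.get("value"))


# Hypothetical markup standing in for the fetched page.
SAMPLE = '<form><input type="hidden" name="stainfo" value="example-value"/></form>'

finder = InputValueFinder("stainfo")
finder.feed(SAMPLE)
print(finder.values)
```

Whatever the library, the attribute has to be read from an individual tag, not from the list of matches that a find-all call returns.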

Best XML Parser for RSS Feeds in Objective-C?

Hi all, I am going to develop an application which will parse RSS feeds and display the items in my custom cell (a cell containing the image, label, description, etc.). The most popular way of parsing is using NSXMLParser, but this is a rather lengthy approach. Is there any other way to do this? Or, to put my question differently, which is the b...

How do I get data from the iTunes app store

I'm trying to scrape the entire iTunes App Store so that I can store it in a database for a project I'm working on. I'm having a hard time finding the best way to do this. I know there are ways to get specific information about price changes but I can't find anything that describes how to scrape the entire app store. Any additional inf...

Parsing WordPress XML, slash:comments syntax?

This is really just a syntax question. I have a PHP script that parses my WordPress feed and returns the latest posts. I also want my script to parse the # of comments, but the WordPress feed XML object for number of comments has a colon in it (slash:comments). It causes the following error: Parse error: syntax error, unexpected ...
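
The colon marks an XML namespace prefix, so `slash:comments` is the `comments` element in the `slash` namespace, not a plain property name. The sketch below shows a namespace-aware lookup in Python rather than PHP. The sample feed and the function name `comment_counts` are invented for illustration, and the namespace URI is the one normally declared for the Slash RSS module.

```python
import xml.etree.ElementTree as ET

# Made-up feed fragment with the slash namespace declared on the root.
FEED = """<rss version="2.0" xmlns:slash="http://purl.org/rss/1.0/modules/slash/">
  <channel>
    <item><title>First post</title><slash:comments>3</slash:comments></item>
    <item><title>Second post</title><slash:comments>0</slash:comments></item>
  </channel>
</rss>"""

# Map the prefix used in queries to the namespace URI.
NS = {"slash": "http://purl.org/rss/1.0/modules/slash/"}


def comment_counts(feed_xml):
    """Return (title, comment count) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (
            item.findtext("title"),
            int(item.findtext("slash:comments", default="0", namespaces=NS)),
        )
        for item in root.iter("item")
    ]


print(comment_counts(FEED))
```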

How do I parse an XML page and output its data pieces the way I want?

Here is the page I want to parse (the API link I gave is just a dev test, so it's OK for it to be public): http://api.scribd.com/api?method=docs.getList&api_key=2apz5npsqin3cjlbj0s6m The output I'm looking for is something like this (for now): Doc_id: 29638658 access_key: key-11fg37gwmer54ssq56l3 secret_password: 1trinfqri6cnv3gf6rnl titl...

Retrieving information with Python's urllib from a page that is done via __doPostBack()?

I'm trying to parse a page that has different sections that are loaded with a Javascript __doPostBack() function. An example of a link is: javascript:__doPostBack('ctl00$cphMain$ucOemSchPicker$dlSch$ctl03$btnSch','') As soon as this is clicked, the browser doesn't fetch a new URL but a section of webpage is updated to reflect new info...

Parse HTML with CSS or XPath selectors?

My goal is to parse HTML with lxml, which supports both XPath and CSS selectors. I can tie my model properties either to CSS or XPath, but I'm not sure which one would be the best, e.g. less fuss when HTML layout is changed, simpler expressions, greater extraction speed. What would you choose in such a situation? ...

Error when feeding a MySQL db with python-parsed data

I use this bit of code to feed some data I have parsed from a web page to a MySQL database: c=db.cursor() c.executemany( """INSERT INTO data (SID, Time, Value1, Level1, Value2, Level2, Value3, Level3, Value4, Level4, Value5, Level5, ObsDate) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)""", clean_data ) The parsed data lo...

Display part of an XML file while parsing it

Consider the following XML file: <cookbook> <recipe xml:id="MushroomSoup"> <title>Quick and Easy Mushroom Soup</title> <ingredient name="Fresh mushrooms" quantity="7" unit="pieces"/> <ingredient name="Garlic" quantity="1" unit="cloves"/> </recipe> <recipe...
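
One way to show part of a document while it is still being parsed is incremental (event-based) parsing. This is a minimal Python sketch using `xml.etree.ElementTree.iterparse`. The excerpt is cut off, so which part should be displayed is an assumption (here, each recipe's title and ingredient names), and `recipe_summaries` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

# Only the first recipe from the excerpt; the rest of the file is not shown there.
COOKBOOK = b"""<cookbook>
  <recipe xml:id="MushroomSoup">
    <title>Quick and Easy Mushroom Soup</title>
    <ingredient name="Fresh mushrooms" quantity="7" unit="pieces"/>
    <ingredient name="Garlic" quantity="1" unit="cloves"/>
  </recipe>
</cookbook>"""


def recipe_summaries(stream):
    """Yield (title, ingredient names) as soon as each recipe is fully read."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "recipe":
            title = elem.findtext("title")
            ingredients = [item.get("name") for item in elem.findall("ingredient")]
            yield title, ingredients
            elem.clear()  # drop the finished recipe's children to save memory


for title, ingredients in recipe_summaries(io.BytesIO(COOKBOOK)):
    print(title, ingredients)
```

Because results are yielded on each closing `recipe` tag, output can start before the end of a large file is reached.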

Is there a standard lexer/parser tool for Python?

A volunteer job requires us to convert a large number of LaTeX documents into ePub format. It's a series of open-source fiction books which has so far been produced only on paper via a print-on-demand service. We'd like to be able to offer the book to users of book-reader devices (such as Kindle) which require the ePub format for bes...

Extracting and Parsing data with soapUI

Hi. I need to learn how to use soapUI pretty quickly. I'm finding it pretty tedious to start, so I was hoping I might be able to get some help here. Here's what I need to do. Let's say we have Company A and Company B which is a subset of Company B. Now Company A offers a webservice accessible by Company B such that Company B can ga...