How does one intelligently parse data returned by search results on a page?
For example, let's say that I would like to create a web service that searches for online books by parsing the search results of many book providers' websites. I could get the raw HTML of each page and apply some regexes to make the data work for my web service,...
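One alternative to regexes is to feed each page to a real HTML parser and pick out the elements that hold the results. Below is a minimal sketch using Python's standard-library html.parser. The markup, including the result-title class name, is a hypothetical stand-in. Every book provider structures its results differently, so you would find the right elements by inspecting each site.

```python
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Collect (title, href) pairs from <a class="result-title"> links.

    "result-title" is a hypothetical class name used for illustration.
    """

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None  # href of the result link we are inside, if any
        self._text = []    # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result-title" in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None


def parse_results(html):
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return parser.results


page = """
<ul>
  <li><a class="result-title" href="/book/1">Dune</a> <span>Herbert</span></li>
  <li><a class="result-title" href="/book/2">Emma</a></li>
  <li><a href="/about">About us</a></li>
</ul>
"""
print(parse_results(page))  # [('Dune', '/book/1'), ('Emma', '/book/2')]
```

A parser copes with attribute order, extra whitespace, and nested tags, which tend to break regexes. It still depends on each site's markup, though, so you would probably need one small extractor per provider.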
Hi all!
I'm writing a parser for a scripting programming language in PHP. The syntax of that scripting language looks like this:
ZOMFG
&This is a comment
(show "Hello, World\!");
This is a page written in that language that displays "Hello, World!" in the browser. But I could also have code like this:
ZOMFG
&This is a comment !
on mu...
What HTML parser for Ruby will I find easiest to use if I'm already familiar / in love with jQuery?
Such a parser would have jQuery's overall philosophy -- "grab some HTML elements (using CSS selectors) and do things with them" -- and in addition have equivalents for all of jQuery's DOM manipulation functionality (prepend(), after(), et...
Hey all, I have a huge array coming back as search results and I want to do the following:
Walk through the array and, for each record with the same "spubid", add the following keys/vals: "sfirst, smi, slast" to the parent array member, in this case $a[0]. So the result would leave $a[0] intact but add to it the values from sfirst, ...
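The grouping step can be sketched by keying records on spubid and treating the first record seen for each key as the parent. The question is cut off before it says how the added sfirst/smi/slast values should be stored on the parent. This sketch therefore collects them in a list under a hypothetical "authors" key, and the sample rows are made up.

```python
def merge_by_spubid(records):
    """Keep the first ("parent") record for each spubid and collect the
    sfirst/smi/slast values of every record sharing that spubid.

    The "authors" key is hypothetical: the question does not say how the
    added values should be stored on the parent.
    """
    parents = {}  # spubid -> parent record; dicts keep first-seen order
    for rec in records:
        key = rec["spubid"]
        if key not in parents:
            parent = dict(rec)  # copy, so the caller's records stay untouched
            parent["authors"] = []
            parents[key] = parent
        parents[key]["authors"].append(
            {field: rec.get(field) for field in ("sfirst", "smi", "slast")}
        )
    return list(parents.values())


# Made-up sample rows for illustration.
rows = [
    {"spubid": "A1", "sfirst": "Ann", "smi": "B", "slast": "Cole"},
    {"spubid": "A1", "sfirst": "Dan", "smi": "", "slast": "Egan"},
    {"spubid": "B2", "sfirst": "Fay", "smi": "G", "slast": "Hill"},
]
merged = merge_by_spubid(rows)
```

In PHP, the same idea maps onto an associative array keyed by spubid. You build it in one pass over the search results instead of comparing every record with every other record.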
Construct is a DSL implemented in Python used to describe data structures (binary and textual). Once you have the data structure described, construct can parse and build it for you. Which is good ("DRY", "Declarative", "Denotational-Semantics"...)
Usage example:
# code from construct.formats.graphics.png
itxt_info = Struct("itxt_info",...
I have a stored procedure in an old SQL 2000 database that takes a comment column formatted as a varchar and exports it as a money object. When this table structure was set up, it was assumed this would be the only data going into this field. The current procedure simply does this:
SELECT CAST(dbo.member_cate...
Stripping Uppercase Words in Excel VBA
I have an Excel sheet like this one:
A B
1 Used CONTENT VERSION SYSTEM for the FALCON Project
2 USA beats UK at Soccer Cup 2008
3 DARPA NET’s biggest contribution was the internet
4 One big problem is STRUCTURED QUERY LANGUAGE queries on non-normalized data
I ...
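Stripping the uppercase words comes down to one regular expression plus a cleanup of the leftover spaces. This is a Python sketch, not VBA. The definition of an "uppercase word" is an assumption: two or more capital letters bounded by word boundaries, so single letters such as "I" and capitalised words such as "Soccer" survive.

```python
import re

# Assumption: an "uppercase word" is a run of two or more capital letters
# with word boundaries on both sides. Single letters ("I", "A") and
# capitalised words ("Soccer", "Project") are left alone.
UPPER_WORD = re.compile(r"\b[A-Z]{2,}\b")


def strip_uppercase_words(text):
    without = UPPER_WORD.sub("", text)
    return " ".join(without.split())  # collapse the gaps left behind


for cell in [
    "Used CONTENT VERSION SYSTEM for the FALCON Project",  # -> Used for the Project
    "USA beats UK at Soccer Cup 2008",                     # -> beats at Soccer Cup 2008
]:
    print(strip_uppercase_words(cell))
```

Row 3 shows a limitation. In "NET’s", only NET is removed and the "’s" is left behind, so possessives may need a rule of their own. In VBA, the same pattern can probably be used through the VBScript.RegExp object with Global set to True, though that has not been tried here.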
Hi,
I have a file named ip-list with two columns:
IP1 <TAB> Server1
IP2 <TAB> Server2
And I want to produce:
Server1 <TAB> IP1
Server2 <TAB> IP2
What's the most elegant, shortest Linux command line tool to do it?
...
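The transformation itself is just "split each line on the tab and print the fields in reverse order". Here it is as a small Python function. The function name is illustrative.

```python
def swap_columns(lines):
    """Turn "IP<TAB>Server" lines into "Server<TAB>IP" lines."""
    swapped = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        # Split on the first tab only; a line without a tab raises ValueError.
        ip, server = line.split("\t", 1)
        swapped.append(server + "\t" + ip)
    return swapped


print("\n".join(swap_columns(["IP1\tServer1\n", "IP2\tServer2\n"])))
```

For the shortest command line, awk does the same job in one line: awk -F'\t' -v OFS='\t' '{print $2, $1}' ip-list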
Hi,
I am working in STAF and STAX. Here, Python is used for coding. I am new to Python.
Basically, my task is to parse an XML file in Python using Document Factory Parser.
The XML file I am trying to parse is:
<?xml version="1.0" encoding="utf-8"?>
<operating_system>
<unix_80sp1>
<tests type="quick_sanity_test">
<prerequisi...
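The question doesn't define "Document Factory Parser". It may mean Java's DocumentBuilderFactory, which Jython-based environments such as STAX can usually import, but that is a guess. As an alternative, CPython's standard xml.dom.minidom module offers the same W3C DOM style of calls (getElementsByTagName, getAttribute, childNodes). The input below is a hypothetical, trimmed-down version of the file that keeps only the elements visible in the question.

```python
from xml.dom import minidom

# Hypothetical, trimmed-down input: only the elements visible in the
# question are kept, and <tests> is closed early because the real file
# is cut off at "<prerequisi...".
XML = """<?xml version="1.0" encoding="utf-8"?>
<operating_system>
  <unix_80sp1>
    <tests type="quick_sanity_test"/>
  </unix_80sp1>
</operating_system>"""


def os_sections(xml_text):
    """Names of the element children of the root, e.g. unix_80sp1."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    root = doc.documentElement
    return [node.tagName for node in root.childNodes
            if node.nodeType == node.ELEMENT_NODE]


def read_test_types(xml_text):
    """The type attribute of every <tests> element in the document."""
    doc = minidom.parseString(xml_text.encode("utf-8"))
    return [node.getAttribute("type")
            for node in doc.getElementsByTagName("tests")]


print(os_sections(XML))      # ['unix_80sp1']
print(read_test_types(XML))  # ['quick_sanity_test']
```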
I am looking to get as close as I can to parsing out an AS3 file into objects or XML. For instance, imagine the following class:
package {
class SomeClass extends AnotherClass {
private var someVariable:Number
public function someMethod(someParameter:Number = 4):void {
var someLocalVariable:Number = someParamet...
I'd like to parse a simple table into a Ruby data structure. The table looks like this:
http://img232.imageshack.us/img232/446/picture5cls.png
Edit: Here is the HTML
and I'd like to parse it into an array of hashes, e.g.:
schedule[0]['NEW HAVEN'] == '4:12AM'
schedule[0]['Travel Time In Minutes'] == '95'
Any thoughts on how to do ...
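The usual approach is to read the header row once and zip it with every following row. Here is a Python sketch using the standard library's html.parser; in Ruby, an HTML library such as Nokogiri would play the same role. The question's actual HTML is not shown, so the markup below is hypothetical. It only reuses the column names and values quoted above. The sketch also assumes every <td>/<th> is explicitly closed.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # one list of cell strings per <tr>
        self._cell = None  # text pieces of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Normalise internal whitespace, e.g. "Travel Time\n In Minutes".
            self.rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None


def table_to_hashes(html):
    """First row supplies the keys; every later row becomes one dict."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical markup: the question's real HTML is not reproduced here.
HTML = """
<table>
  <tr><th>NEW HAVEN</th><th>Travel Time In Minutes</th></tr>
  <tr><td>4:12AM</td><td>95</td></tr>
</table>
"""
schedule = table_to_hashes(HTML)
print(schedule[0]["NEW HAVEN"], schedule[0]["Travel Time In Minutes"])  # 4:12AM 95
```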
Hi,
I'm sorry for the generic title of this question; I wish I could articulate it less generically. :-}
I'd like to write a piece of software (in this case, using C++) which translates a stream of input tokens into a stream of output tokens. There are just five input tokens (let's call them 0, 1, 2, 3, 4) and each of them can ...
I'm crawling pages for information and have run into many problems parsing the pages in Groovy. I've made a semi-solution that works most of the time using juniversalchardet and just scanning the page for the <meta> tag in the head, but sometimes two of these tags are found on one page, for example:
<meta http-equiv="Content-Ty...
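One way to handle duplicate declarations is to collect every charset the page declares first, and only then decide what to do. Here is a Python sketch of that idea. The regex is a heuristic rather than a full HTML parse. The policy is an assumption: trust a single unambiguous declaration, and otherwise fall back to a detector. In the question's Groovy code, that detector would be juniversalchardet.

```python
import re

# Heuristic match for both common forms of charset declaration:
#   <meta charset="utf-8">
#   <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
CHARSET_RE = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE
)


def declared_charsets(raw, limit=4096):
    """Distinct charsets declared in <meta> tags, in page order.

    Only the first `limit` bytes are scanned (an arbitrary cut-off, since
    these tags normally sit in <head>).
    """
    found = []
    for match in CHARSET_RE.finditer(raw[:limit]):
        name = match.group(1).decode("ascii").lower()
        if name not in found:
            found.append(name)
    return found


def pick_charset(raw, fallback="utf-8"):
    """Use the declaration if it is unambiguous, otherwise the fallback.

    The fallback is a placeholder for a detector's guess (juniversalchardet
    in the question's code).
    """
    found = declared_charsets(raw)
    return found[0] if len(found) == 1 else fallback
```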
I'm trying to create an app to search my company's ColdFusion codebase. I'd like to be able to do intelligent searches, for example: find where a function is defined (and not hit everywhere the function is called). In order to do this, I'd need to parse the ColdFusion code to identify things like function declarations, function calls, ...
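A full solution needs a real CFML grammar, but a rough first pass can separate definitions from calls. It looks for the two ways ColdFusion declares functions: the <cffunction name="..."> tag and the cfscript "function name(" form. The sketch below is a regex heuristic, not a parser. It misses tags split across lines and can be fooled by matches inside comments or strings. The sample component is made up.

```python
import re

# Tag syntax:      <cffunction name="getUser" ...>
# cfscript syntax: [access] [returntype] function formatName(...)
TAG_DEF = re.compile(r"""<cffunction\b[^>]*?\bname\s*=\s*["']([\w.]+)["']""", re.I)
SCRIPT_DEF = re.compile(r"\bfunction\s+([A-Za-z_]\w*)\s*\(", re.I)


def find_definitions(source):
    """(line number, function name) for every definition the patterns see.

    Calls such as formatName("a", "b") are not matched, because both
    patterns require the cffunction tag or the function keyword.
    """
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern in (TAG_DEF, SCRIPT_DEF):
            hits.extend((lineno, m.group(1)) for m in pattern.finditer(line))
    return hits


# Made-up sample component for illustration.
SAMPLE = (
    "<cfcomponent>\n"
    '  <cffunction name="getUser" access="public">\n'
    "    <cfreturn loadUser(1)>\n"
    "  </cffunction>\n"
    "  <cfscript>\n"
    "    public string function formatName(first, last) {\n"
    '      return first & " " & last;\n'
    "    }\n"
    '    label = formatName("Ada", "Lovelace");\n'
    "  </cfscript>\n"
    "</cfcomponent>\n"
)
print(find_definitions(SAMPLE))  # [(2, 'getUser'), (6, 'formatName')]
```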
I have lines in an ASCII text file that I need to parse.
The columns are separated by a variable number of spaces, for instance:
column1 column2 column3
How would I split this line to return an array of only the values?
Thanks
...
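The question doesn't name a language. In Python, the two usual approaches look like this: split on "any run of whitespace", or split with an explicit regex after trimming the ends. The function names are illustrative.

```python
import re


def split_columns(line):
    # With no argument, str.split() splits on runs of whitespace and ignores
    # leading/trailing whitespace, so no empty strings appear in the result.
    return line.split()


def split_columns_regex(line):
    # Same idea with an explicit pattern, as in languages whose split() takes
    # a regex; strip() first, or a leading space yields an empty first field.
    # (An empty line still yields [''] with this version.)
    return re.split(r"\s+", line.strip())


print(split_columns("column1   column2      column3"))  # ['column1', 'column2', 'column3']
```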
OK, so here is the deal.
In my language I have some commands, say
XYZ 3 5
GGB 8 9
HDH 8783 33
And in my Lex file
XYZ { return XYZ; }
GGB { return GGB; }
HDH { return HDH; }
[0-9]+ { yylval.ival = atoi(yytext); return NUMBER; }
\n { return EOL; }
In my yacc file
start : commands
;
commands : command
| command EOL co...
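To make the lex-to-yacc handoff concrete, here is a rough Python mirror of the token stream the lex rules above produce and of how commands group into keyword-number-number triples. It illustrates the pipeline and does not replace the yacc grammar. Two assumptions are labelled in the code: spaces and tabs are skipped (the visible lex rules have no whitespace pattern), and every command takes exactly two numbers, as in the three example lines.

```python
import re

# One named group per rule in the question's lex file. Whitespace handling
# is an assumption: the visible rules have no pattern for spaces.
TOKEN_RE = re.compile(
    r"(?P<KEYWORD>XYZ|GGB|HDH)|(?P<NUMBER>[0-9]+)|(?P<EOL>\n)|(?P<SKIP>[ \t]+)"
)


def tokenize(text):
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected {text[pos]!r} at offset {pos}")
        if m.lastgroup == "KEYWORD":
            tokens.append((m.group(), None))           # XYZ / GGB / HDH
        elif m.lastgroup == "NUMBER":
            tokens.append(("NUMBER", int(m.group())))  # like yylval.ival = atoi(yytext)
        elif m.lastgroup == "EOL":
            tokens.append(("EOL", None))
        pos = m.end()                                  # SKIP: drop spaces/tabs
    return tokens


def parse_commands(tokens):
    """Group tokens into (keyword, n1, n2) tuples.

    Assumes each command is a keyword followed by exactly two numbers; the
    question's grammar is cut off, so this is not its exact rule set.
    """
    commands, i = [], 0
    while i < len(tokens):
        if tokens[i][0] == "EOL":  # line ends and blank lines
            i += 1
            continue
        group = tokens[i:i + 3]
        if (len(group) < 3 or group[0][0] not in ("XYZ", "GGB", "HDH")
                or group[1][0] != "NUMBER" or group[2][0] != "NUMBER"):
            raise ValueError(f"malformed command at token {i}: {group!r}")
        commands.append((group[0][0], group[1][1], group[2][1]))
        i += 3
    return commands


print(parse_commands(tokenize("XYZ 3 5\nGGB 8 9\nHDH 8783 33\n")))
```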
I'm currently making a widget to take and display items from a feed. It mostly works, but for some reason the data within the tag within the item comes back empty, while I get the data in the and tags with no problem.
feed is an xmlhttp.responseXML object.
var items = feed.getElementsByTagName("item");
for (var i...
I have a question about a basic XML file I'm parsing, in which I'm just putting simple newlines (Enters).
I'll try to explain my problem with this next example.
I'm (still) building an XML tree, and all it has to do (this is a test tree) is put the summary in an itemlist. I then export it to a plist so I can see if everything is done correctly....
I'm extracting emails from a database where they're stored as strings. I need to parse these emails to extract their attachments. I guess there must already be some library to do this easily but I can't find any.
...
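If Python is an option, its standard email package already does this parsing. The sketch below walks every MIME part and keeps the ones that carry a filename, decoding their transfer encoding (base64 here) to bytes. It treats "has a filename" as meaning "is an attachment", which is a heuristic; checking the Content-Disposition header would be stricter. The sample message is made up.

```python
import email


def extract_attachments(raw_message):
    """Return (filename, bytes) for each attachment in a message string."""
    msg = email.message_from_string(raw_message)
    found = []
    for part in msg.walk():  # walk() visits the message itself and every sub-part
        if part.get_content_maintype() == "multipart":
            continue  # containers hold no payload of their own
        filename = part.get_filename()
        if filename is None:
            continue  # inline body text, not an attachment
        found.append((filename, part.get_payload(decode=True)))
    return found


# Made-up sample message with one base64-encoded text attachment.
RAW = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "Subject: report\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "--BOUNDARY\n"
    "Content-Type: text/plain\n"
    "\n"
    "See attached.\n"
    "--BOUNDARY\n"
    'Content-Type: text/plain; name="notes.txt"\n'
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "aGVsbG8=\n"
    "--BOUNDARY--\n"
)
print(extract_attachments(RAW))  # [('notes.txt', b'hello')]
```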
I'd like to remove certain tags from an XML document as part of a filtering process but I cannot otherwise modify the appearance or structure of the XML.
The input XML comes in as a string, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
<main>
<mytag myattr="123"/>
<mytag myattr="456"/>
</main>
and the output needs to remove mytag...
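The requirement not to touch anything else is the tricky part. A DOM or ElementTree round-trip re-serialises the whole document and can change details such as the quoting in the XML declaration. A narrower option is to delete the matching text directly. The Python sketch below does that, under stated assumptions. It removes all elements of the given name, since the question is cut off before saying which mytag elements to drop. It only handles self-closing elements like the ones in the example, and attribute values containing ">" would confuse it.

```python
import re


def remove_self_closing(xml_text, tag):
    """Delete every self-closing <tag .../> element, together with its
    indentation and trailing line break, leaving all other text as-is.

    Assumes the element never has content (<tag>...</tag> is not handled).
    """
    pattern = re.compile(
        r"[ \t]*<" + re.escape(tag) + r"(?=[\s/])[^>]*/>[ \t]*(?:\r?\n)?"
    )
    return pattern.sub("", xml_text)


XML_IN = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<main>\n"
    '  <mytag myattr="123"/>\n'
    '  <mytag myattr="456"/>\n'
    "</main>\n"
)
print(remove_self_closing(XML_IN, "mytag"))
```

The lookahead after the tag name keeps similarly named elements such as <mytag2/> or <mytag-x/> from being removed by accident.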