parsing

"Smart" way of parsing and using website data?

How does one intelligently parse data returned by search results on a page? For example, let's say that I would like to create a web service that searches for online books by parsing the search results of many book providers' websites. I could get the raw HTML data of the page, and do some regexes to make the data work for my web service,...
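One alternative to regexes is to walk the page with a real HTML parser. The following is only a sketch using Python's standard `html.parser` module. The markup it expects (book titles inside `<a class="title">` links) and the names `TitleExtractor` and `extract_titles` are assumptions made for illustration and do not come from any actual provider's site.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text of every <a class="title"> element (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "a" and ("class", "title") in attrs:
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title
        if self._in_title:
            self.titles[-1] += data


def extract_titles(html):
    """Return the stripped text of each title link found in the HTML string."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return [title.strip() for title in parser.titles]
```

Each provider would likely need its own small extractor like this, since search result markup differs from site to site.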

PHP explode string only if line does not end with '!'

Hi all! I'm writing a parser for a scripting programming language in PHP. The syntax of that scripting language looks like this: ZOMFG &This is a comment (show "Hello, World\!"); This is a page written in that language, that displays Hello, World! in the browser. But I could also have code like this: ZOMFG &This is a comment ! on mu...

Most jQuery-like HTML parser for Ruby

What HTML parser for Ruby will I find easiest to use if I'm already familiar / in love with jQuery? Such a parser would have jQuery's overall philosophy -- "grab some HTML elements (using CSS selectors) and do things with them" -- and in addition have equivalents for all of jQuery's DOM manipulation functionality (prepend(), after(), et...

PHP Array Parsing

Hey all, I have a huge array coming back as search results and I want to do the following: Walk through the array and for each record with the same "spubid" add the following keys/vals: "sfirst, smi, slast" to the parent array member, in this case $a[0]. So the result would be to leave $a[0] intact but add to it the values from sfirst, ...

Haskell equivalent of Python's "Construct"

Construct is a DSL implemented in Python used to describe data structures (binary and textual). Once you have the data structure described, construct can parse and build it for you. Which is good ("DRY", "Declarative", "Denotational-Semantics"...) Usage example: # code from construct.formats.graphics.png itxt_info = Struct("itxt_info",...

How do I TryParse in SQL 2000?

I have a stored procedure in an old SQL 2000 database that takes a comment column that is formatted as a varchar and exports it out as a money object. At the time this table structure was set up, it was assumed this would be the only data going into this field. The current procedure functions simply like this: SELECT CAST(dbo.member_cate...

Stripping Uppercase Words in Excel VBA

I have an Excel sheet like this one:
    A                                                                                   B
1   Used CONTENT VERSION SYSTEM for the FALCON Project
2   USA beats UK at Soccer Cup 2008
3   DARPA NET’s biggest contribution was the internet
4   One big problem is STRUCTURED QUERY LANGUAGE queries on non-normalized data
I ...

Elegant way to reverse column order

Hi, I have a file named ip-list with two columns:
IP1 <TAB> Server1
IP2 <TAB> Server2
And I want to produce:
Server1 <TAB> IP1
Server2 <TAB> IP2
What's the most elegant, shortest Linux command line tool to do it? ...
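The question asks for a command line tool, so the following is not an answer in that form. It is only a Python sketch of the column swap itself, assuming each non-empty line holds exactly two tab-separated fields. The function name `swap_columns` is made up for illustration.

```python
def swap_columns(text):
    """Swap the two tab-separated columns on every non-empty line."""
    swapped = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        first, second = line.split("\t", 1)
        # Strip stray spaces around the tab before re-joining
        swapped.append(f"{second.strip()}\t{first.strip()}")
    return "\n".join(swapped)
```

To run it on the file from the question, read `ip-list` into a string and print the result of `swap_columns`.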

XML Parsing in Python using document builder factory

Hi, I am working in STAF and STAX. Python is used for coding here. I am new to Python. Basically my task is to parse an XML file in Python using Document Factory Parser. The XML file I am trying to parse is: <?xml version="1.0" encoding="utf-8"?> <operating_system> <unix_80sp1> <tests type="quick_sanity_test"> <prerequisi...

How to parse an .as (AS3) file

I am looking to get as close as I can to parsing out an AS3 file into objects or XML. For instance, imagine the following class: package { class SomeClass extends AnotherClass { private var someVariable:Number public function someMethod(someParameter:Number = 4):void { var someLocalVariable:Number = someParamet...

Best way to parse a table in Ruby

I'd like to parse a simple table into a Ruby data structure. The table looks like this: http://img232.imageshack.us/img232/446/picture5cls.png Edit: Here is the HTML and I'd like to parse it into an array of hashes. E.g.,: schedule[0]['NEW HAVEN'] == '4:12AM' schedule[0]['Travel Time In Minutes'] == '95' Any thoughts on how to do ...

Are there good Patterns/Idioms for Data Translation/Transformation?

Hi, I'm sorry for the generic title of this question but I wish I was able to articulate it less generically. :-} I'd like to write a piece of software (in this case, using C++) which translates a stream of input tokens into a stream of output tokens. There are just five input tokens (let's call them 0, 1, 2, 3, 4) and each of them can ...

Two charset tags on a page, which to take?

I'm working on crawling pages for information, and have run into many problems with parsing the pages in Groovy. I've made a semi-solution that works most of the time using juniversal chardet and just scanning the page for the <meta> tag in the head, but sometimes two of these tags are found on one page, for example: <meta http-equiv="Content-Ty...

ColdFusion code parser?

I'm trying to create an app to search my company's ColdFusion codebase. I'd like to be able to do intelligent searches, for example: find where a function is defined (and not hit everywhere the function is called). In order to do this, I'd need to parse the ColdFusion code to identify things like function declarations, function calls, ...

How can I split out individual column values from each line in a text file?

I have lines in an ASCII text file that I need to parse. The columns are separated by a variable number of spaces, for instance: column1 column2 column3 How would I split this line to return an array of only the values? Thanks ...
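The question does not name a language, so as an illustration only, here is a minimal Python sketch. Calling `str.split()` with no arguments splits on any run of whitespace and drops empty strings, which matches the "variable number of spaces" case. The function name `split_columns` is made up for this example.

```python
def split_columns(line):
    """Return the column values of a line separated by runs of whitespace."""
    # With no separator argument, split() collapses consecutive whitespace
    # and ignores leading and trailing whitespace.
    return line.split()
```

Note that this also splits on tabs. If a column value can itself contain a space, a fixed-width or delimiter-based approach would be needed instead.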

How to get entire input string in Lex and Yacc?

OK, so here is the deal. In my language I have some commands, say XYZ 3 5 GGB 8 9 HDH 8783 33 And in my Lex file XYZ { return XYZ; } GGB { return GGB; } HDH { return HDH; } [0-9]+ { yylval.ival = atoi(yytext); return NUMBER; } \n { return EOL; } In my yacc file start : commands ; commands : command | command EOL co...

Can't get description RSS tag data with JavaScript

I'm currently making a widget to take and display items from a feed. I have this working for the most part, but for some reason the data within the <description> tag within the item comes back as empty, but I get the data in the and tags no problem. feed is an xmlhttp.responseXML object. var items = feed.getElementsByTagName("item"); for (var i...

next line character a huge influence on xmlparser?

I have a question about a basic XML file I'm parsing and just putting in simple newlines (Enters). I'll try to explain my problem with this next example. I'm (still) building an XML tree and all it has to do (this is a test tree) is put the summary in an item list. I then export it to a plist so I can see if everything is done correctly....

How to extract mail attachment with PHP?

I'm extracting emails from a database where they're stored as strings. I need to parse these emails to extract their attachments. I guess there must already be some library to do this easily but I can't find any. ...

Filtering XML while preserving its structure

I'd like to remove certain tags from an XML document as part of a filtering process but I cannot otherwise modify the appearance or structure of the XML. The input XML comes in as a string eg: <?xml version="1.0" encoding="UTF-8"?> <main> <mytag myattr="123"/> <mytag myattr="456"/> </main> and the output needs to remove mytag...