parsing

Find the character index of a node within its parent node with Hpricot

Suppose I have the following HTML: html = Four score and seven <b>years ago</b> I want to parse this with Hpricot: doc = Hpricot(html) Find the <b> node: node = doc.at('b') and then get the character index of the <b> node within its parent: node.character_index => 22 How can I do this (i.e., what's the real version of the cha...

Match text in loops in Ruby

I have to go through the following text and match each of the following, and break them apart into separate records to save to a database. So this text: ESTIMATED MINIMUM CENTRAL PRESSURE 951 MB EYE DIAMETER 12 NM MAX SUSTAINED WINDS 105 KT WITH GUSTS TO 130 KT 64 KT....... 25NE 25SE 25SW 25NW 50 KT....... 60NE 30SE 30SW 60NW 34...

What's the easiest way to expose a SQL interface to my application?

I'm working on an application which stores data in tables, similar to an RDBMS. I'm looking for a way to let my users query this data using SQL. Ideally, I'd like to do this without having to implement my own SQL parser, query optimizer, etc. So far, ripping parts out of something like Apache Derby is looking like the best option, but...

Can anyone suggest a good open source HTML parser to be used with Java?

Hi, I would like to know of a good HTML parser for both static and dynamic HTML in Java. It needs to be lightweight, as it is to be used in a mobile application. Is there anything already available? ...

Dynamically extensible generic parser

I wrote an application which makes use of a meta-parser generated using CSharpCC (a port of JavaCC). Everything works very well. Given the nature of the project, I would like more flexibility to extend the syntax of the meta-language used by the application. Do you know any existing libraries ...

NSDictionary: how do I create one directly from a formatted file?

This is a fairly trivial data parsing question. I'm just unclear on the methods I should be using to pull it off. I've got a plain text file of a few hundred lines. Each line is of exactly the same format. The lines are in contiguous chunks where the first item in a line is essentially a key that is repeated for each line in a chunk: k...

Email header parser

I want to parse the headers of a bounced email. I have tried some open source email parsers, but did not find one that can parse email headers. Thanks ...

What python data structure and parser should I use with Apple's system_profiler?

My problem is similar to one from http://my.safaribooksonline.com/0596007973/pythoncook2-CHP-10-SECT-17 which eventually made its way into Python Cookbook, 2nd Edition, using an outdated XPath method from 2005 that I haven't been able to get to work with 10.6's built-in Python (nor by installing older packages). I want to ......

Parsing SQL with Python

I want to create a SQL interface on top of a non-relational data store. The data store is non-relational, but it makes sense to access the data in a relational manner. I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression, and then return data by evaluating/walking the tree. I have never implem...

Scraping html tables into R data frames using the XML package

How do I scrape HTML tables using the XML package? Take, for example, this Wikipedia page on the Brazilian soccer team. I would like to read it in R and get the "list of all matches Brazil have played against FIFA recognised teams" table as a data.frame. How can I do this? ...

Optional vs. mandatory terminators in context-free grammar definition

In a book chapter about compilers, there's the following grammar definition and example code. ... statement: whileStatement | ifStatement | ... // Other statement possibilities | '{' statementSequence '}' whileStatement: 'while' '(' expression ')' statement ifStatement: ... // Definition of "if" statemen...

parsing/scanning/tokenizing "raw XML"

I have an application where I need to parse or tokenize XML and preserve the raw text (e.g. don't parse entities, don't convert whitespace in attributes, keep attribute order, etc.) in a Java program. I've spent several hours today trying to use StAX, SAX, XSLT, TagSoup, etc. before realizing that none of them do this. I can't afford to...

Python: convert free text to date

Hi, assuming the text is typed at the same time in the same (Israeli) timezone, the following free-text lines are equivalent: Wed Sep 9 16:26:57 IDT 2009 2009-09-09 16:26:57 16:26:57 September 9th, 16:26:57 Is there a Python module that would convert all these text dates to an (identical) datetime.datetime instance? I would like to...
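
One way the conversion described above could be approached with only the standard library is sketched below. It assumes the excerpt lists four separate layouts ("Wed Sep 9 16:26:57 IDT 2009", "2009-09-09 16:26:57", "16:26:57" and "September 9th, 16:26:57"); the helper name `parse_free_text`, the `today` parameter and the rule that missing date parts are filled in from today's date are illustrative assumptions, not something stated in the question. It also assumes an English locale for month and weekday names.

```python
import re
from datetime import date, datetime


def parse_free_text(text, today=None):
    """Return a naive datetime for one of the four sample layouts (hypothetical helper)."""
    today = today or date.today()
    # Drop the zone abbreviation: strptime's %Z does not reliably accept "IDT".
    cleaned = re.sub(r"\b(?:IDT|IST)\b", "", text)
    # Turn ordinals such as "9th" into plain day numbers.
    cleaned = re.sub(r"(\d)(?:st|nd|rd|th)\b", r"\1", cleaned)
    # Collapse the whitespace left behind by the removals.
    cleaned = " ".join(cleaned.split())

    # Each candidate pairs a (possibly completed) string with its format.
    # Missing year or date parts are taken from `today` (an assumption).
    candidates = [
        (cleaned, "%a %b %d %H:%M:%S %Y"),
        (cleaned, "%Y-%m-%d %H:%M:%S"),
        (f"{today.year} {cleaned}", "%Y %B %d, %H:%M:%S"),
        (f"{today:%Y-%m-%d} {cleaned}", "%Y-%m-%d %H:%M:%S"),
    ]
    for candidate, fmt in candidates:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date text: {text!r}")
```

The result is a naive datetime, because the sketch discards the zone abbreviation rather than interpreting it; a timezone-aware version would need an explicit mapping from abbreviations to offsets.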

Example of overloading C++ extraction operator >> to parse data

I am looking for a good example of how to overload the stream input operator (operator>>) to parse some data with simple text formatting. I have read this tutorial but I would like to do something a bit more advanced. In my case I have fixed strings that I would like to check for (and ignore). Supposing the 2D point format from the link ...

How to match this type of construction?

I'm doing parsing, and the kind of text that I want to match and then make null is as follows: <tr class="label-BGC"><td colspan="4">any kind of text here</td></tr> I want to match every line that contains "<tr class="label-BGC"><td colspan="4">any text</td></tr>". It's evening here and my brain-battery is totally down. What I'm try...

Parse multiple doubles from string in C#

I have a string that contains a known number of double values. What's the cleanest way (via C#) to parse the string and plug the results into matching scalar variables? Basically, I want to do the equivalent of this sscanf statement, but in C#: sscanf( textBuff, "%lg %lg %lg %lg %lg %lg", &X, &Y, &Z, &I, &J, &K ); ... assuming that ...

Yacc/Jay grammar file for JavaScript?

I'm trying to find a JavaScript grammar file for Yacc (preferably for Jay, but since Jay is a Yacc clone a Yacc grammar should be fine; I need to implement it on .NET). ...

Parsing assembly qualified name?

I would like to parse an assembly qualified name in .NET 3.5. In particular, the assembly itself is not available, it's just the name. I can think of many ways of doing it by hand but I guess I might be missing some feature to do that in the system libraries. Any suggestion? ...

Natural language date and time parser for Java

Hey guys, I am working on a natural language parser which examines a sentence in English and extracts some information like name, date, etc. For example: "Lets meet next tuesday at 5 PM at the beach." So the output will be something like: "Lets meet 15/09/2009 at 1700 hr at the beach" So basically, what I want to know is that is ther...

What is packrat parsing?

I know and use bison/yacc. But in the parsing world, there's a lot of buzz around packrat parsing. What is it? Is it worth studying? ...
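
As a rough illustration of the term asked about above: packrat parsing is usually described as recursive-descent parsing of a parsing expression grammar (PEG) in which every (rule, position) result is cached, so no rule is evaluated twice at the same input offset. The sketch below is a minimal Python example under that description; the grammar, the function names and the use of `functools.lru_cache` as the memo table are choices made for illustration only.

```python
from functools import lru_cache

DIGITS = "0123456789"


def parse(text):
    """Evaluate `text` with a tiny memoized PEG parser; return None on failure."""

    @lru_cache(maxsize=None)  # memo table keyed by input position
    def expr(pos):
        # Expr <- Term ('+' Term)*
        result = term(pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(text) and text[pos] == "+":
            nxt = term(pos + 1)
            if nxt is None:
                break
            value, pos = value + nxt[0], nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def term(pos):
        # Term <- Factor ('*' Factor)*
        result = factor(pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(text) and text[pos] == "*":
            nxt = factor(pos + 1)
            if nxt is None:
                break
            value, pos = value * nxt[0], nxt[1]
        return value, pos

    @lru_cache(maxsize=None)
    def factor(pos):
        # Factor <- Number / '(' Expr ')'
        end = pos
        while end < len(text) and text[end] in DIGITS:
            end += 1
        if end > pos:
            return int(text[pos:end]), end
        if pos < len(text) and text[pos] == "(":
            inner = expr(pos + 1)
            if inner is not None and inner[1] < len(text) and text[inner[1]] == ")":
                return inner[0], inner[1] + 1
        return None

    result = expr(0)
    if result is None or result[1] != len(text):
        return None  # parse failed or left unconsumed input
    return result[0]
```

For example, `parse("2*(3+4)+1")` evaluates to 15. This small grammar backtracks very little, so the cache mainly shows the mechanism; the benefit appears in grammars where several alternatives re-try the same rule at the same position.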