Suppose I have the following HTML:
html = "Four score and seven <b>years ago</b>"
I want to parse this with Hpricot:
doc = Hpricot(html)
Find the <b> node:
node = doc.at('b')
and then get the character index of the <b> node within its parent:
node.character_index
=> 22
How can I do this (i.e., what's the real version of the cha...
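Hpricot is Ruby, but the underlying idea is easy to sketch with Python's standard `html.parser`: count the text characters seen before the target tag opens. This is a minimal sketch, assuming the `<b>` sits directly under the root of the fragment (so "within its parent" equals "from the start of the input"); the class and function names are hypothetical. Entities are decoded before counting, so `&amp;` counts as one character.

```python
from html.parser import HTMLParser


class TextOffsetFinder(HTMLParser):
    """Counts text characters seen before the first start tag named `target`."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.chars_seen = 0
        self.index = None

    def handle_starttag(self, tag, attrs):
        # Record the running text length the first time the target tag opens.
        if tag == self.target and self.index is None:
            self.index = self.chars_seen

    def handle_data(self, data):
        # Only text content contributes to the character count, not markup.
        self.chars_seen += len(data)


def character_index(html, tag):
    finder = TextOffsetFinder(tag)
    finder.feed(html)
    finder.close()
    return finder.index
```

For the example string this returns 21, a 0-based index; the 22 in the question corresponds to counting from 1.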
I have to go through the following text, match each of the fields below, and break them apart into separate records to save to a database. So this text:
ESTIMATED MINIMUM CENTRAL PRESSURE 951 MB
EYE DIAMETER 12 NM
MAX SUSTAINED WINDS 105 KT WITH GUSTS TO 130 KT
64 KT....... 25NE 25SE 25SW 25NW
50 KT....... 60NE 30SE 30SW 60NW
34...
I'm working on an application which stores data in tables, similar to an RDBMS. I'm looking for a way to let my users query this data using SQL. Ideally, I'd like to do this without having to implement my own SQL parser, query optimizer, etc. So far, ripping parts out of something like Apache Derby is looking like the best option, but...
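A different route from extracting Derby's parser (a swapped-in technique, not one the question mentions) is to embed an existing SQL engine and load the application's rows into it. This sketch uses Python's built-in `sqlite3` with an in-memory database per query. The `tables` layout (name mapped to column names and rows) is a hypothetical stand-in for the application's storage, table and column names are assumed to be trusted, and copying everything per query is only practical for modest data sizes.

```python
import sqlite3


def run_user_query(tables, sql):
    """Copy in-memory tables into a throwaway SQLite database and run SQL on them.

    `tables` maps a table name to (column_names, rows); this layout is a
    hypothetical stand-in for the application's own storage.
    """
    conn = sqlite3.connect(":memory:")
    try:
        for name, (columns, rows) in tables.items():
            cols = ", ".join(f'"{c}"' for c in columns)
            placeholders = ", ".join("?" for _ in columns)
            # SQLite accepts untyped columns, so no schema mapping is needed here.
            conn.execute(f'CREATE TABLE "{name}" ({cols})')
            conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()
```

SQLite then supplies the parser, planner and executor, at the cost of duplicating the data.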
Hi, I would like to know of a good HTML parser in Java for both static and dynamic HTML. It needs to be lightweight, as it is to be used in a mobile application. Is there anything like that already available?
...
I wrote an application that uses a meta-parser generated with CSharpCC (a port of JavaCC). Everything works fine; very well, I'd say.
Given the nature of the project, I would like more flexibility to extend the syntax of the meta-language the application uses.
Do you know any existing libraries ...
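One common way to make a syntax extensible at run time, rather than regenerating a CSharpCC/JavaCC grammar, is a Pratt (top-down operator precedence) parser whose operators live in a table. This is a minimal sketch for integer arithmetic only; the class and method names are hypothetical, and it is not a drop-in replacement for the existing meta-parser.

```python
import re


class ExtensiblePratt:
    """Tiny Pratt parser: infix operators are stored in a dict, so new
    syntax can be registered at run time with add_operator()."""

    def __init__(self):
        # symbol -> (binding power, function combining left and right operands)
        self.infix = {
            "+": (10, lambda a, b: a + b),
            "-": (10, lambda a, b: a - b),
            "*": (20, lambda a, b: a * b),
        }

    def add_operator(self, symbol, power, func):
        self.infix[symbol] = (power, func)

    def parse(self, text):
        # Longest symbols first so that e.g. "**" would win over "*".
        ops = sorted(self.infix, key=len, reverse=True)
        pattern = r"\s*(\d+|[()]|" + "|".join(map(re.escape, ops)) + r")"
        self.tokens = re.findall(pattern, text)
        # findall silently skips characters it cannot match; reject those.
        if "".join(self.tokens) != "".join(text.split()):
            raise SyntaxError(f"unrecognised input in {text!r}")
        self.pos = 0
        value = self.expression(0)
        if self.pos != len(self.tokens):
            raise SyntaxError("unexpected trailing input")
        return value

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self, min_power):
        token = self.advance()
        if token == "(":
            left = self.expression(0)
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
        elif token is not None and token.isdigit():
            left = int(token)
        else:
            raise SyntaxError(f"unexpected token {token!r}")
        # Keep folding operators that bind tighter than the caller's level.
        while self.peek() in self.infix and self.infix[self.peek()][0] > min_power:
            power, func = self.infix[self.advance()]
            left = func(left, self.expression(power))
        return left
```

For example, after `p.add_operator("^", 30, lambda a, b: a ** b)` the same parser accepts `2 ^ 3 * 2` without any regeneration step.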
This is a fairly trivial data parsing question. I'm just unclear on the methods I should be using to pull it off.
I've got a plain text file of a few hundred lines. Each line is of exactly the same format. The lines are in contiguous chunks where the first item in a line is essentially a key that is repeated for each line in a chunk:
k...
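Since the lines come in contiguous chunks sharing a leading key, `itertools.groupby` fits this directly: it groups consecutive items with equal keys. A sketch, assuming whitespace-separated fields with the key first (the real layout is elided above, so the split may need adjusting); the function name is hypothetical.

```python
from itertools import groupby


def chunks_by_key(lines):
    """Split contiguous runs of lines into (key, rows) pairs.

    Assumes whitespace-separated fields with the key as the first field.
    """
    rows = [line.split() for line in lines if line.strip()]
    return [
        (key, [row[1:] for row in run])
        for key, run in groupby(rows, key=lambda row: row[0])
    ]
```

A list of pairs is returned rather than a dict so that a key appearing in two separate chunks is not silently merged or overwritten.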
I want to parse the headers of a bounced email. I have tried some open-source email parsers, but did not find one that can parse the email headers.
Thanks
...
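Python's standard `email` package can read both the top-level headers and, for RFC 3464 delivery status notifications, the per-recipient fields. A sketch, assuming the bounce is a standard `multipart/report` DSN; many older mailers send free-text bounces instead, which this does not handle. The function name and result keys are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def bounce_info(raw_message):
    """Extract top-level headers and, if present, DSN fields from a bounce."""
    msg = message_from_string(raw_message)
    _, sender = parseaddr(msg.get("From", ""))
    info = {"from": sender, "subject": msg.get("Subject"), "failed": []}
    # An RFC 3464 bounce carries a message/delivery-status part; the email
    # package exposes each of its header blocks as a separate Message object.
    for part in msg.walk():
        if part.get_content_type() == "message/delivery-status":
            for block in part.get_payload():
                if block.get("Final-Recipient"):
                    info["failed"].append((block["Final-Recipient"], block.get("Status")))
    return info
```

`msg.items()` also gives every header as (name, value) pairs if you need the full set.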
My problem is similar to the simulated one at
http://my.safaribooksonline.com/0596007973/pythoncook2-CHP-10-SECT-17
which eventually made its way into Python Cookbook, 2nd Edition. It uses an outdated XPath method from 2005 that I haven't been able to get working with 10.6's built-in Python (nor by installing older packages).
I want to ......
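If the old XPath module is the only obstacle, Python's bundled `xml.etree.ElementTree` supports a subset of XPath through `findall`, with no extra packages. The recipe's actual task is not shown here, so this is only a generic sketch; the function name and sample paths are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_texts(xml_text, path):
    """Return the text of every element matching an ElementTree path expression.

    ElementTree understands only a subset of XPath (tag names, '*', '.',
    '//', '..', and simple predicates such as [@attr] or [tag]).
    """
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]
```

Paths are evaluated relative to the root element, so `".//string"` finds every `<string>` descendant.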
I want to create a SQL interface on top of a non-relational data store. The store isn't relational, but it makes sense to access the data in a relational manner.
I am looking into using ANTLR to produce an AST that represents the SQL as a relational algebra expression, and then returning data by evaluating (walking) the tree.
I have never implem...
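The "evaluate by walking the tree" step can be sketched independently of ANTLR: each relational-algebra node pulls rows from its child and transforms them. This sketch assumes rows are dicts and tables are lists of rows; the node classes are hypothetical stand-ins for what a tree walker over the ANTLR AST might build for something like `SELECT name FROM users WHERE age > 30`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scan:           # read every row of a base table
    table: str


@dataclass
class Select:         # sigma: keep rows satisfying a predicate
    child: object
    predicate: Callable[[dict], bool]


@dataclass
class Project:        # pi: keep only the named columns
    child: object
    columns: list


def execute(node, store):
    """Walk the relational-algebra tree, streaming rows as dicts."""
    if isinstance(node, Scan):
        yield from store[node.table]
    elif isinstance(node, Select):
        for row in execute(node.child, store):
            if node.predicate(row):
                yield row
    elif isinstance(node, Project):
        for row in execute(node.child, store):
            yield {c: row[c] for c in node.columns}
    else:
        raise TypeError(f"unknown node {node!r}")
```

Because each node is a generator, rows stream through the tree one at a time instead of materialising intermediate tables; joins and aggregates would be further node types in the same style.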
How do I scrape html tables using the XML package?
Take, for example, this Wikipedia page on the Brazilian soccer team. I would like to read it in R and get the "list of all matches Brazil have played against FIFA recognised teams" table as a data.frame. How can I do this?
...
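In R, the XML package's `readHTMLTable()` is the usual entry point for this. To show what such a function does underneath, here is a Python sketch using the standard `html.parser` that collects each `<table>` as a list of rows of cell strings. It assumes explicit `</td>`/`</tr>` closing tags and no nested tables; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside links or other inline tags is still part of the cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def scrape_tables(html):
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.tables
```

The first row of the wanted table would then serve as column names and the rest as data rows.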
In a book chapter about compilers, there's the following grammar definition and example code.
...
statement: whileStatement
| ifStatement
| ... // Other statement possibilities
| '{' statementSequence '}'
whileStatement: 'while' '(' expression ')' statement
ifStatement: ... // Definition of "if"
statemen...
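A grammar written like this maps directly onto a recursive-descent parser: one function per rule, with each alternative chosen by looking at the next token. A sketch, assuming a simplified language: `ifStatement` and the other elided alternatives are replaced by a hypothetical `name ';'` statement, `expression` is just a single identifier or number, and `statementSequence` is assumed to be zero or more statements.

```python
import re


def parse_program(source):
    """Recursive-descent parser: one nested function per grammar rule."""
    tokens = re.findall(r"\s*([A-Za-z_]\w*|\d+|[(){};])", source)
    if "".join(tokens) != "".join(source.split()):
        raise SyntaxError("unrecognised characters in input")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(token):
        nonlocal pos
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, got {peek()!r}")
        pos += 1

    def statement():
        # statement: whileStatement | '{' statementSequence '}' | simpleStatement
        if peek() == "while":
            return while_statement()
        if peek() == "{":
            expect("{")
            body = statement_sequence()
            expect("}")
            return ("block", body)
        return simple_statement()

    def while_statement():
        # whileStatement: 'while' '(' expression ')' statement
        expect("while")
        expect("(")
        condition = expression()
        expect(")")
        return ("while", condition, statement())

    def statement_sequence():
        # Assumed rule: statementSequence: statement*, ended by '}'
        body = []
        while peek() not in ("}", None):
            body.append(statement())
        return body

    def expression():
        # Placeholder: a single identifier or number stands in for an expression
        nonlocal pos
        token = peek()
        if token is None or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"expected an expression, got {token!r}")
        pos += 1
        return token

    def simple_statement():
        # Hypothetical stand-in for the elided alternatives: name ';'
        name = expression()
        expect(";")
        return ("stmt", name)

    tree = statement()
    if peek() is not None:
        raise SyntaxError("unexpected trailing input")
    return tree
```

Note how `while_statement` calls `statement` for its body, and `statement` can call `while_statement`: the mutual recursion in the grammar becomes mutual recursion in the code.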
I have an application where I need to parse or tokenize XML and preserve the raw text (e.g. don't parse entities, don't convert whitespace in attributes, keep attribute order, etc.) in a Java program.
I've spent several hours today trying to use StAX, SAX, XSLT, TagSoup, etc. before realizing that none of them do this. I can't afford to...
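When the goal is to keep the raw text rather than build an infoset, a lexical tokenizer is enough: split the input into tags, text, comments and CDATA without decoding anything. This is a sketch in Python rather than Java, assuming well-formed input without an internal DTD subset; the pattern and function names are hypothetical. It checks that the tokens reproduce the input exactly, so nothing is silently dropped.

```python
import re

# Order matters: comments and CDATA may contain '>' so they are tried before
# the generic tag pattern, which itself allows '>' inside quoted attribute values.
TOKEN_RE = re.compile(
    r"<!--.*?-->"
    r"|<!\[CDATA\[.*?\]\]>"
    r"""|<(?:[^>"']|"[^"]*"|'[^']*')*>"""
    r"|[^<]+",
    re.S,
)
# name = "value" or name = 'value', captured exactly as written.
ATTR_RE = re.compile(r"""([^\s=/<>]+)\s*=\s*("[^"]*"|'[^']*')""")


def raw_tokens(xml):
    """Split XML into tag / text / comment tokens without decoding anything."""
    tokens = [m.group() for m in TOKEN_RE.finditer(xml)]
    # Lossless check: the tokens must reproduce the input exactly.
    if "".join(tokens) != xml:
        raise ValueError("input contains markup this sketch cannot tokenize")
    return tokens


def raw_attributes(tag):
    """Attribute (name, quoted value) pairs in document order, untouched."""
    return ATTR_RE.findall(tag)
```

Entities such as `&amp;`, whitespace inside attribute values, quote style and attribute order all survive, because no token is ever decoded.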
Hi,
Assuming the text is typed at the same time in the same (Israeli) time zone, the following free-text lines are equivalent:
Wed Sep 9 16:26:57 IDT 2009
2009-09-09 16:26:57
16:26:57
September 9th, 16:26:57
Is there a python module that would convert all these text-dates to an (identical) datetime.datetime instance?
I would like to...
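The third-party python-dateutil package's `dateutil.parser.parse` accepts many formats like these. A stdlib-only alternative is to normalise the string and try a fixed list of `strptime` formats, filling in missing parts from a reference datetime. A sketch, assuming the zone abbreviation (such as IDT) can simply be dropped because all inputs share one time zone; the function name is hypothetical and only the four shapes above are covered.

```python
import re
from datetime import datetime


def parse_loose(text, reference):
    """Parse the free-text shapes above into a naive datetime.

    Missing year or date parts are taken from `reference`.
    """
    s = re.sub(r"\s[A-Z]{2,5}\s", " ", text.strip())   # drop a zone name such as IDT
    s = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", s)       # "9th" -> "9"
    candidates = [
        (s, "%a %b %d %H:%M:%S %Y"),                    # Wed Sep 9 16:26:57 2009
        (s, "%Y-%m-%d %H:%M:%S"),                       # 2009-09-09 16:26:57
        (f"{s} {reference.year}", "%B %d, %H:%M:%S %Y"),  # September 9, 16:26:57
        (f"{reference:%Y-%m-%d} {s}", "%Y-%m-%d %H:%M:%S"),  # 16:26:57
    ]
    for candidate, fmt in candidates:
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")
```

Every format includes a year (appended from `reference` where the input lacks one), which avoids the ambiguity of parsing a day of the month without a year.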
I am looking for a good example of how to overload the stream input operator (operator>>) to parse some data with simple text formatting. I have read this tutorial but I would like to do something a bit more advanced. In my case I have fixed strings that I would like to check for (and ignore). Supposing the 2D point format from the link ...
I'm doing some parsing, and the kind of text that I want to match and then make null is as follows:
<tr class="label-BGC"><td colspan="4">any kind of text here</td></tr>
I want to match every line that contains "<tr class="label-BGC"><td colspan="4">any text</td></tr>"
It's evening here and my brain battery is totally drained.
What I'm try...
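A sketch of the match-and-blank step using Python's `re`, assuming the rows appear exactly as shown (same attribute quoting, no extra whitespace inside the tags). The key detail is the non-greedy `.*?`, which stops at the first `</td></tr>` so that two label rows on the same line are not merged with everything between them; the names are hypothetical.

```python
import re

# One whole <tr class="label-BGC"><td colspan="4">...</td></tr> row, whatever
# text sits inside the cell; re.S lets the cell text span line breaks.
LABEL_ROW = re.compile(r'<tr class="label-BGC"><td colspan="4">.*?</td></tr>', re.S)


def strip_label_rows(html):
    """Replace every label row with an empty string."""
    return LABEL_ROW.sub("", html)
```

The pattern uses only common regex features, so it should carry over to other engines, though the flag for letting `.` match newlines is spelled differently in each.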
I have a string that contains a known number of double values. What's the cleanest way (via C#) to parse the string and plug the results into matching scalar variables. Basically, I want to do the equivalent of this sscanf statement, but in C#:
sscanf( textBuff, "%lg %lg %lg %lg %lg %lg", &X, &Y, &Z, &I, &J, &K );
... assuming that ...
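The usual replacement for a fixed-count `sscanf` is split, convert, and unpack. Sketched here in Python; in C# the analogous pieces would be `String.Split` and `double.Parse` (with `CultureInfo.InvariantCulture` if the text always uses `.` as the decimal point). The function name is hypothetical.

```python
def parse_six_doubles(text_buff):
    """Equivalent of sscanf(textBuff, "%lg %lg %lg %lg %lg %lg", ...).

    Unpacking into exactly six names doubles as validation: too few or
    too many fields raises ValueError, as does a non-numeric field.
    """
    x, y, z, i, j, k = (float(tok) for tok in text_buff.split())
    return x, y, z, i, j, k
```

Unlike `sscanf`, which reports how many fields it matched and leaves the rest untouched, this fails loudly on malformed input.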
I'm trying to find a grammar file for Javascript for Yacc (preferably for Jay, but since Jay is a Yacc clone I should be fine, since I need to implement it on .NET).
...
I would like to parse an assembly qualified name in .NET 3.5. In particular, the assembly itself is not available, it's just the name. I can think of many ways of doing it by hand but I guess I might be missing some feature to do that in the system libraries. Any suggestion?
...
Hey guys,
I am working on a natural-language parser that examines a sentence in English and extracts information such as names, dates, etc.
For example: "Lets meet next tuesday at 5 PM at the beach."
So the output would be something like: "Lets meet 15/09/2009 at 1700 hr at the beach"
So basically, what I want to know is: is ther...
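For narrow phrases like these, pattern substitution against a reference date is a workable starting point before reaching for a full NLP toolkit. A sketch, assuming "next tuesday" means the first Tuesday strictly after the reference date (some speakers mean the Tuesday of the following week); the function name is hypothetical, and the reference date in the usage line is chosen only so the result matches the example output.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def normalise(sentence, today):
    """Rewrite 'next <weekday>' as dd/mm/yyyy and 'N AM/PM' as 24-hour time."""

    def next_weekday(m):
        target = WEEKDAYS.index(m.group(1).lower())
        # Days until the target weekday; 0 (same day) becomes a full week ahead.
        ahead = (target - today.weekday()) % 7 or 7
        return (today + timedelta(days=ahead)).strftime("%d/%m/%Y")

    def clock(m):
        hour = int(m.group(1)) % 12          # 12 AM -> 0, 12 PM -> 12 below
        if m.group(2).lower() == "pm":
            hour += 12
        return f"{hour:02d}00 hr"

    day_pattern = r"\bnext\s+(" + "|".join(WEEKDAYS) + r")\b"
    sentence = re.sub(day_pattern, next_weekday, sentence, flags=re.I)
    sentence = re.sub(r"\b(\d{1,2})\s*(am|pm)\b", clock, sentence, flags=re.I)
    return sentence
```

With `today=date(2009, 9, 10)`, `normalise("Lets meet next tuesday at 5 PM at the beach.", today)` gives "Lets meet 15/09/2009 at 1700 hr at the beach."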
I know and use bison/yacc. But in the parsing world, there's a lot of buzz around packrat parsing.
What is it? Is it worth studying?
...
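In short, packrat parsing (introduced by Bryan Ford) is recursive descent over a parsing expression grammar (PEG), where ordered choice (`/`) tries alternatives in order and every rule's result at every input position is memoised. Memoisation is what keeps unlimited backtracking linear in the input length, at the cost of memory proportional to input size times number of rules. A sketch over a small arithmetic grammar of the kind used in Ford's papers, with `functools.lru_cache` doing the memoisation; the function name is hypothetical.

```python
from functools import lru_cache


def packrat_eval(text):
    """Evaluate single-digit arithmetic with the PEG:

        Additive  <- Multitive '+' Additive / Multitive
        Multitive <- Primary '*' Multitive / Primary
        Primary   <- '(' Additive ')' / Digit

    Each rule returns (value, next_pos) or None. lru_cache memoises results
    by position, so a failed first alternative never forces re-parsing.
    """

    @lru_cache(maxsize=None)
    def additive(pos):
        left = multitive(pos)
        if left is not None:
            value, nxt = left
            if nxt < len(text) and text[nxt] == "+":
                right = additive(nxt + 1)
                if right is not None:
                    return value + right[0], right[1]
        return multitive(pos)  # second alternative: served from the memo table

    @lru_cache(maxsize=None)
    def multitive(pos):
        left = primary(pos)
        if left is not None:
            value, nxt = left
            if nxt < len(text) and text[nxt] == "*":
                right = multitive(nxt + 1)
                if right is not None:
                    return value * right[0], right[1]
        return primary(pos)

    @lru_cache(maxsize=None)
    def primary(pos):
        if pos < len(text) and text[pos] == "(":
            inner = additive(pos + 1)
            if inner is not None:
                value, nxt = inner
                if nxt < len(text) and text[nxt] == ")":
                    return value, nxt + 1
        if pos < len(text) and text[pos] in "0123456789":
            return int(text[pos]), pos + 1
        return None

    result = additive(0)
    if result is None or result[1] != len(text):
        raise SyntaxError(f"cannot parse {text!r}")
    return result[0]
```

Compared with bison/yacc, there is no separate lexer and no shift/reduce conflicts, because ordered choice resolves ambiguity by trying alternatives in order; on the other hand, this plain form cannot handle left-recursive rules.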