parsing

I'm looking for the holy grail of books on parsing.

If you all could only own or buy one book on parsing what would it be? Specifically for parsing a programming language. ...

text parsing in ruby

Hi, I am a novice Ruby programmer, just learning to write hello worlds. I need help parsing text in Ruby. Given something like this: @BreakingNews: Typhoon Morakot hits Taiwan, China evacuates thousands http://news.bnonews.com/u4z3 I would like to eliminate all the hyperlinks and get plain text: @BreakingNews: Typhoon...

Stylistic Help - jQuery regex parsing

I've written some jQuery for parsing an id out of a link's href. It works, but I'm wondering if there's a cleaner, more idiomatic jQuery way of doing it: <a class="edit_tags" href="/image/edit_tags/id/2">Edit Tags</a> <script type="text/javascript" charset="utf-8"> $('.edit_tags').click(function(event) { event.preventDefault(); ...

Convert string to int with bool/fail in C++

I have a std::string that could hold arbitrary text or a numeric value (such as 0). What's the best or easiest way to convert the string to an int with the ability to fail? I want a C++ version of C#'s Int32.TryParse ...

PHP: Converting from UTF-8 HTML

I have a French site that I want to parse, but I am running into problems converting the (UTF-8) HTML to Latin-1. The problem is shown in the following PHPUnit test case: class Test extends PHPUnit_Framework_TestCase { private static function fromHTML($str){ return html_entity_decode($str, ENT_QUOTES, 'UTF-8'); } publi...

Algorithms or Patterns for reading text

My company has a client that tracks prices for products from different companies at different locations. This information goes into a database. These companies email the prices to our client each day, and of course the emails are all formatted differently. It is impossible to have any of the companies change their format - they will not...

Parse a .Net Page with Postbacks

Hello, I need to read data from an online database that's displayed using an aspx page from the UN. I've done HTML parsing before, but it was always by manipulating query-string values. In this case, the site uses asp.net postbacks. So, you click on a value in box one, then box two shows, click on a value in box 2 and click a button to ...

How can I process this text file and parse what I need?

I'm trying to parse output from the Python doctest module and store it in an HTML file. I've got output similar to this: ********************************************************************** File "example.py", line 16, in __main__.factorial Failed example: [factorial(n) for n in range(6)] Expected: [0, 1, 2, 6, 24, 120] Got: ...

XML documents and & char?

Hi Girls and rest, I have a question about special characters in XML documents. I'm using & in one of the item values in my XML, and the Delphi TXMLDoc parser is complaining about it. I searched for XML parsing options, but none of them concern special characters. Any ideas? Example: <Configuration> <Configuration_item> <view_...

Help parsing my own expression tree, C#

Dear all, I have the following code. I constructed an expression tree and I am stuck parsing it to find the result. You will find the details within my code: public enum OpertaionType { add, sub, div, mul} public class Node { public Node(Node lhs, Node rhs, OpertaionType opType, int index) { this.lhs = lhs; this....

Best practices to parse emails with Ruby

Hello Ruby/Rails/Merb developers! I'm currently working on a web project that will have a feature to communicate with clients by email. So, let's say I created an account for a customer in my admin panel, then created a topic/thread to discuss questions, tasks and other work-related stuff. So, the customer will receive an email notification. A...

Is there a PDF parser for PHP?

Hi, I know about several PDF generators for PHP (fpdf, dompdf, etc.). What I want to know about is a parser. For reasons beyond my control, certain information I need is only in a table inside a PDF, and I need to extract that table and convert it to an array. Any suggestions? ...

Tools for data mining hand-written html

I need to convert a large website from static HTML written entirely by humans into proper relational data. First comes a large number of tables (not necessarily the same for every page), then code like this: <a name=pidgin><font size=4 color=maroon>Pidgin</font><br></a> <font size=2 color=teal>Author:</font><br> <font size=2>Sean ...

Request for web based resume parsing tools

Can anyone give me some references for web-based resume parsing tools, whether free or commercial? I've searched for resume parsing tools, but most of them are desktop applications. I need a tool that is web-based, or that provides a web service interface so it can be integrated into web forms. Thanks a lot! ...

Regular Expression to get pascal functions

I have a Pascal code file and need to parse it (using C#) and display all the public functions. My file looks something like this (not actual code): public function Test(str: string):bool; function Test1(str: string):bool; function Test2(str,str1,str2,str3 str4: string):bool; function Test3(str: string):bool; pu...

Parsing Large Text Files with PHP Without Killing the Server

I'm trying to read some large text files (between 50M and 200M) and do simple text replacement (essentially, the XML I have hasn't been properly escaped in a few regular cases). Here's a simplified version of the function: <?php function cleanFile($file1, $file2) { $input_file = fopen($file1, "r"); $output_file = fopen($file2, "w");...

Good parser generator (think lex/yacc or antlr) for .NET? Build time only?

Is there a good parser generator (think lex/yacc or antlr) for .NET? Any that have a license that would not scare lawyers? Lots of LGPL, but I am working on embedded components and some organizations are not comfortable with me taking an LGPL dependency. I've heard that Oslo may provide this functionality but I'm not sure if it's a bui...

How to dynamically parse and compare string values in C#?

Sorry that I had to re-edit this question. I need to dynamically parse two string values to the appropriate type and compare them, then return a bool result. Example 1: string lhs = "10"; string rhs = "10"; Compare.DoesEqual(lhs, rhs, typeof(int)); //true Compare.DoesEqual(lhs, rhs, typeof(string)); //true Example 2: string lhs = ...

python datetime strptime wildcard

I want to parse dates like these into a datetime object: December 12th, 2008 January 1st, 2009 The following will work for the first date: datetime.strptime("December 12th, 2008", "%B %dth, %Y") but will fail for the second because of the suffix to the day number ('st'). So, is there an undocumented wildcard character in strptime...
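One common workaround, sketched here as an assumption rather than as the accepted answer to this question, is to strip the ordinal suffix with a regular expression before calling strptime, so that a single format string covers "st", "nd", "rd" and "th". The helper name parse_ordinal_date is hypothetical, and %B assumes English month names in the active locale.

```python
import re
from datetime import datetime


def parse_ordinal_date(text):
    """Parse dates such as 'January 1st, 2009' (hypothetical helper)."""
    # Remove "st", "nd", "rd" or "th" only when it directly follows a digit,
    # so month names such as "August" are left untouched.
    cleaned = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text)
    # %B matches the full month name in the current locale (assumed English).
    return datetime.strptime(cleaned, "%B %d, %Y")


if __name__ == "__main__":
    print(parse_ordinal_date("December 12th, 2008"))
    print(parse_ordinal_date("January 1st, 2009"))
```

The lookbehind keeps the substitution from touching letters inside words, which a plain replace of "st" or "th" would not.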

fuzzy timestamp parsing with Python

Hello, is there a Python module to interpret fuzzy timestamps like the date command in Unix: > date -d "2 minutes ago" Tue Aug 11 16:24:05 EST 2009 The closest I have found so far is dateutil.parser, which fails for the above example. Thanks ...
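For the narrow "N units ago" form shown in the example, a small standard-library function may be enough. The sketch below is only an illustration under that assumption: parse_ago and its now parameter are hypothetical names, and it does not attempt the many other phrases that date -d accepts.

```python
import re
from datetime import datetime, timedelta

# Map the singular unit captured by the regex to a timedelta keyword argument.
_UNIT_TO_KWARG = {
    "second": "seconds",
    "minute": "minutes",
    "hour": "hours",
    "day": "days",
    "week": "weeks",
}

_AGO_PATTERN = re.compile(
    r"\s*(\d+)\s+(second|minute|hour|day|week)s?\s+ago\s*", re.IGNORECASE
)


def parse_ago(text, now=None):
    """Return the datetime described by e.g. '2 minutes ago' (hypothetical helper)."""
    if now is None:
        now = datetime.now()
    match = _AGO_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError("unsupported relative time: %r" % text)
    amount = int(match.group(1))
    kwarg = _UNIT_TO_KWARG[match.group(2).lower()]
    # Subtract the offset from the reference time.
    return now - timedelta(**{kwarg: amount})


if __name__ == "__main__":
    print(parse_ago("2 minutes ago"))
```

Passing now explicitly makes the function deterministic, which is convenient when testing it.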