parsing

Looking for C# HTML parser

I would like to extract the structure of the HTML document - so the tags are more important than the content. Ideally, it would also cope reasonably well with badly formed HTML. Does anyone know of a reliable and efficient parser? ...

C#: Test if string is a guid without throwing exceptions?

I want to try to convert a string to a Guid, but I don't want to rely on catching exceptions (for performance reasons: exceptions are expensive; for usability reasons: the debugger pops up; for design reasons: the expected is not exceptional). In other words, the code: public static Boolean TryStrToGuid(String s, out Guid value) { ...

How do I implement a two-pass scanner using GNU Flex?

As a pet project, I'd like to attempt to implement a basic language of my own design that can be used as a web-scripting language. It's trivial to run a C++ program as an Apache CGI, so the real work lies in how to parse an input file containing non-code (HTML/CSS markup) and server-side code. In my undergrad compiler course, we used Fl...

What makes OMeta special?

OMeta is "a new object-oriented language for pattern matching." I've encountered pattern matching in languages like Oz and in tools to parse grammars like Lex/Yacc or Pyparsing before. Despite looking at example code, reading discussions, and talking to a friend, I am still not able to get a real understanding of what makes OMeta special (or ...

PLY: Token shifting problem in C parser

I'm writing a C parser using PLY, and recently ran into a problem. This code: typedef int my_type; my_type x; is correct C code, because my_type is defined as a type before being used as such. I handle it by filling a type symbol table in the parser that gets used by the lexer to differentiate between types and simple identifie...

Smart design of a math parser?

What is the smartest way to design a math parser? What I mean is a function that takes a math string (like: "2 + 3 / 2 + (2 * 5)") and returns the calculated value. I did write one in VB6 ages ago but it ended up being way too bloated and not very portable (or smart for that matter...). General ideas, pseudocode or real code is appreciat...
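One common approach to this kind of problem is a small recursive-descent evaluator, with one function per precedence level. The following is only a rough sketch under that assumption, not a reference design: the names tokenize, Parser, and evaluate are made up for illustration, and it handles just +, -, *, /, unary minus, and parentheses.

```python
import re

# A number (with optional decimal point) or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+\.?\d*|\.\d+|[-+*/()])")


def tokenize(text):
    """Split the input string into number and operator tokens."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    """Evaluates while parsing; each method handles one precedence level."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        # expr := term (('+' | '-') term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        # term := factor (('*' | '/') factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        # factor := '-' factor | '(' expr ')' | number
        token = self.take()
        if token == "-":
            return -self.factor()
        if token == "(":
            value = self.expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        if token is None or token in ("+", "*", "/", ")"):
            raise ValueError(f"unexpected token {token!r}")
        return float(token)


def evaluate(text):
    parser = Parser(tokenize(text))
    value = parser.expr()
    if parser.peek() is not None:
        raise ValueError("unexpected trailing input")
    return value


print(evaluate("2 + 3 / 2 + (2 * 5)"))
```

Adding another precedence level (exponentiation, say) would mean adding one more method between term and factor, which is the main reason this layout tends to stay small.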

Python regular expression to split paragraphs

How would one write a regular expression to use in Python to split paragraphs? A paragraph is defined by 2 linebreaks (\n). But one can have any amount of spaces/tabs together with the line breaks, and it still should be considered as a paragraph. I am using Python so the solution can use Python's regular expression syntax which is ex...
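A possible pattern for the rule as stated (two or more line breaks, with optional spaces or tabs around them) is sketched below. The helper name split_paragraphs is made up, and the sketch assumes \n line endings only; \r\n input would need to be normalised first.

```python
import re

# A line break, followed by one or more "blank" lines that may hold
# only spaces or tabs. Trailing spaces before the first break are
# swallowed too, so they do not end up in the preceding paragraph.
PARAGRAPH_BREAK = re.compile(r"[ \t]*\n(?:[ \t]*\n)+")


def split_paragraphs(text):
    """Return the non-empty paragraphs of text."""
    return [p for p in PARAGRAPH_BREAK.split(text.strip()) if p]


sample = "first paragraph\n \t \nsecond paragraph\nstill second\n\n\nthird"
for paragraph in split_paragraphs(sample):
    print(repr(paragraph))
```

A single line break is left alone, so wrapped lines stay within one paragraph.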

Best StAX Implementation

My quick search reveals the reference implementation (http://stax.codehaus.org), the Woodstox implementation (http://woodstox.codehaus.org), and Sun's SJSXP implementation (https://sjsxp.dev.java.net/). Please comment on the relative merits of these, and fill me in on any other implementations I should consider. ...

A Delphi/FreePascal lib or function that emulates PHP's parse_url function

I'm writing a sitemap producer in Object Pascal and need a good function or lib to emulate PHP's parse_url function. Does anyone know of any good ones? ...

What is the intended use of the DEFAULT section in config files used by ConfigParser?

I've used ConfigParser for quite a while for simple configs. One thing that's bugged me for a long time is the DEFAULT section. I'm not really sure what's an appropriate use. I've read the documentation, but I would really like to see some clever examples of its use and how it affects other sections in the file (something that really ill...
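As a small illustration of how DEFAULT interacts with other sections, here is a sketch using the Python 3 module name configparser (older Python 2 code imports ConfigParser instead). The section names, keys, and hosts are invented for the example.

```python
import configparser

SAMPLE = """
[DEFAULT]
host = localhost
port = 8080
url = http://%(host)s:%(port)s/

[staging]
host = staging.example.com

[production]
host = www.example.com
port = 80
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

# DEFAULT is not listed as a section of its own.
print(config.sections())

# Keys missing from a section fall back to DEFAULT...
print(config["staging"]["port"])

# ...and interpolation in a DEFAULT value picks up per-section overrides.
print(config["staging"]["url"])
print(config["production"]["url"])
```

One use this suggests is keeping shared values and templates in DEFAULT and letting each section override only what differs.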

How to parse an ISO-formatted date in Python?

I need to parse strings like "2008-09-03T20:56:35.450686Z" into Python's datetime. I have found only strptime in the Python 2.5 std lib, but it is not so convenient. What is the best way to do that? Update: It seems that python-dateutil works very well. I have found this solution: d1 = '2008-09-03T20:56:35.450686Z' d2 = date...
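For the exact string format shown above, a standard-library-only sketch might look like the following. It assumes a current Python 3, where strptime supports the %f directive for microseconds, and it assumes the fractional seconds and the trailing Z are always present and that Z means UTC. The function name parse_utc_timestamp is made up for illustration.

```python
from datetime import datetime, timezone


def parse_utc_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS.ffffffZ' into an aware UTC datetime."""
    # The literal 'Z' in the format only matches the character; strptime
    # returns a naive datetime, so the UTC zone is attached explicitly.
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return naive.replace(tzinfo=timezone.utc)


d1 = "2008-09-03T20:56:35.450686Z"
print(parse_utc_timestamp(d1).isoformat())
```

Inputs with other offsets or without fractional seconds would raise ValueError here, which is where a more flexible parser such as python-dateutil is the easier route.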

Recursive descent parsing - from LL(1) up

Hello, The following simple "calculator expression" grammar (BNF) can be easily parsed with a trivial recursive-descent parser, which is predictive LL(1): <expr> := <term> + <term> | <term> - <term> | <term> <term> := <factor> * <factor> | <factor> / <factor> | <fa...

How can I identify references to Java classes using Perl?

I'm writing a Perl script and I've come to a point where I need to parse a Java source file line by line checking for references to a fully qualified Java class name. I know the class I'm looking for up front; also the fully qualified name of the source file that is being searched (based on its path). For example find all valid referen...

Parsing XML With Single Quotes?

I am currently running into a problem where an element is coming back from my XML file with a single quote in it. This is causing xml_parse to break it up into multiple chunks. For example: Get Wired, You're Hired! is then interpreted as 'Get Wired, You' being one object, the single quote being a second, and 're Hired!' as a third. What I w...

A StringToken Parser which gives Google Search style "Did you mean:" Suggestions

Seeking a method to: take whitespace-separated tokens in a String; return a suggested word, i.e., Google Search can take "fonetic wrd nterpreterr", and at the top of the results page it shows "Did you mean: phonetic word interpreter". A solution in any of the C* languages or Java would be preferred. Are there any existing Open Libraries which...

Regular expression to test whether a string consists of a valid real number in base 10.

Examples: "1" yes; "-1" yes; "- 3" no; "1.2" yes; "1.2.3" no; "7e4" no (though in some cases you may want to allow scientific notation); ".123" yes; "123." yes; "." no; "-.5" yes; "007" yes; "00" yes ...
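One pattern that agrees with every example listed above is sketched here in Python. It assumes only ASCII digits count, that a leading "+" is not allowed (the examples do not say either way), and that scientific notation is rejected. The name is_real_number is made up for illustration.

```python
import re

# Optional minus sign, then either digits with an optional fractional
# part ("1", "1.2", "123.") or a bare fractional part (".123").
REAL_NUMBER = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")


def is_real_number(text):
    # fullmatch avoids the "$ also matches before a trailing newline" trap.
    return REAL_NUMBER.fullmatch(text) is not None


for candidate in ["1", "-1", "- 3", "1.2", "1.2.3", "7e4",
                  ".123", "123.", ".", "-.5", "007", "00"]:
    print(repr(candidate), "yes" if is_real_number(candidate) else "no")
```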

Parsing a raw email message that may be in html or various strange encodings and converting it to plain text, the way, say, pine might display it.

The reason I want to do this is to make it easy to parse out instructions that are emailed to a bot, the kind of thing majordomo might do to parse commands like subscribing and unsubscribing. It turns out there are a lot of crazy formats and things to deal with, like quoted text, distinguishing between header and body, etc. A Perl modu...

What's the best way to read and parse a large text file over the network?

I have a problem which requires me to parse several log files from a remote machine. There are a few complications: 1) The files may be in use 2) The files can be quite large (100 MB+) 3) Each entry may be multi-line. To solve the in-use issue, I need to copy each file first. I'm currently copying it directly from the remote machine to the local ...

PHP parse_ini_file() - where does it look?

If I call PHP's parse_ini_file("foo.ini"), in what paths does it look for foo.ini? The include path? The function's documentation doesn't mention it. ...

What is the best way to parse a time into a Date object from user input in JavaScript?

I am working on a form widget for users to enter a time of day into a text input (for a calendar application). Using JavaScript (we are using jQuery FWIW), I want to find the best way to parse the text that the user enters into a JavaScript Date() object so I can easily perform comparisons and other things on it. I tried the parse() met...