parsing

How would I dynamically add a new XML node based on the values of other nodes?

Background: I have an old web CMS that stored content in XML files, one XML file per page. I am in the process of importing content from that CMS into a new one, and I know I'm going to need to massage the existing XML in order for the import process to work properly. Existing XML: <page> <audience1>true</audience1> <audience2>f...
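The question's desired output is cut off, so the target node here is an assumption. This sketch uses Python's standard ElementTree to append a hypothetical <audiences> element that lists which audienceN flags are "true". The general pattern is to read the sibling values first, then create the new node with SubElement.

```python
import xml.etree.ElementTree as ET

def add_audience_node(page: ET.Element) -> ET.Element:
    """Hypothetical rule: gather the tag names of every <audienceN> child whose
    text is 'true' and store them, comma-separated, in a new <audiences> node."""
    selected = [
        child.tag
        for child in page
        if child.tag.startswith("audience") and (child.text or "").strip() == "true"
    ]
    # The new node is created only after all existing values have been read.
    audiences = ET.SubElement(page, "audiences")
    audiences.text = ",".join(selected)
    return page

page = ET.fromstring(
    "<page><audience1>true</audience1><audience2>false</audience2>"
    "<audience3>true</audience3></page>"
)
add_audience_node(page)
print(ET.tostring(page, encoding="unicode"))
```

For a batch import, the same function can be applied to each page file before it is handed to the new CMS.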

Most elegant way to detect if a String is a number?

Is there a better, more elegant (and/or possibly faster) way than boolean isNumber = false; try{ Double.valueOf(myNumber); isNumber = true; } catch (NumberFormatException e) { } ...? Edit: Since I can't pick two answers I'm going with the regex one because a) it's elegant and b) saying "Jon Skeet solved the problem" is a tau...
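A Python sketch of both approaches mentioned in the question (Python rather than Java). What counts as a "number" is an assumption here: an optional sign, digits with an optional fraction, and an optional exponent. The conversion-based check accepts more than the regex does; Python's float(), for example, also accepts "nan", "inf", and surrounding whitespace.

```python
import re

# Assumed definition of a number: optional sign, digits with an optional
# fraction (or a fraction with a leading dot), then an optional exponent.
NUMBER_RE = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def is_number(s: str) -> bool:
    """Regex check: the whole string must match the pattern."""
    return NUMBER_RE.fullmatch(s) is not None

def is_number_by_conversion(s: str) -> bool:
    """Try/except check, the same idea as the Double.valueOf version."""
    try:
        float(s)
        return True
    except ValueError:
        return False
```

The regex version makes the accepted syntax explicit, while the conversion version delegates that decision to the language's own number parser.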

Nesting SAX ContentHandlers

I would like to parse a document using SAX, and create a subdocument from some of the elements, while processing others purely with SAX. So, given this document: <DOC> <small> <element /> </small> <entries> <!-- thousands here --> </entries> </DOC> I would like to parse the DOC and DOC/entries elements...
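One way to nest handlers is to keep a single SAX handler and, while inside the chosen subtree, forward its events to a tree builder. This Python sketch uses xml.sax with xml.etree.ElementTree.TreeBuilder. The choice of <small> as the subtree and the <entry> children inside <entries> are assumptions, because the question's entries are elided.

```python
import xml.sax
import xml.etree.ElementTree as ET

class DocHandler(xml.sax.ContentHandler):
    """Streams most of the document with plain SAX callbacks, but forwards
    the events for one subtree (assumed here to be <small>) to a TreeBuilder
    so that subtree ends up as an in-memory Element."""

    def __init__(self, subtree_tag: str = "small"):
        super().__init__()
        self.subtree_tag = subtree_tag
        self.builder = None   # active TreeBuilder while inside the subtree
        self.depth = 0        # element nesting depth inside the subtree
        self.subdocs = []     # completed subtree Elements
        self.entry_count = 0  # stands in for pure-SAX processing of entries

    def startElement(self, name, attrs):
        if self.builder is None and name == self.subtree_tag:
            self.builder = ET.TreeBuilder()
        if self.builder is not None:
            self.depth += 1
            self.builder.start(name, dict(attrs.items()))
        elif name == "entry":  # hypothetical child of <entries>
            self.entry_count += 1

    def endElement(self, name):
        if self.builder is not None:
            self.builder.end(name)
            self.depth -= 1
            if self.depth == 0:  # left the subtree: finish the Element
                self.subdocs.append(self.builder.close())
                self.builder = None

    def characters(self, content):
        if self.builder is not None:
            self.builder.data(content)

doc = "<DOC><small><element /></small><entries><entry/><entry/></entries></DOC>"
handler = DocHandler()
xml.sax.parseString(doc.encode("utf-8"), handler)
```

Only the <small> subtree is held in memory. Everything under <entries> is processed and discarded event by event, which matters when there are thousands of entries.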

Where can I get material for learning EBNF?

Extended Backus–Naur Form: EBNF I'm very new to parsing concepts. Where can I get sufficiently easy to read and follow material for writing a grammar for the boost::spirit library, which uses a grammar similar to EBNF? Currently I am looking into EBNF from Wikipedia. ...
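A small worked example can make EBNF easier to follow. The grammar below is written in ISO-style EBNF (',' for sequence, '|' for choice, '{ }' for repetition), followed by a hand-written recursive-descent parser in Python. Each rule becomes a method, '{ }' becomes a while loop, and '|' becomes an if/else. Spirit expresses the same ideas with C++ operators (roughly '>>' for sequence, '|' for alternative, and prefix '*' for repetition), but this sketch is not Spirit code.

```python
# Grammar implemented below:
#   expr   = term , { ( "+" | "-" ) , term } ;
#   term   = factor , { ( "*" | "/" ) , factor } ;
#   factor = number | "(" , expr , ")" ;
#   number = digit , { digit } ;

class Parser:
    def __init__(self, text: str):
        self.text = text.replace(" ", "")  # spaces stripped to keep it short
        self.pos = 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise SyntaxError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def expr(self) -> float:
        value = self.term()
        while self.peek() in ("+", "-"):  # { ... } becomes a loop
            op = self.peek()
            self.pos += 1
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> float:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self) -> float:
        if self.peek() == "(":  # "|" becomes an if/else
            self.pos += 1
            value = self.expr()
            self.expect(")")
            return value
        start = self.pos
        while self.peek().isdigit():
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a number at position {start}")
        return float(self.text[start:self.pos])

def evaluate(text: str) -> float:
    p = Parser(text)
    value = p.expr()
    if p.pos != len(p.text):
        raise SyntaxError(f"unexpected {p.peek()!r} at position {p.pos}")
    return value
```

Because `term` sits below `expr` in the grammar, multiplication binds tighter than addition without any extra precedence rules.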

What is a good C/C++ CSS parser?

What is a good C/C++ CSS parser? All that I can find is CSSTidy, and it seems to be more of an application than a parsing library. ...

Are there any good parsing libraries for .Net?

I'm looking for something simple to use where the grammar is easy to define. ...

C# Interpreted Language

I am looking to write an interpreted language in C#, where should I start? I know how I would do it using fun string parsing, but what is the correct way? ...
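Instead of interpreting raw strings directly, a common structure is lexer, then parser, then syntax tree (AST), then evaluator. This Python sketch (not C#) shows only the last two stages: a tiny hypothetical AST for arithmetic and assignments, plus a tree-walking evaluator with a variable environment. Here the tree is built by hand to keep the sketch short. A real front end would produce it from source text.

```python
import operator
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Num:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

@dataclass
class Assign:  # statement: name = expression
    name: str
    value: "Expr"

Expr = Union[Num, Var, BinOp]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_expr(node: Expr, env: dict) -> float:
    """Recursively evaluate an expression node against the variable env."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        if node.name not in env:
            raise NameError(f"undefined variable {node.name!r}")
        return env[node.name]
    if isinstance(node, BinOp):
        return OPS[node.op](eval_expr(node.left, env), eval_expr(node.right, env))
    raise TypeError(f"unknown node {node!r}")

def run(program: list, env: Optional[dict] = None) -> dict:
    """Execute a list of Assign statements in order; return the environment."""
    env = {} if env is None else env
    for stmt in program:
        env[stmt.name] = eval_expr(stmt.value, env)
    return env

# Hand-built AST for:  x = 2 + 3;  y = x * 4
program = [
    Assign("x", BinOp("+", Num(2), Num(3))),
    Assign("y", BinOp("*", Var("x"), Num(4))),
]
env = run(program)
```

Keeping the AST separate from the text means new language features are added as new node types and evaluator cases, not as more string manipulation.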

Parsing resources

Does anyone know any resources (books/websites/whatever) related to parsing. I'm not so much interested in specific technologies such as yacc, XML or regular expressions, but something more general about best practices, stream vs. pull, error reporting and recovery, gotchas to look out for etc. ...

Looking for XML parser

I have been tasked with finding an open source DOM XML parser. The parser must minimally support XPath 1.0. Schema support is desired, but not a deal breaker. The files we are parsing will be small, so speed and memory consumption are not a large concern. Any OO language (C++, C#, Java, etc.). To clarify, the plan is to integrate ...

I have to read invoice data from a convoluted ASCII file, how would you guard against future changes?

I have to read invoice ASCII files that are structured in a really convoluted way, for example: 55651108 3090617.10.0806:46:32101639Example Company Construction Company Example Road. 9 9524 Example City There's actually additional stuff in there, but I don't want to confuse you any further. I know I'...
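One defence against future format changes is to keep the record layout as data, not code, and validate every field as it is read. If the vendor then shifts a column, only the layout table changes. In this Python sketch the field names, offsets, and sample line are all hypothetical; the real values must come from the file's specification, not from the question's example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class FieldSpec:
    name: str
    start: int                              # 0-based, inclusive
    end: int                                # 0-based, exclusive
    convert: Callable[[str], object] = str  # converter, e.g. int or str.strip

# HYPOTHETICAL layout: placeholder offsets, not the real invoice format.
# A second layout (LAYOUT_V2, ...) can be added when the format changes.
LAYOUT_V1 = [
    FieldSpec("invoice_no", 0, 8, int),
    FieldSpec("customer_no", 9, 16, int),
    FieldSpec("company", 16, 40, str.strip),
]

def parse_record(line: str, layout: list) -> dict:
    """Slice one fixed-width line according to layout, failing loudly."""
    needed = max(f.end for f in layout)
    if len(line) < needed:
        raise ValueError(f"record is {len(line)} chars, layout needs {needed}")
    record = {}
    for f in layout:
        raw = line[f.start:f.end]
        try:
            record[f.name] = f.convert(raw)
        except ValueError as exc:
            raise ValueError(f"field {f.name!r} at {f.start}:{f.end}: {raw!r}") from exc
    return record

line = "12345678 1234567" + "Example Company".ljust(24)
record = parse_record(line, LAYOUT_V1)
```

Raising an error that names the field and its offsets means a format change shows up as a clear failure, not as silently shifted data.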

Oracle SQL - Parsing a name string and converting it to first initial & last name

Hey. Does anyone know how to turn this string: "Smith, John R" into this string: "jsmith"? I need to lowercase everything with lower(), find where the comma is and track its integer location value, get the first character after that comma and put it in front of the string, then get the entire last name and stick it after the first init...
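The steps listed in the question map directly onto string operations. This sketch is written in Python rather than Oracle SQL. In Oracle the same steps would roughly use LOWER, INSTR (to find the comma), and SUBSTR. It assumes every input contains a comma and a non-empty first name.

```python
def to_username(full_name: str) -> str:
    """'Smith, John R' -> 'jsmith', following the question's steps."""
    name = full_name.lower()                      # 1. lowercase everything
    comma = name.index(",")                       # 2. locate the comma (ValueError if none)
    first_initial = name[comma + 1:].strip()[0]   # 3. first character after the comma
    last = name[:comma].strip()                   # 4. the whole last name
    return first_initial + last
```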

Decoding byte stream

I have a series of messages that are defined by independent structs. These structs share a common header and are sent between applications. I am creating a decoder that will take the raw data captured in the messages that were built using these structs and decode/parse them to some plain text. I have over 1000 different messages that need t...
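With a shared header, the usual approach is to read the header, look up the message type in a table, and decode the payload with that type's layout. This Python sketch uses the standard struct module. The header layout (big-endian uint16 id plus uint16 length), the two message types, and their fields are hypothetical. Real C structs may also need native alignment ("@") or explicit pad bytes to match compiler padding.

```python
import struct

# HYPOTHETICAL common header: message id and payload length, big-endian.
HEADER = struct.Struct(">HH")

# id -> (name, payload layout, field names). With 1000+ messages this table
# would ideally be generated from the struct definitions, not hand-written.
MESSAGES = {
    1: ("Heartbeat", struct.Struct(">I"), ("sequence",)),
    2: ("Position", struct.Struct(">iih"), ("lat", "lon", "alt")),
}

def decode(buf: bytes):
    """Yield one plain-text line per message found in buf."""
    offset = 0
    while offset < len(buf):
        msg_id, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        payload = buf[offset:offset + length]
        offset += length  # the header's length lets unknown messages be skipped
        if msg_id not in MESSAGES:
            yield f"unknown message id={msg_id} len={length}"
            continue
        name, body, fields = MESSAGES[msg_id]
        values = body.unpack(payload)  # raises struct.error on a size mismatch
        yield name + " " + " ".join(f"{k}={v}" for k, v in zip(fields, values))

buf = (HEADER.pack(1, 4) + struct.pack(">I", 7)
       + HEADER.pack(2, 10) + struct.pack(">iih", 10, -20, 300)
       + HEADER.pack(9, 0))
lines = list(decode(buf))
```

Because the dispatch is table-driven, adding a message type means adding one table entry, and the decoding loop itself never changes.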

Are there well known algorithms for deducing the "return types" of parser rules?

Given a grammar and the attached action code, are there any standard solutions for deducing what type each production needs to result in (and consequently, what type the invoking production should expect to get from it)? I'm thinking of an OO program and action code that employs something like C#'s var syntax (but I'm not looking for som...

Best XML parser for Java

I need to read smallish (few MB at the most, UTF-8 encoded) XML files, rummage around looking at various elements and attributes, perhaps modify a few and write the XML back out again to disk (preferably with nice, indented formatting). What would be the best XML parser for my needs? There are lots to choose from. Some I'm aware of ar...

How to parse word documents with ruby?

Does anyone know of a library that I can use on OS X/Linux to parse Word files and output the content as HTML? I've had a look at win32ole but as far as I can see it's for Windows only, although I could be wrong. Any suggestions? ...

How can I parse people's full names into user names in Perl?

I need to convert a name in the format Parisi, Kenneth into the format kparisi. Does anyone know how to do this in Perl? Here is some sample data that is abnormal: Zelleb, Charles F.,,IV Eilt, John,, IV Wods, Charles R.,,III Welkt, Craig P.,,Jr. These specific names should end up as czelleb, jeilt, cwoods, cwelkt... etc ADDITION+++++ ...
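For the irregular samples, splitting on the first comma only and then discarding empty pieces takes care of the extra commas and suffixes such as "IV" or "Jr.". This is a Python sketch, not Perl. Stripping non-letters from the last name (so "O'Brien" would become "obrien") is an added assumption. Note that the sample "Wods" yields "cwods" here; the "cwoods" in the question may be a typo.

```python
import re

def name_to_username(full_name: str) -> str:
    """'Zelleb, Charles F.,,IV' -> 'czelleb'."""
    last, _, rest = full_name.partition(",")  # split on the FIRST comma only
    # After the first comma: given name, middle initial, and possibly extra
    # commas plus a suffix. Splitting on runs of commas/whitespace and
    # dropping empty pieces leaves the given name first.
    given = [t for t in re.split(r"[\s,]+", rest) if t]
    letters = re.sub(r"[^A-Za-z]", "", last)  # assumption: letters only
    if not letters or not given:
        raise ValueError(f"cannot parse name: {full_name!r}")
    return (given[0][0] + letters).lower()
```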

IP address parsing in .NET

I'm using IPAddress.TryParse() to parse IP addresses. However, it's a little too permissive (parsing "1" returns 0.0.0.1). I'd like to limit the input to dotted octet notation. What's the best way to do this? (Note: I'm using .NET 2.0) Edit: Let me clarify: I'm writing an app that will scan a range of IPs looking for certain devices...
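A strict check can validate the shape first and convert second: exactly four dot-separated parts, each made of one to three ASCII digits, each at most 255. This Python sketch shows the logic; it is not the .NET API, but the same steps port directly to C#. Whether to reject leading zeros such as "010" is a policy choice left open here.

```python
def parse_dotted_quad(text: str):
    """Return the four octets if text is strict dotted-decimal IPv4, else None."""
    parts = text.split(".")
    if len(parts) != 4:  # rejects "1", "1.2.3", "1.2.3.4.5"
        return None
    octets = []
    for part in parts:
        # Only 1-3 ASCII digits: rejects empty parts, signs, spaces, hex.
        if not (1 <= len(part) <= 3) or not all(c in "0123456789" for c in part):
            return None
        value = int(part)
        if value > 255:
            return None
        octets.append(value)
    return tuple(octets)
```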

Is Boost guilty of being un-Boost-like?

I was just reading the intro to the Boost::Spirit LL Parser framework. The preface suggests that the author and creator likes to use such parsing technology to read in program options. Doesn't Boost have its own library for program options? I am wondering, does the Boost committee review all the library notes for common themes and style...

Python - Parse String to Float or Int

This should be simple. In Python, how can I parse a numeric string like "545.2222" to its corresponding float value, 545.2222, or "31" to an integer, 31? EDIT: I just wanted to know how to parse a float string to a float, and (separately) an int string to an int. Sorry for the confusing phrasing/original examples on my part. At any r...
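The built-in float() and int() constructors handle each case separately, and both raise ValueError on invalid input. The parse_number helper is an illustrative addition for input whose type is not known in advance. It tries int first and falls back to float.

```python
price = float("545.2222")  # 545.2222
count = int("31")          # 31

def parse_number(s: str):
    """Return an int for integer strings such as "31", otherwise a float."""
    try:
        return int(s)
    except ValueError:
        return float(s)  # still raises ValueError for non-numeric text
```

Note that int("31.0") raises ValueError, so parse_number("31.0") returns the float 31.0 rather than an int.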

Looking for a clear definition of what a "tokenizer", "parser", and "lexer" are and how they are related to each other and used?

Hello, I am looking for a clear definition of what a "tokenizer", "parser" and "lexer" are and how they are related to each other (e.g., does a parser use a tokenizer or vice versa)? I need to create a program that will go through C source and header files to extract data declarations and definitions. I have been looking for examples and can find so...
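In common usage, "lexer" and "tokenizer" mean roughly the same thing: a stage that turns characters into tokens (identifiers, numbers, punctuation). A parser consumes those tokens and checks them against a grammar, so the parser uses the lexer, not the other way round. This Python sketch shows both stages on a deliberately tiny, hypothetical subset of C declarations (type words, '*', names, optional '= number', comma lists). Real C needs far more, including arrays, structs, typedefs, and the preprocessor.

```python
import re

# Lexer / tokenizer: characters -> (kind, text) tokens.
TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)|(?P<punct>[;,=*]))")

def tokenize(src: str):
    tokens, pos = [], 0
    src = src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens

TYPE_WORDS = {"int", "char", "short", "long", "signed", "unsigned", "float", "double", "const"}

# Parser: tokens -> (type, name) pairs for the hypothetical subset above.
def parse_declarations(tokens):
    decls, i = [], 0

    def peek():
        return tokens[i] if i < len(tokens) else (None, None)

    while i < len(tokens):
        base = []
        while peek()[0] == "ident" and peek()[1] in TYPE_WORDS:
            base.append(peek()[1])
            i += 1
        if not base:
            raise SyntaxError(f"expected a type, got {peek()[1]!r}")
        while True:  # one or more declarators separated by ','
            stars = 0
            while peek() == ("punct", "*"):
                stars += 1
                i += 1
            kind, name = peek()
            if kind != "ident":
                raise SyntaxError(f"expected a name, got {name!r}")
            i += 1
            if peek() == ("punct", "="):  # optional numeric initialiser
                i += 1
                if peek()[0] != "number":
                    raise SyntaxError("expected a number after '='")
                i += 1
            decls.append((" ".join(base) + "*" * stars, name))
            if peek() == ("punct", ","):
                i += 1
                continue
            if peek() == ("punct", ";"):
                i += 1
                break
            raise SyntaxError(f"expected ',' or ';', got {peek()[1]!r}")
    return decls

decls = parse_declarations(tokenize("unsigned long count = 5;\nchar *name, c;"))
```

Because the parser never sees raw characters, whitespace and comment handling can change in the lexer without touching the grammar logic.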