parsing

Best way to compare 2 XML documents in Java

I'm trying to write an automated test of an application that basically translates a custom message format into an XML message and sends it out the other end. I've got a good set of input/output message pairs so all I need to do is send the input messages in and listen for the XML message to come out the other end. When it comes time to...

What would be the best tool to create a natural DSL in Java?

A couple of days ago, I read a blog entry (http://ayende.com/Blog/archive/2008/09/08/Implementing-generic-natural-language-DSL.aspx) where the author discusses the idea of a generic natural language DSL parser using .NET. The brilliant part of his idea, in my opinion, is that the text is parsed and matched against classes using the same n...

How to read config file entries from an INI file

I can't use the Get*Profile functions because I'm using an older version of the Windows CE platform SDK which doesn't have those. It doesn't have to be too general. [section] name = some string I just need to open the file, check for the existence of "section", and get the value associated with "name". Standard C++ preferred. ...

Best way to parse float?

What is the best way to parse a float in CSharp? I know about TryParse, but what I'm particularly wondering about is dots, commas etc. I'm having problems with my website. On my dev server, the ',' is for decimals, the '.' for separator. On the prod server though, it is the other way round. How can I best capture this? ...

Standard algorithm to tokenize a string, keep delimiters (in PHP)

I want to split an arithmetic expression into tokens, to convert it into RPN. Java has the StringTokenizer, which can optionally keep the delimiters. That way, I could use the operators as delimiters. Unfortunately, I need to do this in PHP, which has strtok, but that throws away the delimiters, so I need to brew something myself. This...
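As a rough illustration of the "keep the delimiters" idea, here is a minimal sketch in Python rather than PHP. It relies on the fact that a capturing group in the split pattern keeps the matched delimiters in the result; PHP's preg_split accepts a PREG_SPLIT_DELIM_CAPTURE flag for the same purpose. The function name `tokenize` and the operator set are assumptions made for this example, and unary minus is not handled.

```python
import re

def tokenize(expression):
    """Split an arithmetic expression into operand and operator tokens.

    Hypothetical helper: the operator set below is an assumption.
    """
    # The capturing group makes re.split return the delimiters as well.
    parts = re.split(r"([-+*/()])", expression)
    # Strip whitespace and drop the empty strings left between
    # adjacent delimiters such as "*(".
    return [p.strip() for p in parts if p.strip()]

print(tokenize("3 + 4*(2 - 1)"))
```

The resulting token list can then be fed to a shunting-yard style conversion to RPN.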

What is the opposite of 'parse'?

I have a function, parseQuery, that parses a SQL query into an abstract representation of that query. I'm about to write a function that takes an abstract representation of a query and returns a SQL query string. What should I call the second function? ...

I have a 100+MB XML file (sans-DTD/Schema). XSLT won't have it. Strategies for transforming/parsing?

This XML file contained archived news stories for all of last year. I was asked to sort these stories by story categor[y|ies] into new XML files. big_story_export.xml turns into lifestyles.xml food.xml nascar.xml ...and so on. I got the job done using a one-off python script, however, I originally attempted this using XSLT. This r...

Parsing Performance (If, TryParse, Try-Catch)

I know plenty about the different ways of handling parsing text for information. For parsing integers, for example, what kind of performance can be expected? I am wondering if anyone knows of any good stats on this. I am looking for some real numbers from someone who has tested this. Which of these offers the best performance in which si...

Computer Science text book way to do text/xml/whatever parsing.

It's been rattling in my brain for a while. I've done some investigation into Compilers/Flex/Bison and stuff but I never found a good reference that talked in detail about the "parsing stack", or how to go about implementing one. Does anyone know of good references where I could catch up? Edit: I do appreciate all the compiler references,...

Parser-generator that outputs C# given a BNF grammar?

I'm looking for a tool that will be able to build a parser (in C#) if I give it a BNF grammar (e.g. http://savage.net.au/SQL/sql-2003-2.bnf). Does such a generator exist? ...

Customized command line parsing in Python

I'm writing a shell for a project of mine, which by design parses commands that look like this: COMMAND_NAME ARG1="Long Value" ARG2=123 [email protected] My problem is that Python's command line parsing libraries (getopt and optparse) force me to use '-' or '--' in front of the arguments. This behavior doesn't match my requirements. An...
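One possible approach, sketched here under assumptions, is to let the standard library's shlex module handle the quoting and then split each remaining token on its first '='. The function name `parse_command` and the error handling are hypothetical and not taken from the question.

```python
import shlex

def parse_command(line):
    """Parse 'NAME KEY=VALUE ...' into (name, {key: value}).

    Hypothetical helper; all values are returned as strings.
    """
    # shlex.split honours double quotes, so ARG1="Long Value"
    # arrives as the single token 'ARG1=Long Value'.
    tokens = shlex.split(line)
    if not tokens:
        raise ValueError("empty command line")
    name, args = tokens[0], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {token!r}")
        args[key] = value
    return name, args

print(parse_command('COMMAND_NAME ARG1="Long Value" ARG2=123'))
```

Type conversion (for example turning "123" into an integer) is left to the caller in this sketch.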

Parsing Text in MS Access

I have a column that contains strings. The strings in that column look like this: FirstString/SecondString/ThirdString I need to parse this so I have two values: Value 1: FirstString/SecondString Value 2: ThirdString I could have actually longer strings but I always need it separated like [string1/string2/string3/...][stringN] What I n...
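The split described here amounts to cutting the string at its last '/'. The sketch below shows that logic in Python for illustration only; it is not Access code, and the name `split_last` is an assumption. In Access itself the position of the last separator would typically come from a reverse search such as VBA's InStrRev.

```python
def split_last(value, sep="/"):
    """Return (everything before the last sep, everything after it).

    Hypothetical helper illustrating the split-at-last-delimiter logic.
    """
    # rpartition searches from the right, so only the final
    # separator is used as the cut point.
    head, _, tail = value.rpartition(sep)
    return head, tail

print(split_last("FirstString/SecondString/ThirdString"))
```

If the string contains no separator at all, this sketch returns an empty first value and the whole string as the second.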

Why can't I parse a SimpleDateFormat with pattern "MMMMM dd" in Java?

I need to parse a string like "February 12, 1981" as a Date. I use SimpleDateFormat. But if I do: new SimpleDateFormat("MMMMM dd, yyyy").parse("February 12, 1981") I get java.text.ParseException. I tried to reduce it to see where the problem is. First: new SimpleDateFormat("MMMMM").parse("February") works. Then: new SimpleDateFor...

How can I parse the first, middle and last name from a full name field in SQL?

I'm having to do some data conversion, and I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name. The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle La...

Emulation of lex like functionality in Perl or Python.

Here's the deal. Is there a way to have strings tokenized in a line based on multiple regexes? One example: I have to get all href tags, their corresponding text and some other text based on a different regex. So I have 3 expressions and would like to tokenize the line and extract tokens of text matching every expression. I have actua...

What would be the best way to parse this file?

Hi all, I was just wondering if anyone knew of a good way that I could parse the file at the bottom of the post. I have a database set up with the correct tables for each section, e.g. Refferal Table, Caller Table, Location Table. Each table has the same columns that are shown in the file below. I would really like something that is fairly ge...

How to parse formatted email address into display name and email address?

Given the email address: "Jim" <[email protected]> If I try to pass this to MailAddress I get the exception: The specified string is not in the form required for an e-mail address. How do I parse this address into a display name (Jim) and email address ([email protected]) in C#? EDIT: I'm looking for C# code to parse it. EDIT2: I ...

Read fixed width record from text file

I've got a text file full of records where each field in each record is a fixed width. My first approach would be to parse each record simply using string.Substring(). Is there a better way? For example, the format could be described as: <Field1(8)><Field2(16)><Field3(12)> And an example file with two records could look like: Som...
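One alternative to scattering substring calls through the code is to describe the layout once as a list of widths and slice each record in a loop. The sketch below does this in Python for illustration; the widths 8, 16 and 12 come from the format described above, while the names `FIELD_WIDTHS` and `parse_record` and the choice to strip trailing padding are assumptions.

```python
# Widths taken from the <Field1(8)><Field2(16)><Field3(12)> layout.
FIELD_WIDTHS = (8, 16, 12)

def parse_record(line, widths=FIELD_WIDTHS):
    """Slice one fixed-width record into a list of field strings.

    Hypothetical helper; trailing padding is stripped from each field.
    """
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].rstrip())
        start += width
    return fields

record = "SomeData" + "YetMoreData".ljust(16) + "AndMore".ljust(12)
print(parse_record(record))
```

Keeping the widths in one table means a layout change touches a single line rather than every slicing call.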

How can I split a pipe-separated string into a list?

Here at work we are working on a newsletter system that our clients can use. As an intern one of my jobs is to help with the smaller pieces of the puzzle. In this case what I need to do is scan the logs of the email server for bounced messages and add the emails and the reason the email bounced to a "bad email database". The bad emails ...

Regex for parsing directory and filename

I'm trying to write a regex that will parse out the directory and filename of a fully qualified path using matching groups. so... /var/log/xyz/10032008.log would recognize group 1 to be "/var/log/xyz" and group 2 to be "10032008.log" Seems simple but I can't get the matching groups to work for the life of me. NOTE: As pointed out b...
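As a minimal sketch of the two-group idea, the pattern below uses a greedy first group, so the match is anchored on the last '/' in the path. It is written in Python for illustration; the names `PATH_RE` and `split_path` are assumptions, and the pattern only considers '/' as a separator.

```python
import re

# Group 1: everything before the last '/'; group 2: the rest.
# The greedy (.*) backtracks only as far as the final slash.
PATH_RE = re.compile(r"^(.*)/([^/]*)$")

def split_path(path):
    """Return (directory, filename), or None if path has no '/'."""
    match = PATH_RE.match(path)
    if match is None:
        return None
    return match.group(1), match.group(2)

print(split_path("/var/log/xyz/10032008.log"))
```

For a file directly under the root, such as "/file.log", this pattern yields an empty directory group, so callers may want to treat that case separately.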