parsing

Alternatives to Regular Expressions

I have a set of strings with numbers embedded in them. They look something like /cal/long/3/4/145:999 or /pa/metrics/CosmicRay/24:4:bgp:EnergyKurtosis. I'd like to have an expression parser that is easy to use: given a few examples, someone should be able to form a new expression. I want end users to be able to form new expressions ...

Parse filter expression using RegEx

I have a query filter written in human-readable language. I need to parse it and convert it to a SQL WHERE clause. Examples are: CustomerName Starts With 'J' becomes CustomerName LIKE 'J%' and CustomerName Includes 'Smi' becomes CustomerName LIKE '%Smi%'. The full expression to be parsed may be much more complicated such as C...

Help building a regular expression in python using the re module

Hi guys, I'm writing a simple propositional logic formula parser in Python which uses the re regular expression module and the lex/yacc module for lexing/parsing. Originally my code could pick out implication as ->, but adding logical equivalence (<->) caused issues with the compiled expressions: IMPLICATION = re.compile('[\s]*\-\>[\s]*') EQ...
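One common way around this kind of prefix clash is to try the longer operator first. The sketch below uses only the re module rather than lex/yacc; the token names and the `tokenize` helper are hypothetical and are not taken from the question's code.

```python
import re

# Hypothetical token names. "<->" is listed before "->" so that the longer
# operator is tried first at every position.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<EQUIVALENCE><->)|(?P<IMPLICATION>->)|(?P<ATOM>[A-Za-z]\w*))\s*"
)

def tokenize(text):
    """Split a formula into (token_name, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected input at position %d" % pos)
        # Exactly one named group matched; lastgroup gives its name.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens
```

With this ordering, `tokenize("p <-> q -> r")` reports one equivalence and one implication instead of reading the tail of `<->` as an implication.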

Parse a PDF file and write its content to a Word file using Java

How do I parse a PDF file and write the content to a Word file using Java? ...

How to iterate through each line of an ASCII file

In a shell script, how do I iterate through each line in an ASCII file and perform an operation on its values? This is an example of the ASCII file I have: 23 3.4e-09 55.90 5.7e-07 24 12.5 79.90 7.9e-09 25 67.9 78.9 3.4e-09 26 98.8 89.67 9.7e-09 how come it will t...

Reporting against a CSV field in a SQL Server 2005 DB

OK, so I am writing a report against a third-party database which is in SQL Server 2005. For the most part it's normalized except for one field in one table. They have a table of users (which includes groups). This table has a UserID field (PK), an IsGroup field (bit), and a members field (text). This members field has a comma-separated list...

How do you parse HTML in VB.NET

I would like to know if there is a simple way to parse HTML in VB.NET. I know that HTML is not a strict subset of XML, but it would be nice if it could be treated that way. Is there anything out there that would let me parse HTML in an XML-like way in VB.NET? ...

Is ANTLR an appropriate tool to serialize/deserialize a binary data format?

I need to read and write octet streams to send over various networks to communicate with smart electric meters. There is an ANSI standard, ANSI C12.19, that describes the binary data format. While the data format is not overly complex, the standard is very large (500+ pages) in that it describes many distinct types. The standard is ful...

Extracting nested function names from a JavaScript function

Given a function, I'm trying to find out the names of the nested functions in it (only one level deep). A simple regex against toString() worked until I started using functions with comments in them. It turns out that some browsers store parts of the raw source while others reconstruct the source from what's compiled; the output of toSt...

Time/Date range grammars

I need to parse strings containing time spans such as: Thursday 6:30-7:30 AM; December 30, 2009 - January 1, 2010; 1/15/09, 7:30 to 8:30 PM; Thursday, from 6:30 to 7:30 AM; and others... added 6:30 to 7:30 and date/times such as most any cases that Word's insert->date can generate. As I'd be extremely surprised if anything out there ...

Where is a good Address Parser

I'm looking for a good tool that can take a full mailing address, formatted for display or use with a mailing label, and convert it into a structured object. So for instance: // Start with a formatted address in a single string string f = "18698 E. Main Street\r\nBig Town, AZ, 86011"; // Parse into address Address addr = new Address(f...

Learning Treetop

I'm trying to teach myself Ruby's Treetop grammar generator. I am finding that not only is the documentation woefully sparse for the "best" one out there, but that it doesn't seem to work as intuitively as I'd hoped. On a high level, I'd really love a better tutorial than the on-site docs or the video, if there is one. On a lower leve...

How does one parse simple inline markup (i.e. *bold*), in Python?

How does one implement a parser (in Python) for a subset of wikitext that modifies text, namely: *bold*, /italics/, _underline_ I'm converting it to LaTeX, so the conversion is from: Hello, *world*! Let's /go/. to: Hello \textbf{world}! Let's \textit{go}. Though there's nothing specific about it being a conversion to LaTeX (nota...
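For flat, non-nested markup like this, a single regular expression substitution may be enough. The following is a minimal sketch: the `MARKUP` table and the `to_latex` name are made up for illustration. It assumes markers never nest, are never escaped, and do not appear in ordinary text such as file paths, and it does not escape LaTeX special characters.

```python
import re

# Hypothetical mapping from marker character to LaTeX command name.
MARKUP = {"*": "textbf", "/": "textit", "_": "underline"}

# A marker, then text that starts and ends with a non-space character,
# then the same marker again (backreference \1).
INLINE_RE = re.compile(r"([*/_])(\S(?:.*?\S)?)\1")

def to_latex(text):
    """Replace *bold*, /italics/ and _underline_ spans with LaTeX commands."""
    def replace(match):
        command = MARKUP[match.group(1)]
        return "\\%s{%s}" % (command, match.group(2))
    return INLINE_RE.sub(replace, text)
```

Anything with nesting or escaping rules would probably need a real tokenizer or grammar instead of one pattern.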

Splitting arguments -- preserving quoted substrings -- in python

Exact duplicate: http://stackoverflow.com/questions/79968/split-a-string-by-spaces-in-python I want to take in a string and return a list, dictionary or tuple of values as separated by spaces. However, I want to not match spaces that are somehow between quote marks, i.e. apple orange "banana tree" green Should come back as three...
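Python's standard library covers this case with shlex.split, which splits on whitespace using shell-like rules and keeps quoted substrings together. A minimal sketch, where the `split_args` wrapper name is hypothetical:

```python
import shlex

def split_args(line):
    """Split on whitespace, keeping quoted substrings as single items."""
    return shlex.split(line)

print(split_args('apple orange "banana tree" green'))
```

The quote characters themselves are dropped from the result, so `"banana tree"` comes back as the single item `banana tree`.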

Regular expression to detect semi-colon terminated C++ for & while loops

In my Python application, I need to write a regular expression that matches a C++ for or while loop that has been terminated with a semi-colon (;). For example, it should match this: for (int i = 0; i < 10; i++); ... but not this: for (int i = 0; i < 10; i++) This looks trivial at first glance, until you realise that the text betwe...
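If the difficulty is that the loop header can itself contain parentheses (function calls, for example), a regular expression alone cannot count them, but a regex plus a small depth counter can. This is a rough sketch under that assumption; `has_semicolon_terminated_loop` is a hypothetical name. It ignores parentheses inside string literals and comments, and it would also flag the legitimate `while (...);` at the end of a do-while loop.

```python
import re

# Finds the keyword and the opening parenthesis of a loop header.
LOOP_HEAD = re.compile(r"\b(?:for|while)\s*\(")

def has_semicolon_terminated_loop(source):
    """Return True if a for/while header is followed directly by ';'."""
    for match in LOOP_HEAD.finditer(source):
        depth = 1
        i = match.end()
        # Walk forward until the parenthesis opened by the header is closed.
        while i < len(source) and depth > 0:
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1
        if depth != 0:
            continue  # unbalanced header, nothing to decide
        if source[i:].lstrip().startswith(";"):
            return True
    return False
```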

EBNF to Scala parser combinator

I have the following EBNF that I want to parse: PostfixExp -> PrimaryExp ( "[" Exp "]" | . id "(" ExpList ")" | . length )* And this is what I got: def postfixExp: Parser[Expression] = ( primaryExp ~ rep( "[" ~ expression ~ "]" | "." ~ ident ~"...

Good tools for creating a C/C++ parser/analyzer

What are some good tools for getting a quick start for parsing and analyzing C/C++ code? In particular, I'm looking for open source tools that handle the C/C++ preprocessor and language. Preferably, these tools would use lex/yacc (or flex/bison) for the grammar, and not be too complicated. They should handle the latest ANSI C/C++ defi...

Fastest XML parser for small, simple documents in Java

I have to objectify very simple and small XML documents (less than 1k, and it's almost SGML: no namespaces, plain UTF-8, you name it...), read from a stream, in Java. I am using JAXP to process the data from my stream into a Document object. I have tried Xerces, it's way too big and slow... I am using Dom4j, but I am still spending way ...

Unexpected result when using Enum.Parse()

class Program { static void Main(string[] args) { String value = "Two"; Type enumType = typeof(Numbers); Numbers number = (Numbers)Enum.Parse(enumType, value); Console.WriteLine(Enum.Parse(enumType, value)); } public enum Numbers : int { One, Two, Three, ...

Parsing datetime strings with microseconds

I have a text file with a lot of datetime strings in isoformat. The strings are similar to this: '2009-02-10 16:06:52.598800' These strings were generated using str(datetime_object). The problem is that, for some reason, str(datetime_object) generates a different format when the datetime object has microseconds set to zero and some str...
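Since str() on a datetime leaves out the fractional part when microseconds are zero, one simple approach is to try the format with %f first and fall back to the format without it. A minimal sketch, where the `parse_timestamp` name is hypothetical:

```python
from datetime import datetime

# Formats tried in order: with microseconds first, then without.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def parse_timestamp(text):
    """Parse a string produced by str(datetime_object) for a naive datetime."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognised timestamp: %r" % text)
```

This assumes naive datetimes only; strings with a UTC offset appended would need an extra format.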