parsing

Rule-based parsing for configuration files

What is the best method to implement a system for parsing a configuration file based on a set of rules? I would appreciate any pointers in the direction of best practices or existing implementations. Edit: I have not settled on a specific language yet, but I am comfortable with both Perl and Python. The files are something a...

XML to C struct and C struct to XML

I like to do my server side programming in C, but need to inter-operate with some XML. What I need to write is some function that, given a C structure, or nested structure, and another structure (or nested structures) that describes the elements in the C structure, spits it out as XML. And another function that reads the XML, verifies ...

How to read text written on an image?

I need to parse some scanned documents into textual data. Is it possible to parse text written on an image using some software? If yes, please recommend any such online utility or software. ...

DOM Manipulation with PHP

I would like to make a simple but non-trivial manipulation of DOM elements with PHP, but I am lost. Assume a page like Wikipedia where you have paragraphs and titles (<p>, <h2>). They are siblings. I would like to take both elements, in sequential order. I have tried GetElementbyName, but then you have no possibility to organize informa...

Parse Javascript to instrument code

Hi, I need to split a JavaScript file into single instructions. For example: a = 2; foo() function bar() { b = 5; print("spam"); } has to be separated into three instructions (assignment, function call, and function definition). Basically, I need to instrument the code, injecting code between these instructions to perform che...

Python regex question: stripping multi-line comments but maintaining a line break

Hello, I'm parsing a source code file, and I want to remove all line comments (i.e. starting with "//") and multi-line comments (i.e. /* ... */). However, if the multi-line comment has at least one line break in it (\n), I want the output to have exactly one line break instead. For example, the code: qwe /* 123 456 789 */ asd should ...
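
The rule described in this question can be sketched with a single substitution callback. This is a minimal sketch, not a definitive answer: the names `COMMENT_RE` and `strip_comments` are made up for illustration, and because the excerpt is cut off, the handling of a block comment without a line break (removed entirely here) is an assumption. The pattern is also naive about comment markers that appear inside string literals.

```python
import re

# Matches either a // line comment (up to, but not including, the newline)
# or a /* ... */ block comment that may span several lines.
# Assumption: comment markers inside string literals are not handled.
COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def strip_comments(source):
    """Remove comments; a block comment containing a newline becomes one newline."""

    def replace(match):
        text = match.group(0)
        if text.startswith("/*") and "\n" in text:
            return "\n"  # collapse a multi-line comment to exactly one line break
        return ""  # assumption: other comments are dropped entirely

    return COMMENT_RE.sub(replace, source)
```

For example, `strip_comments("qwe /* 123\n456\n789 */ asd")` keeps `qwe` and `asd` on separate lines with a single line break between them.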

PHP: Can't find syntax error

Parse error: syntax error, unexpected $end in blah/blah/blah.php line 1 This is the error that I receive, with this code <?php include("db.php"); if (isset($_POST['username']) && isset($_POST['password']) && isset($_POST['email'])) { //Prevent SQL injections $username = mysql_real_escape_string($_POST['username']); ...

Java negative int to hex and back fails

Hello, public class Main3 { public static void main(String[] args) { Integer min = Integer.MIN_VALUE; String minHex = Integer.toHexString(Integer.MIN_VALUE); System.out.println(min + " " + minHex); System.out.println(Integer.parseInt(minHex, 16)); } } Gives -2147483648 80000000 Exception in thread "main"...
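
The arithmetic behind this behaviour can be illustrated in Python, used here only as a calculator. `Integer.toHexString` produces the unsigned 32-bit two's-complement digits, while `Integer.parseInt(s, 16)` reads a signed value, so `80000000` (2147483648) is above `Integer.MAX_VALUE`. The helper names `to_hex_32` and `from_hex_32` are hypothetical and only mirror the two conversions.

```python
def to_hex_32(value):
    """Hex digits of the 32-bit two's-complement pattern of value."""
    return format(value & 0xFFFFFFFF, "x")


def from_hex_32(text):
    """Interpret hex digits as a signed 32-bit integer."""
    value = int(text, 16)
    if value >= 2**31:
        # The top bit is set, so the pattern encodes a negative number.
        value -= 2**32
    return value
```

Read as an unsigned number, `80000000` is one more than the largest signed 32-bit value, which is why a signed parse rejects it while the round trip through `from_hex_32` recovers -2147483648.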

Tools to build a DSL in .NET

I'm getting teased more and more into developing DSLs. I've developed a tiny one with F# using fslex and fsyacc but the error messages are inaccurate (I also can't find a way to generate better ones, there seems to be little documentation on how to handle error cases) and the fact that they won't parse UNICODE strings adequately is not a...

Using Compile Assembly From Source to evaluate math equations in C#

I am considering parsing simple math equations by compiling from source at runtime. I have heard that there are security considerations that I should be aware of before using this approach, but I can’t find any info on this. Thanks. C#, .NET 2.0, WinForms ...

Parsing in Python: what's the most efficient way to suppress/normalize strings?

Hello, I'm parsing a source file, and I want to "suppress" strings. What I mean by this is transform every string like "bla bla bla +/*" to something like "string" that is deterministic and does not contain any characters that may confuse my parser, because I don't care about the value of the strings. One of the issues here is string fo...
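
One possible way to do this kind of suppression is a regular expression that matches whole quoted literals, including backslash escapes, and replaces each with a fixed placeholder. This is a sketch under assumptions: the names `STRING_RE` and `suppress_strings` are hypothetical, only single-line single- and double-quoted literals are covered, and it makes no claim about being the most efficient option.

```python
import re

# A quoted literal: opening quote, then escaped characters or ordinary
# characters (no bare quote, backslash, or newline), then the closing quote.
DOUBLE_QUOTED = r'"(?:\\.|[^"\\\n])*"'
SINGLE_QUOTED = r"'(?:\\.|[^'\\\n])*'"
STRING_RE = re.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED)


def suppress_strings(source, placeholder='"string"'):
    """Replace every quoted literal in source with a fixed placeholder."""
    return STRING_RE.sub(placeholder, source)
```

For example, `suppress_strings('x = "bla bla bla +/*" + y')` returns `x = "string" + y`.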

Why can't a recursive-descent parser handle left recursion

Could someone please explain to me why recursive-descent parsers can't work with a grammar containing left recursion? ...
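
The problem can be seen in a small sketch. For a left-recursive rule such as `expr -> expr '-' NUM | NUM`, a function written directly from the rule calls itself before consuming any token, so it recurses at the same input position until the stack runs out. The usual workaround is to rewrite the rule as `expr -> NUM ('-' NUM)*` and use a loop. The grammar, token format, and function names below are illustrative assumptions.

```python
def parse_expr_naive(tokens, pos=0):
    """Direct transcription of: expr -> expr '-' NUM | NUM (left-recursive)."""
    # The first step is to parse an expr at the same position, so the call
    # repeats with identical arguments and never reaches a token.
    left, pos = parse_expr_naive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "-":
        return left - int(tokens[pos + 1]), pos + 2
    return left, pos


def parse_expr(tokens, pos=0):
    """Equivalent rule without left recursion: expr -> NUM ('-' NUM)*."""
    value = int(tokens[pos])  # consume the first NUM before anything else
    pos += 1
    while pos < len(tokens) and tokens[pos] == "-":
        value -= int(tokens[pos + 1])  # the loop keeps '-' left-associative
        pos += 2
    return value, pos
```

Calling `parse_expr_naive` on any input ends in a `RecursionError`, while `parse_expr(["7", "-", "2", "-", "1"])` evaluates the tokens left to right.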

Writing/parsing a fixed width file using Python

I'm a newbie to Python and I'm looking at using it to write some hairy EDI stuff that our supplier requires. Basically they need an 80-character fixed width text file, with certain "chunks" of the field with data and others left blank. I have the documentation so I know what the length of each "chunk" is. The response that I get back ...
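
A fixed-width record like the one described can be written and read with plain string padding and slicing. The sketch below assumes a made-up layout: the field names and widths in `LAYOUT` are hypothetical stand-ins for the supplier's documentation, chosen only so that they add up to 80 characters.

```python
# Hypothetical layout: (field name, width). The widths sum to 80.
LAYOUT = [("record_type", 2), ("order_id", 10), ("quantity", 6), ("filler", 62)]


def format_record(values):
    """Build one 80-character line; missing fields are left blank."""
    parts = []
    for name, width in LAYOUT:
        text = str(values.get(name, ""))
        if len(text) > width:
            raise ValueError("%s is longer than %d characters" % (name, width))
        parts.append(text.ljust(width))  # pad on the right with spaces
    return "".join(parts)


def parse_record(line):
    """Slice a fixed-width line back into a dict of stripped strings."""
    record = {}
    offset = 0
    for name, width in LAYOUT:
        record[name] = line[offset:offset + width].rstrip()
        offset += width
    return record
```

Keeping the layout in one table means the writer and the parser cannot drift apart when a field width changes.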

C# Stripping / converting one or more characters

Hi all, is there a fast way to strip or keep specific characters in a string, without having to explicitly loop through each character? In Visual FoxPro, there is a function CHRTRAN() that does this well. It works as a 1:1 character replacement, but if there is no character in the corresponding replacement position, the character is stripped from the final string. Ex CHRTRA...

Java BBCode library

Has anybody used a good Java implementation of BBCode? I am looking at: javabbcode (nothing to see); kefir-bb (listed as alpha); the BBCode parser in the JBoss source code. Are there any better options? ...

Nested prohibit/require operators in Lucene search queries

I am using Lucene for Java, and need to figure out what the engine does when I execute some obscure queries. Take the following query: +(foo -bar) If I use QueryParser to parse the input, I get a BooleanQuery object that looks like this: org.apache.lucene.search.BooleanQuery: org.apache.lucene.search.BooleanClause(required=true,...

RegEx - Looking for emails inside of a log file

Hey everyone, I am looking for a regular expression that will test for matches against a string such as: mxtreme1.log:May 12 07:00:00 10.1.1.175 postfix/cleanup[48145]: C2C9FFA730: fullname=, [email protected], [email protected], [email protected], prior=, as_score=0, as_strategy=M, code=W, actions=FFFFFFF...

Parse JSON array

Hi, I fetch a JSON array from a web service with touchJSON. It looks like this: [{"icecream": {"title": "Banana"}}, {"icecream": {"title": "Strawberry"}}] I'm not able to parse this into an NSDictionary, because touchJSON doesn't support JSON arrays. How do I get my JSON array into an NSDictionary? Regards ...

Parsing RTF Documents with Java/JavaCC

Is anybody familiar with the RTF document format and with parsing it using any Java libraries? The standard way people have done this is by using the RTFEditorKit in the JDK Swing API: Swing RTFEditorKit API but it isn't that accurate when it comes to parsing RTF documents. In fact there's a comment in the API: The RTF support was not...

Process many incoming emails in Rails: MySQL vs. Imap / Pop3 vs. other solution

Hi, in an application I'm working on, users can forward their email accounts to an address from our system (something like [email protected] ). It doesn't matter here why they should do this, but I need some professional advice on the best way to approach this. The basic idea is that our mailserver receives the incoming (for...