parsing

Idea on parsing character syntax diagram.

Folks, I'm implementing something unusual: I have to write a utility that parses a syntax diagram in plain-text format and converts it to XML. The format is basically identical to this one from IBM (see the "Creating a No-Conversion Job" part): http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqls.doc/sqls17.htm ...

Parse lines of integers in C

This is a classic problem, but I cannot find a simple solution. I have an input file like: 1 3 9 13 23 25 34 36 38 40 52 54 59 2 3 9 14 23 26 34 36 39 40 52 55 59 63 67 76 85 86 90 93 99 108 114 2 4 9 15 23 27 34 36 63 67 76 85 86 90 93 99 108 115 1 25 34 36 38 41 52 54 59 63 67 76 85 86 90 93 98 107 113 2 3 9 16 24 28 2 3 10 1...

C# - Parse HTML source as XML

I would like to read a dynamic URL that contains an HTML file and process it like an XML file, based on nodes (HTML tags). Is this possible? For example, given this HTML code: <table class="bidders" cellpadding="0" cellspacing="0"> <tr class="bidRow4"> <td>kucik (automata)</td> ...

Objective C - RegexKitLite - Parsing inner contents of a string, ie: start(.*?)end

Please consider the following: NSString *myText = @"mary had a little lamb"; NSString *regexString = @"mary(.*?)little"; for(NSString *match in [myText captureComponentsMatchedByRegex:regexString]){ NSLog(@"%@",match); } This will output to the console two things: 1) "mary had a little" 2) "had a" What I want is just the 2nd bit ...
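The behaviour described in this excerpt (element 0 is the whole match, element 1 is the first capture group) is common to most regex engines. As a rough illustration only, here is the same pattern in Python's standard `re` module rather than RegexKitLite; the helper name `inner_contents` is hypothetical. Selecting capture group 1 yields just the inner text.

```python
import re

# Same text and pattern as in the question, expressed with Python's re module.
my_text = "mary had a little lamb"
pattern = re.compile(r"mary(.*?)little")


def inner_contents(text):
    """Return only the text captured between the two delimiters, or None."""
    match = pattern.search(text)
    # group(0) is the whole match; group(1) is the first capture group.
    return match.group(1) if match else None


inner = inner_contents(my_text)
print(repr(inner))
```

Note that the captured text keeps its surrounding spaces; strip it if that is not wanted.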

How to color HTML elements based on parsing a user command string

I'm working on a little parser to color objects. For example, you could type red:Hi!: and "Hi!" would be red. This is my non-working code: <script type="text/javascript"> function post() { var preview = document.getElementById("preview"); var submit = document.getElementById("post"); var text = submit.value; <...

IE JavaScript date parsing error

Hi, why can't IE parse this string as a Date object? var d = Date.parse("Fri Jun 11 04:55:12 +0000 2010"); // returns NaN However, it works well in Firefox. I am running IE 8. Thanks. ...

RSS Feed parser library in Java

Are there any Java helper utilities or libraries for parsing RSS/Atom feeds? I checked RSSUtils, but it looks outdated. ...

Comments in XML at beginning of document

My Python XML parser fails if there's a comment at the beginning of an XML file, like: <?xml version="1.0" encoding="utf-8"?> <!-- Script version: "1"--> <!-- Date: "07052010"--> <component name="abc"> <pp> .... </pp> </component> Is it illegal to place a comment like this? EDIT: well, it's not throwing an error, but the DOM modu...
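Comments after the XML declaration and before the root element are legal XML. The excerpt is cut off, so the following is only a guess at the symptom: with the standard-library `xml.dom.minidom`, the document's first child is then a comment node rather than the root element, while `documentElement` still points at the root. The `<pp>` content below is a placeholder, since the original elides it.

```python
from xml.dom import minidom

# Sample document modelled on the question; the <pp> body is a placeholder.
XML_SOURCE = b"""<?xml version="1.0" encoding="utf-8"?>
<!-- Script version: "1"-->
<!-- Date: "07052010"-->
<component name="abc">
  <pp>placeholder</pp>
</component>"""

doc = minidom.parseString(XML_SOURCE)

# The leading comments are kept as children of the document node,
# so firstChild is a comment, not the <component> element.
first = doc.firstChild

# documentElement skips the prolog and always refers to the root element.
root = doc.documentElement

print(first.nodeType == first.COMMENT_NODE, root.tagName)
```

Code that walks `doc.firstChild` or `doc.childNodes[0]` expecting the root element would therefore see the comment instead.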

Tokenizing numbers for a parser

I am writing my first parser and have a few questions concerning the tokenizer. Basically, my tokenizer exposes a nextToken() function that is supposed to return the next token. These tokens are distinguished by a token-type. I think it would make sense to have the following token-types: SYMBOL (such as <, :=, ( and the like WHITESPAC...
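A minimal sketch of a typed tokenizer along these lines, in Python. Only SYMBOL and WHITESPACE come from the excerpt; the NUMBER type, the exact patterns, and the names `Token`, `TOKEN_SPEC`, and `tokens` are assumptions made for illustration. Calling `next()` on the generator plays the role of nextToken().

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # token-type, e.g. "NUMBER"
    text: str  # the matched source text


# Assumed token types and patterns; order matters when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),       # integers and simple decimals
    ("WHITESPACE", r"\s+"),
    ("SYMBOL", r":=|[<>()+\-*/=]"),     # multi-character symbols first
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokens(source: str) -> Iterator[Token]:
    """Yield tokens one at a time; next() on the result acts like nextToken()."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        # lastgroup is the name of the alternative that matched.
        yield Token(match.lastgroup, match.group())
        pos = match.end()


stream = tokens("3.14 := (42")
print(next(stream))
```

Treating a whole number as one token keeps the parser simpler than assembling it from digit and dot tokens, though that is a design choice rather than a requirement.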

Parsing Indentation-based syntaxes in Haskell's Parsec

I'm trying to parse an indentation-based language (think Python, Haskell itself, Boo, YAML) in Haskell using Parsec. I've seen the IndentParser library, and it looks like it's the perfect match, but what I can't figure out is how to make my TokenParser into an indentation parser. Here's the code I have so far: import qualified Text.Pars...

[C++/general] parser with scopes and conditionals

I'm writing a C/C++/... build system (I understand this is madness ;)), and I'm having trouble designing my parser. My "recipes" look like this: global { SOURCE_DIRS src HEADER_DIRS include SOURCES bitwise.c \ framing.c HEADERS \ ogg/os_types.h \ ogg/ogg.h } lib static ogg_static { ...

how to detect an escape sequence in a string

Given a string named line whose raw version has this value: \rRAWSTRING how can I detect if it has the escape character \r? What I've tried is: if repr(line).startswith('\r'): blah... but it doesn't catch it. I also tried find, such as: if repr(line).find('\r') != -1: blah doesn't work either. What am I missing? thx!...
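A short sketch of the distinction that seems to be at play here, assuming `line` is an ordinary Python string; the helper names are hypothetical. `repr(line)` starts with a quote character and spells a carriage return as the two characters backslash and `r`, so searching it for the single character `'\r'` finds nothing.

```python
def has_carriage_return(line: str) -> bool:
    """True if the string contains an actual carriage-return character."""
    return "\r" in line


def has_literal_backslash_r(line: str) -> bool:
    """True if the string contains the two characters backslash and 'r'."""
    return "\\r" in line


real_cr = "\rRAWSTRING"      # one control character followed by text
literal = r"\rRAWSTRING"     # backslash, 'r', then text

print(has_carriage_return(real_cr), has_literal_backslash_r(literal))
print(repr(real_cr))
```

Which check applies depends on whether the data holds a real carriage return or the escaped text; testing the string itself rather than its `repr` avoids the confusion.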

"Content is not allowed in prolog" when parsing perfectly valid XML on GAE

Hey guys, I've been beating my head against this absolutely infuriating bug for the last 48 hours, so I thought I'd finally throw in the towel and try asking here before I throw my laptop out the window. I'm trying to parse the response XML from a call I made to AWS SimpleDB. The response is coming back on the wire just fine; for examp...

Is it possible to parse and apply patch files in PHP?

The idea is to have a PHP script parse a given .patch file and apply the patch accordingly. Assume the script has no access to the command line, so it will have to do the parsing itself. Is there a library somewhere? ...

Parsing every part of an HTTP header field-value

Hi all. I'm parsing HTTP data directly from packets (either TCP reconstructed or not, you can assume it is). I'm looking for the best way to parse HTTP as accurately as possible. The main issue here is the HTTP header. Looking at the basic RFC of HTTP/1.1, it seems that HTTP header parsing would be complex. The RFC describes very com...

What sort of object is this and how to use it?

What would be the correct name for this type of array? There are 3 main sections and 4 sub-parts consisting of "issuedTime", "text", "url" and "validToTime". How do you start to convert this to an object? If there was only 1 main section, it would be fairly simple to do; however, with 3 main parts and no identification for each main sectio...

Utility to format a Bison/Yacc grammar file nicely?

Hello there: Do you folks know of any GNU/Linux utility to format a Bison grammar file, containing C code, nicely? I'm thinking of something along the lines of GNU Indent, but designed to beautify grammar files rather than C code. ...

Strip text except from the contents of a tag

The opposite may be achieved using pyparsing as follows: from pyparsing import Suppress, replaceWith, makeHTMLTags, SkipTo #... removeText = replaceWith("") scriptOpen, scriptClose = makeHTMLTags("script") scriptBody = scriptOpen + SkipTo(scriptClose) + scriptClose scriptBody.setParseAction(removeText) data = (scriptBody).transformStrin...

build error with boost spirit grammar (boost 1.43 and g++ 4.4.1)

I'm having trouble getting a small Spirit/Qi grammar to compile. The build error output is ugly enough that it makes no sense to me (I noticed some assertion_failed entries in there, but they didn't give me much information). The input grammar header, inputGrammar.h: #include <boost/config/warning_disable.hpp> #include <boost/spirit/...

Parser that accepts Scala Identifiers?

I was wondering whether the standard Scala parser combinators contain a parser that accepts the same identifiers that the Scala language itself also accepts (as specified in the Scala Language Specification, Section 1.1). The StdTokenParsers trait has an ident parser, but it rejects identifiers like empty_?. (If there is indeed no such...