questions about parsing | ansaurus

parsing

Using Lucene Highlighter along with MultiFieldQueryParser

I'm using Lucene Highlighter to highlight the matches that I have found in a Lucene index. My problem is that if I search multiple fields of a document and need to display the matching text, how can I find out in which field the hit occurred? The code which I am using for the highlighter is basically the second functi...

How to implement a left recursion eliminator?

How can I implement an eliminator for this? A := AB | AC | D | E ; ...
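For a rule of this shape, the usual textbook rewrite for immediate left recursion turns A := A a1 | A a2 | b1 | b2 into A := b1 A' | b2 A' and A' := a1 A' | a2 A' | (empty). A minimal Python sketch of that rewrite follows. The function name `eliminate_left_recursion`, the list-of-symbols grammar representation, and the `A'` naming convention are assumptions made for illustration, and the sketch handles only immediate (direct) left recursion, not indirect recursion through other nonterminals.

```python
def eliminate_left_recursion(nonterminal, productions):
    """Rewrite immediate left recursion for a single nonterminal.

    productions is a list of alternatives, each a list of symbols.
    An empty list stands for the empty (epsilon) alternative.
    """
    # Alternatives of the form  A -> A alpha  (keep only alpha).
    recursive = [p[1:] for p in productions if p and p[0] == nonterminal]
    # Alternatives that do not start with A.
    others = [p for p in productions if not p or p[0] != nonterminal]

    if not recursive:
        # Nothing to do: the rule is not left-recursive.
        return {nonterminal: productions}

    tail = nonterminal + "'"  # hypothetical name for the new helper rule
    return {
        # A  -> beta A'
        nonterminal: [p + [tail] for p in others],
        # A' -> alpha A' | epsilon
        tail: [p + [tail] for p in recursive] + [[]],
    }


# The grammar from the question: A := AB | AC | D | E
grammar = eliminate_left_recursion(
    "A", [["A", "B"], ["A", "C"], ["D"], ["E"]]
)
for name, alternatives in grammar.items():
    print(name, ":=", " | ".join(" ".join(a) or "(empty)" for a in alternatives))
```

Applied to the rule above, this gives A := D A' | E A' and A' := B A' | C A' | (empty), which accepts the same strings without the parser having to recurse on A before consuming input.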

Parsing simple MIME files from C/C++?

Hello everyone, I have searched the web for days now but I can't seem to find a good solution to my problem: For one of my projects I'm looking for a good (lightweight) MIME parser. My customer provides MIME formatted files (linear, no hierarchy) which contain 3-4 "parts". The application must be able to split those parts and process t...

Remove parent xml tag based on child value

For example, we have an XML file with this format: <A> <B> <C></C> <D></D> <D></D> </B> </A> I need the following: if all "D" elements are empty, then the whole "A" element must be deleted, and, of course, this must be done for all "A" elements in the XML. ...
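One possible way to do this, sketched here in Python with the standard `xml.etree.ElementTree` module (the question does not say which language or library is in use, so this is a swapped-in illustration). The helper names `is_empty` and `remove_empty_a` and the wrapping `<root>` element are assumptions. "Empty" is taken to mean no child elements and no non-whitespace text, and an A with no D elements at all is also removed, which may or may not be what is wanted.

```python
import xml.etree.ElementTree as ET


def is_empty(elem):
    """True if the element has no children and no non-whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def remove_empty_a(root):
    """Remove every <A> whose descendant <D> elements are all empty."""
    # ElementTree has no parent pointers, so walk over candidate parents
    # and remove matching <A> children from each one.
    for parent in list(root.iter()):
        for a in list(parent.findall("A")):
            if all(is_empty(d) for d in a.findall(".//D")):
                parent.remove(a)
    return root


# Hypothetical sample: the first <A> has only empty <D> elements.
doc = ET.fromstring(
    "<root>"
    "<A><B><C></C><D></D><D></D></B></A>"
    "<A><B><C></C><D>x</D><D></D></B></A>"
    "</root>"
)
remove_empty_a(doc)
print(ET.tostring(doc, encoding="unicode"))
```

Only the second A survives in the sample. Note that the root element itself cannot be removed this way, so the sketch assumes the A elements sit below some container.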

Defining tokens at runtime

I want to write a parser for EDIFACT messages with JavaCC. My problem is that I cannot define all terminal symbols before parsing a message because at the beginning of each message there is a so-called "Advice Segment" ("UNA" Segment) which defines things like element separator symbol, escape symbol, segment terminator symbol and decimal ...

Python + Expat: Error on  entities

I have written a small function, which uses ElementTree and xpath to extract the text contents of certain elements in an xml file: #!/usr/bin/env python2.5 import doctest from xml.etree import ElementTree from StringIO import StringIO def parse_xml_etree(sin, xpath): """ Takes as input a stream containing XML and an XPath expression...

Parse XML document

I am trying to parse a remote XML document (from Amazon AWS): <ItemLookupResponse xmlns="http://webservices.amazon.com/AWSECommerceService/2009-03-31"> <OperationRequest> <RequestId>011d32c5-4fab-4c7d-8785-ac48b9bda6da</RequestId> <Arguments> <Argument Name="Condition" Value="New"></Argument> ...

iPhone: RSS sample code, modify to display images... Please help!

Hi! I am trying to make an app that displays an RSS feed, with text and images, in a table, but I am really struggling with it! I found a really good sample code project that I can really recommend, but I'm struggling to get it to display images in the table cells instead of only text. I would be really happy with any help! ...

RegEx - Indexed/Arrayed Named Capture Groups?

I have a situation where something can appear in a format as follows: ---id-H-- Header: data Another Header: more data Message: sdasdasdasd Message: asdasdasdasd Message: asdasdasd There may be many messages, or just a couple. I'd prefer not having to step outside of RegEx, because I am using the RegEx to parse some header information...
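As a point of comparison only: the question does not say which regex flavor is in use, and in Python's `re` module a repeated capture group keeps just its last match, so the repeated "Message:" lines are usually collected with `findall` rather than with indexed groups. A small sketch, assuming each header and message sits on its own line as in the sample above:

```python
import re

# Sample input laid out one field per line (an assumption about the format).
text = """---id-H--
Header: data
Another Header: more data
Message: sdasdasdasd
Message: asdasdasdasd
Message: asdasdasd
"""

# MULTILINE makes ^ and $ match at every line boundary, so each
# "Message:" line yields one captured value.
messages = re.findall(r"^Message:\s*(.*)$", text, flags=re.MULTILINE)
print(messages)
```

This stays inside the regex API but returns a list instead of numbered groups, so the count of messages does not need to be known in advance.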

build error with boost spirit grammar (boost 1.43 and g++ 4.4.1) part II

I'm having issues getting a small spirit/qi grammar to compile. I am using boost 1.43 and g++ 4.4.1. The input grammar header: the build error seems to be pointing to the definition of the 'instruction' rule; maybe it is the '[sp::_val = sp::_1]' that somehow breaks it, but this is more or less based on what the spirit documentation tuto...

Take data from an XML file and put it into a MySQL database

Hi guys, I'm looking to construct a script that would go through an XML file, find specific tags in it, put them in a table, and fill the table with specific tags within them. I'm using MySQL 5.1, so loadXML isn't an option, and I think that the ExtractData() method won't be much use either, but I don't really know. What would be the bes...

how to check an ANTLR token is only used once or less in the parser

In ANTLR, if I have a rule, for example: someRule : TOKENA TOKENB; it would accept: "tokena tokenb". If I would like TOKENA to be optional, I can say someRule : TOKENA* TOKENB; then I can have: "tokena tokenb" or "tokenb" or "tokena tokena tokenb", but this also means it can be repeated more than once. Is there any way I can say t...

Parse string with bash and extract number

Hello, I've got supervisor's status output, looking like this: frontend RUNNING pid 16652, uptime 2:11:17 nginx RUNNING pid 16651, uptime 2:11:17 redis RUNNING pid 16607, uptime 2:11:32 I need to extract nginx's PID. I've done it via grep -P comman...
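The question asks about bash, but the same extraction can be sketched in Python for comparison. The helper name `pid_of` is hypothetical, and the sketch assumes the status output has one process per line in the `name RUNNING pid N, uptime ...` layout shown above.

```python
import re

# Status text as it would appear with one process per line (assumed layout).
status = """frontend                         RUNNING    pid 16652, uptime 2:11:17
nginx                            RUNNING    pid 16651, uptime 2:11:17
redis                            RUNNING    pid 16607, uptime 2:11:32
"""


def pid_of(name, status_text):
    """Return the PID of a RUNNING process by name, or None if absent."""
    # Anchor on the process name at the start of a line, then capture
    # the digits that follow "pid".
    pattern = r"^%s\s+RUNNING\s+pid\s+(\d+)" % re.escape(name)
    match = re.search(pattern, status_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


print(pid_of("nginx", status))
```

Anchoring on the name at the start of the line avoids picking up a different process whose line merely contains the string "nginx".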

How to deal with overlapping character groups in different tokens in an EBNF grammar?

I'm using an LL(k) EBNF grammar to parse a character stream. I need three different types of tokens: CHARACTERS letter = 'A'..'Z' + 'a'..'z' . digit = "0123456789" . messageChar = '\u0020'..'\u007e' - ' ' - '(' - ')' . TOKENS num = ['-'] digit { digit } [ '.' digit { digit } ] . ident = letter { letter | digit | '_' } . ...


Parsing complex string using regex

My regex skills are not very good, and recently a new data element has thrown my parser for a loop. Take the following string: "+USER=Bob Smith-GROUP=Admin+FUNCTION=Read/FUNCTION=Write" Previously I had the following for my regex: [+\\-/] Which would turn the result into USER=Bob Smith GROUP=Admin FUNCTION=Read FUNCTION=Write FUNCT...

Is XMLReader a SAX parser, a DOM parser, or neither?

I am testing various methods to read (possibly large, and very often) XML configuration files in PHP. No writing is ever needed. I have two successful implementations, one using SimpleXML (which I know is a DOM parser) and one using XMLReader. I know that a DOM reader must read the whole tree and therefore uses more memory. My tests ...

Memory Issues When DOM Parsing A Large XML File on Android Devices

Hey awesome SO users, I have an Android application that parses an XML file for users and displays results in a much more mobile friendly format. The app works great for most users, but some users have lots and lots of data and the app crashes on them because it runs out of memory. Is there any way I have a DOM style XML parser quit pa...

Parsing email with Python

I'm writing a Python script to process emails returned from Procmail. As suggested in this question, I'm using the following Procmail config: :0: |$HOME/process_mail.py My process_mail.py script is receiving an email via stdin like this: From hostname Tue Jun 15 21:43:30 2010 Received: (qmail 8580 invoked from network); 15 Jun 2010 2...

Parsing Lisp S-Expressions with known schema in C#

I'm working with a service that provides data as a Lisp-like S-Expression string. This data is arriving thick and fast, and I want to churn through it as quickly as possible, ideally directly on the byte stream (it's only single-byte characters) without any backtracking. These strings can be quite lengthy and I don't want the GC churn ...

Parsing a string for dates in PHP

Given an arbitrary string, for example ("I'm going to play croquet next Friday" or "Gadzooks, is it 17th June already?"), how would you go about extracting the dates from there? If this is looking like a good candidate for the too-hard basket, perhaps you could suggest an alternative. I want to be able to parse Twitter messages for date...
