questions about parsing | ansaurus

parsing

C++ Read file from bottom to top

I have a very large file I need to parse, so reading it into memory all at once is non-ideal. The way the file is structured, it would be much, much easier if I could start at eof and go up to the beginning. Does anyone have a good trick for doing this? I'm using Visual Studio 2008 and C++. Thanks ...
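
A common trick is to seek to the end of the file and read fixed-size blocks backwards. Each block is split on newlines, and the partial line at the front of the block is carried over to the next block. Memory use then stays bounded by the block size. The sketch below shows the idea in Python rather than C++. `read_lines_reversed` and the 4096-byte default block size are illustrative choices, not a library API. The same seek-and-read pattern should carry over to `std::ifstream` with `seekg`/`read`.

```python
import os
from typing import BinaryIO, Iterator


def read_lines_reversed(f: BinaryIO, block_size: int = 4096) -> Iterator[str]:
    """Yield the lines of a seekable binary file from last to first.

    Only one block (plus one partial line) is held in memory at a time.
    Illustrative helper, not a library function.
    """
    f.seek(0, os.SEEK_END)
    pos = f.tell()
    if pos == 0:
        return  # empty file: no lines
    # Skip a single trailing newline so it does not yield an empty "last line".
    f.seek(pos - 1)
    if f.read(1) == b"\n":
        pos -= 1
    remainder = b""  # partial line carried over from the block to the right
    while pos > 0:
        step = min(block_size, pos)
        pos -= step
        f.seek(pos)
        block = f.read(step) + remainder
        lines = block.split(b"\n")
        # lines[0] may start mid-line; it is completed by the next block.
        remainder = lines[0]
        for line in reversed(lines[1:]):
            yield line.rstrip(b"\r").decode("utf-8")
    yield remainder.rstrip(b"\r").decode("utf-8")
```

The function works on bytes and decodes whole lines only, so a multi-byte UTF-8 character split across two blocks is reassembled before decoding. A typical call would be `with open("huge.log", "rb") as f:` followed by a loop over `read_lines_reversed(f)`. The filename is a placeholder.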

How can I execute an ANTLR parser action for each item in a rule that can match more than one item?

I am trying to write an ANTLR parser rule that matches a list of things, and I want to write a parser action that can deal with each item in the list independently. Some example input for these rules is: $(A1 A2 A3) I'd like this to result in an evaluator that contains a list of three MyIdentEvaluator objects -- one for each of A1, A...

Matching SRC attribute of IMG tag using preg_match

I'm attempting to run preg_match to extract the SRC attribute from the first IMG tag in an article (in this case, stored in $row->introtext). preg_match('/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i', $row->introtext, $matches); Instead of getting something like images/stories/otakuzoku1.jpg from <img src="images/stories/otak...
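
In the pattern above, `[img]` and `[src]` are character classes. Each matches a single character (one of `i`, `m`, `g`, or one of `s`, `r`, `c`), not the literal words. A minimal corrected sketch, written with Python's `re` module (whose syntax is close to PCRE), requires the literal `img` and a whitespace-preceded `src` attribute. The pattern should translate to `preg_match` with the `/i` flag, but that is untested here. `first_img_src` is a hypothetical helper name. A regex remains fragile on arbitrary HTML, and an HTML parser is the more robust option.

```python
import re
from typing import Optional

# Literal "img", then a whitespace-preceded "src" (so "data-src" is skipped),
# then an optionally quoted attribute value.
IMG_SRC = re.compile(r"""<img\b[^>]*?\ssrc\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def first_img_src(html: str) -> Optional[str]:
    """Return the src value of the first <img> tag in html, or None."""
    match = IMG_SRC.search(html)
    return match.group(1) if match else None
```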

How can I parse space separated STDIN hex strings unpacked in Perl?

I have a Java program that spits out, in space-separated hexadecimal format, 16 bytes of a raw packet received over the network. Since I don't want to change that code, I am piping the result to a Perl script that, theoretically, can simply unpack this from STDIN into recognizable variables. The following is a sample of the line input to my...
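
The job splits into two steps: turn the hex text into raw bytes, then unpack those bytes with a fixed layout. In Perl, stripping the spaces and using `pack "H*"` followed by `unpack` with a template should be the analogue. The sketch below shows the same two steps in Python with `bytes.fromhex` and `struct`. The packet layout is hypothetical (the real one is not given): two 16-bit fields, one 32-bit field and 8 raw bytes, in network byte order.

```python
import struct
from typing import Tuple

# Hypothetical 16-byte layout (network byte order): two unsigned 16-bit fields,
# one unsigned 32-bit field, then 8 raw bytes. Replace with the real format.
PACKET = struct.Struct("!HHI8s")


def parse_hex_line(line: str) -> Tuple[int, int, int, bytes]:
    """Decode a line like '00 01 ff ...' and unpack it into fields."""
    raw = bytes.fromhex(line.strip())  # spaces between byte pairs are skipped
    if len(raw) != PACKET.size:
        raise ValueError(f"expected {PACKET.size} bytes, got {len(raw)}")
    return PACKET.unpack(raw)
```

Reading from the pipe would then be a loop over `sys.stdin` that calls `parse_hex_line` on each non-empty line.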

Parsing a multiline variable-length log file

I want to be able to use a 'grep'- or 'pcregrep -M'-like solution that parses a log file with the following properties: each log entry can be multiple lines long; the first line of a log entry has the key that I want to search for; each key appears on more than one line. So in the example below I would want to return every line t...
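
Because `grep` works line by line, one approach is to group lines into entries first and then filter on each entry's first line. The Python sketch below assumes, hypothetically, that every entry starts with an ISO date (`YYYY-MM-DD`) and that continuation lines do not. `ENTRY_START` should be adjusted to the real log format. The function names are illustrative.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical: a new log entry begins with a date such as "2010-02-02".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def group_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group raw lines into multi-line entries."""
    entry: List[str] = []
    for line in lines:
        if ENTRY_START.match(line) and entry:
            yield entry  # a new entry starts, so the previous one is complete
            entry = []
        entry.append(line)
    if entry:
        yield entry


def entries_with_key(lines: Iterable[str], key: str) -> Iterator[List[str]]:
    """Yield every whole entry whose first line contains key."""
    for entry in group_entries(lines):
        if key in entry[0]:
            yield entry
```

Matching only the first line means continuation lines that happen to contain the key do not pull in unrelated entries.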

parse lines using linq to txt

var t1 = from line in File.ReadAllLines(@"alkahf.txt") let item = line.Split(new string[] {". "}, StringSplitOptions.RemoveEmptyEntries) let verse = line.Split(new string[] { "\n. " }, StringSplitOptions.RemoveEmptyEntries) select new { ...

boost::Spirit Grammar for unsorted schema

I have a section of a schema for a model that I need to parse. Let's say it looks like the following: { type = "Standard"; hostname="x.y.z"; port="123"; } The properties are: the elements may appear unordered; all elements that are part of the schema must appear, and no others; all of the elements' synthesised attributes go into...
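
The constraint "every field exactly once, in any order, nothing else" splits into two parts. First, parse `key = "value";` pairs in whatever order they come. Second, validate the resulting set of keys. In Spirit.Qi, the permutation operator (`a ^ b`) accepts its operands in any order, each at most once, but does not require all of them to be present, so a completeness check is still needed. The sketch below shows the two-part idea in Python with a regular expression instead of Spirit. The field names come from the example; `parse_block` and its error messages are illustrative.

```python
import re
from typing import Dict

REQUIRED_FIELDS = {"type", "hostname", "port"}
# One `name = "value";` pair, with optional surrounding whitespace.
FIELD = re.compile(r'\s*(\w+)\s*=\s*"([^"]*)"\s*;')


def parse_block(text: str) -> Dict[str, str]:
    """Parse `{ k = "v"; ... }` with fields in any order, each exactly once."""
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("expected a { ... } block")
    body, pos, fields = text[1:-1], 0, {}
    while body[pos:].strip():  # stop once only whitespace is left
        match = FIELD.match(body, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        key, value = match.groups()
        if key in fields:
            raise ValueError(f"duplicate field {key!r}")
        fields[key] = value
        pos = match.end()
    if set(fields) != REQUIRED_FIELDS:
        missing = sorted(REQUIRED_FIELDS - set(fields))
        extra = sorted(set(fields) - REQUIRED_FIELDS)
        raise ValueError(f"missing {missing}, unexpected {extra}")
    return fields
```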

Parse HTML in Android

Hi, I am trying to parse HTML from a webpage in Android, and since the webpage is not well formed, I get a SAXException. Is there a way to parse HTML in Android? (My guess is no, so the follow-up question is: what is the best way to do this?) Thanks ...

Parse whole html document and replace specific parts of it automatically PHP

Greetings, I've made a quick search through previous questions but did not find an adequate answer to my question. I have created a function that finds words from an array library and replaces them with links to the description of each word. Example: $words = array("ANTIM","APDIV","APVEG","ARCHE","ARFEU","ARMUR", "ARSUP","ARTHE","ARTIL","...
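
One way to avoid rewriting words inside tags or attribute values is to split the document into tags and text runs, then substitute only in the text runs. The sketch below does this in Python. In PHP, `preg_split` with `PREG_SPLIT_DELIM_CAPTURE` and `preg_replace_callback` would play the same roles. `link_terms` and the `/glossary/` URL scheme in the usage are hypothetical.

```python
import re
from typing import Callable, Iterable


def link_terms(html: str, terms: Iterable[str], url_for: Callable[[str], str]) -> str:
    """Wrap whole-word occurrences of terms in <a> links, outside of tags only."""
    # Longest terms first so a longer term wins over a shorter prefix of it.
    alternatives = sorted(terms, key=len, reverse=True)
    if not alternatives:
        return html
    word = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    # The capturing group keeps the tags themselves in the re.split result.
    pieces = re.split(r"(<[^>]*>)", html)
    out = []
    for piece in pieces:
        if piece.startswith("<"):
            out.append(piece)  # a tag: leave it and its attributes untouched
        else:
            out.append(word.sub(
                lambda m: f'<a href="{url_for(m.group(1))}">{m.group(1)}</a>', piece))
    return "".join(out)
```

This stays a heuristic. Text that is already inside an `<a>` element would be linked a second time, and a bare `<` in running text would confuse the tag split.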

What do I do with a Concrete Syntax Tree?

I'm using pyPEG to create a parse tree for a simple grammar. The tree is represented using lists and tuples. Here's an example: [('command', [('directives', [('directive', [('name', 'retrieve')]), ('directive', [('name', 'commit')])]), ('filename', [('name', 'f30502')])])] My question is what do I do with...
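
A common next step is to turn the concrete tree into a small abstract representation that keeps only what the program needs. For this tree, that is the list of directive names and the filename. The program then acts on that representation. The sketch below does this for the exact tree shown. `Command`, `build_command`, `run` and the handler mapping are hypothetical names, not part of pyPEG.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Command:
    directives: List[str]
    filename: str


def build_command(cst: list) -> Command:
    """Convert the nested (tag, children) tuples into a Command."""
    [(tag, children)] = cst
    if tag != "command":
        raise ValueError(f"expected 'command', got {tag!r}")
    directives: List[str] = []
    filename = ""
    for child_tag, child in children:
        if child_tag == "directives":
            # Each item looks like ('directive', [('name', 'retrieve')]).
            directives = [name for _, [(_, name)] in child]
        elif child_tag == "filename":
            [(_, filename)] = child  # ('name', 'f30502')
    return Command(directives, filename)


def run(command: Command, handlers: Dict[str, Callable[[str], None]]) -> None:
    """Dispatch each directive, in order, to its handler with the filename."""
    for directive in command.directives:
        handlers[directive](command.filename)
```

Separating "understand the tree" from "act on it" keeps the tree-shape details in one function, so a grammar change mostly touches `build_command`.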

Parsing Documents with a DSL

I'm trying to come up with a way to go through about a million documents which are formal documents (for argument's sake, they are thesis documents). They are not all standardized but close enough. They consist of titles, sections, paragraphs, etc. There are subtle differences that might crop up, such as: in English, we call a title "Title" but in ...

Hidden token into default channel - AntlrV3

Suppose I have whitespace tokens (WS) on the hidden channel. For one particular rule only, I want whitespace to be considered as well. Is it possible to bring WS onto the default channel for that rule alone in the parser? ...

Parse RSS pubDate to Date object in JavaScript

How would I do this? Tue, 2 Feb 2010 19:34:21 Etc/GMT ...
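
RSS `pubDate` values follow the RFC 822 date format. "Etc/GMT", however, is a time-zone database name rather than an RFC 822 zone, so normalizing it to a numeric offset before parsing is probably the safest route. This matters in JavaScript too, where `Date.parse` behaviour for string formats other than its ISO-style format is largely implementation-defined. The sketch below shows the normalize-then-parse idea in Python with `email.utils.parsedate_to_datetime`. `parse_pubdate` is a hypothetical helper name.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def parse_pubdate(value: str) -> datetime:
    """Parse an RSS pubDate into a timezone-aware datetime.

    "Etc/GMT" is rewritten to the RFC 822 numeric offset "+0000" first,
    because it is not a zone name the RFC 822 parser understands.
    """
    value = value.strip()
    if value.endswith("Etc/GMT"):
        value = value[: -len("Etc/GMT")] + "+0000"
    return parsedate_to_datetime(value)
```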

Why does Perl's Spreadsheet::ParseExcel never return from $parser->parse('test.xls')?

The spreadsheet is Excel 97-2003 compatible and its permissions are 777. use strict; use Spreadsheet::ParseExcel; print "Content-type: text/html\n\n"; my $parser = Spreadsheet::ParseExcel->new(); print "<br>gets here:".__LINE__; my $workbook = $parser->parse('test.xls'); print "<br>never gets here:".__LINE__; ...

Antlr3 parser path command shell

I need to parse shell commands such as: cp /home/test /home/test2. My problem is parsing the paths correctly. I defined a rule (I cannot use a token for the path; I need to define it in the parser): path : ('/' ID)+; with ID : ('A'..'Z' | 'a'..'z')+; WS : (' ') {$channel = HIDDEN;}; I need to keep the token WS hidden, but this gives ...

using boost::iostreams to read specifically crafted data, then based on that create object and append it to list

I have an interesting problem. Let's say that I have a file with lines like this: name1[xp,y,z321](a,b,c){text};//comment #comment name2(aaaa); I also have a (simplified) class: class something { public: something(const std::string& name); addOptionalParam(const std::string& value); addMandatoryParam(const std::string& value); ...
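
Whatever the stream plumbing, the core is a per-line pattern match that feeds the object's constructor and add methods. The Python sketch below makes several assumptions. Each statement is on its own line. `//` and `#` start comments. `[...]` holds optional parameters and `(...)` mandatory ones (inferred from `addOptionalParam`/`addMandatoryParam`). `{...}` holds text. Both `[...]` and `{...}` may be omitted. `Something` is a hypothetical stand-in for the C++ class. In C++, a similar regular expression could be applied with `std::regex` to each line read by `std::getline`.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# Assumed grammar: name[opt,...](mand,...){text};  where [..] and {..} are optional.
STATEMENT = re.compile(
    r"""^\s*(?P<name>\w+)
        (?:\[(?P<optional>[^\]]*)\])?
        \((?P<mandatory>[^)]*)\)
        (?:\{(?P<text>[^}]*)\})?
        \s*;""",
    re.VERBOSE,
)


@dataclass
class Something:  # hypothetical stand-in for the C++ class
    name: str
    optional_params: List[str] = field(default_factory=list)
    mandatory_params: List[str] = field(default_factory=list)
    text: Optional[str] = None


def _split(params: Optional[str]) -> List[str]:
    """Split 'a, b,c' into ['a', 'b', 'c']; None or '' gives []."""
    return [p.strip() for p in params.split(",") if p.strip()] if params else []


def parse_lines(lines: Iterable[str]) -> List[Something]:
    objects = []
    for line in lines:
        # Drop "//" and "#" comments (assumes neither appears inside a statement).
        line = re.split(r"//|#", line, maxsplit=1)[0]
        if not line.strip():
            continue
        match = STATEMENT.match(line)
        if match is None:
            raise ValueError(f"cannot parse: {line!r}")
        objects.append(Something(match["name"], _split(match["optional"]),
                                 _split(match["mandatory"]), match["text"]))
    return objects
```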

Will rewriting a multipurpose log file parser to use formal grammars improve maintainability?

TLDR: if I built a multipurpose parser by hand with different code for each format, will it work better in the long run using one chunk of parser code and an ANTLR, PyParsing or similar grammar to specify each format? Context: My job involves lots of benchmark log files from ~50 different benchmarks. There are a few in XML, a few HTML,...

How can I remove an element from a Perl array after I've processed it?

I am reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I'm checking for a match on the "to=" line and grabbing the message ID. After building an array of MSGIDs, I'm looping back through the array to extract information on the to=, from=, and client= lines. What I'd like to...

How to filter mail from Apple Mail?

We have a program written in C# that goes through emails in Outlook 2007 and parses out contact information that may be contained in the body of the email or any attachments. What we've found is that any email we get from Apple Mail, while having legitimate attachments, may also have other attachments that are not the types of files we ...

Need help parsing an IIS log in C#

My IIS log has a query parameter (cs-uri-query) that looks like the one below: "TraceId=8c0b8329-f125-4dec-90af-f508674284f5,PartnerId=Partner1\r\n,UserInput=Address1:+1234+block+of+XYZ+Street+Address2:+Santa+Fe+Springs+State:+California+ZipCode:+90000+Country:+United+States+" I need to extract Address1, Address2, State, ZipCode and Country from t...
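
Because the five labels are known, one approach takes three steps. First, isolate the `UserInput=` value. Second, decode `+` as a space (form encoding). Third, split on the labels while keeping them, then pair each label with the text that follows it. The sketch below does this in Python. In C#, `Regex.Split` also includes captured groups in its result, so the same approach should carry over. It assumes every label is immediately followed by `:` and that no value contains a label name followed by `:`. `parse_user_input` is a hypothetical name.

```python
import re
from typing import Dict
from urllib.parse import unquote_plus

FIELDS = ("Address1", "Address2", "State", "ZipCode", "Country")
# A label followed by ":"; the capturing group keeps labels in re.split output.
LABEL = re.compile(r"\b(" + "|".join(FIELDS) + r"):")


def parse_user_input(query: str) -> Dict[str, str]:
    """Extract the labelled address fields from the UserInput=... part."""
    _, sep, user_input = query.partition("UserInput=")
    if not sep:
        raise ValueError("no UserInput= parameter")
    text = unquote_plus(user_input)  # "+" -> " ", %XX -> character
    parts = LABEL.split(text)  # ['', 'Address1', ' 1234 ... ', 'Address2', ...]
    return {label: value.strip() for label, value in zip(parts[1::2], parts[2::2])}
```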