questions about parsing | ansaurus

parsing

Extracting ALL matches of a nested regular expression in python

I am trying to parse a list of items that satisfies the Python regex r'\A(("[\w\s]+"|\w+)\s+)*\Z'; that is, a space-separated list, except that spaces are allowed inside quoted strings. I would like to get a list of the items (that is, of the substrings matched by the r'("[\w\s]+"|\w+)' part). So, for example >>> parse('foo "bar ...
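
One possible approach, sketched under assumptions: validate the whole string with the question's full pattern, then collect the items with `re.findall` on the inner alternation. The names `FULL`, `ITEM` and the error handling are illustrative choices, not taken from the question, and the sample input is made up because the original example is cut off.

```python
import re

# The full-list pattern from the question (groups made non-capturing).
FULL = re.compile(r'\A(?:(?:"[\w\s]+"|\w+)\s+)*\Z')
# The per-item pattern from the question, without the outer group.
ITEM = re.compile(r'"[\w\s]+"|\w+')

def parse(text):
    """Return every item of a valid list; raise ValueError otherwise."""
    if not FULL.match(text):
        raise ValueError("input does not match the list pattern")
    # findall returns all non-overlapping matches, left to right.
    return ITEM.findall(text)

# Note: the question's pattern requires whitespace after every item,
# so a valid list ends with whitespace.
items = parse('foo "bar baz" qux ')
```

Quoted items keep their surrounding double quotes here; stripping them would be a separate step.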

How to Parse Some Wiki Markup

Hey guys, given a data set in plain text such as the following: ==Events== * [[312]] – [[Constantine the Great]] is said to have received his famous [[Battle of Milvian Bridge#Vision of Constantine|Vision of the Cross]]. * [[710]] – [[Saracen]] invasion of [[Sardinia]]. * [[939]] – [[Edmund I of England|Edmund I]] succ...

parsing multiple lines in 1 line in XSLT

<xml> <data> <Attribute name='forms'> <List> <String>xform</String> <String>yform</String> </List> </Attribute> </data> </xml> How would I set up my XSLT to get all the values in the List? I would like to output both values on one line, separated by |. For example: xform|yform ...

XSLT parsing multiple lines

<data> <Attributes> <Attribute name='somethingelse' value='blah'/> <Attribute name='forms'> <List> <String>xform</String> <String>yform</String> </List> </Attribute> </Attributes> </data> I am already parsing the xslt at Attributes level, so I can get the value blah by just doing <xsl:value-of select="Attribute[...

Innerhtml position

On an HTML page I have from 0 to 4 divs with a specific class name. What I want to do is get the HTML from the start to the first div, then from div1 position to div2 position, then div2 to div3, div3 to div4, and lastly div4 to end html. I've managed to do this with html.substring(0, div1.innerhtmlPos) , html.substring(div1End, div2.inner...

htmlagilitypack

Parsing xls with groups

Hi! Can you tell me how to parse an xls file that contains groups (outlines) using ODBC or COM ...

Iterating through/Parsing JSON Object via JavaScript

Hello, I'm having a problem with jQuery/Ajax/JSON. I'm using a jQuery ajax post like so... $.ajax({ type: "POST", dataType: "json", url: "someurl.com", data: "cmd="+escape(me.cmd)+"&q="+q+"&"+me.args, success: function(objJSON){ blah blah... } }); It's my understanding that this will return a JavaScript JSON object? Th...

a class for a parser.. in the same way that class Regex is for regular expressions

I find the class Regex in .NET extremely useful (for both matching and matching/replacing). There are some patterns that cannot be specified in regular expressions, but rather need a little grammar. Is there a library for parsers that DO NOT require code generation (like ANTLR)... but where I can specify the syntax in my code on the fly?...

Can I parse an HTML using XSLT?

Hi, I have to parse a big HTML file, and I'm only interested in a small section (a table). So I thought about using an XSLT to simplify/transform the HTML into something simpler that I could then easily process. The problem I'm having is that the is not finding my table. So I don't know if it's even possible to parse HTML using a XSL styles...

How to write a recursive descent parser from scratch?

As a purely academic exercise, I'm writing a recursive descent parser from scratch -- without using ANTLR or lex/yacc. I'm writing a simple function which converts math expressions into their equivalent AST. I have the following: // grammar type expr = | Lit of float | Add of expr * expr | Mul of expr * expr | Div of ex...

How to handle a tokenize error with unterminated multiline comments (python 2.6)

The following sample code: import token, tokenize, StringIO def generate_tokens(src): rawstr = StringIO.StringIO(unicode(src)) tokens = tokenize.generate_tokens(rawstr.readline) for i, item in enumerate(tokens): toktype, toktext, (srow,scol), (erow,ecol), line = item print i, token.tok_name[toktype], toktext...

Objective C error during parsing JSON

What does this error mean? initializer is not constant Thank you for your help ...

best way to parse a line in python to a dictionary

I have a file with lines like account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that" There is no special delimiter, and each key has a value that is surrounded by double quotes if it's a string but not if it is a number. There is no key without a value, though there may exist blank strings which are represen...
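
A minimal sketch of one way to approach this, assuming keys are word characters, string values never contain an embedded double quote, and blank strings appear as `""` (the question is cut off before saying so). `PAIR` and `parse_line` are hypothetical names, and all values are left as strings.

```python
import re

# key, optional spaces, '=', optional spaces, then either a quoted
# string (group 2) or a bare token without spaces or quotes (group 3).
PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|([^\s"]+))')

def parse_line(line):
    """Turn one line of key=value pairs into a dict of strings."""
    result = {}
    for m in PAIR.finditer(line):
        key, quoted, bare = m.group(1), m.group(2), m.group(3)
        # An empty quoted string is '' (not None), so test against None.
        result[key] = quoted if quoted is not None else bare
    return result

line = 'account = "TEST1" Qty=100 price = 20.11 subject="some value" values="3=this, 4=that"'
record = parse_line(line)
```

Because each match consumes the whole quoted value, the `3=this, 4=that` text inside the quotes is not split into separate keys.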

%union directive in Bison

Hello, I was trying to use an abstract syntax tree in a bison parser, so I tried to use %union directive. Grammar file looks like this: %{ #include "compiler.h" #include "ast.h" #include "common.h" static bool verbose = true; extern "C" { int cyylex(void); void cyyerror(const char *s); } %} %union { ast_node *node; ...

PHP removing text from preg_match result

Hello I am using preg_match to parse for data, it works 99% of the time but sometimes it gives me a result like: $match[1] = <a href="example">text i want</a> when what I really want is the "text i want" string. I am looping preg match and 99% of the time $match[1] gives me the text string i want but I want to implement something into...

Hpricot looping with index ?

Hello, I have the following HTML doc : <ul> <li><span>Some text</span></li> <li><span>Some other text</span></li> <li><span>Some more text</span></li> </ul> How can I use Hpricot to loop on the list items and insert some new HTML at the beginning of each, so that I get the following : <ul> <li><span>1</span><span>Some text</...

Preserving leading white space while reading>>writing a file line by line in bash

I am trying to loop through a directory of text files and combine them into one document. This works great, but the text files contain code snippets, and all of my formatting is getting collapsed to the left. All leading whitespace on a line is stripped. #!/bin/sh OUTPUT="../best_practices.textile" FILES="../best-practices/*.textile" fo...

How do I parse JSON strings using the Django template library?

How do I parse JSON strings using the Django template library? I can parse it using JavaScript in my template, but I would like to parse the JSON in the template library when it is rendered by the server. Is there a way to do it? - Amey Kanade ...

How can I extract data from HTML tables in Perl?

Possible duplicate: Can you provide an example of parsing HTML with your favorite parser? How can I extract content from HTML files using Perl? I'm trying to use regular expressions in Perl to parse a table with the following structure. The first line is as follows: <tr class="Highlight"><td>Time Played</a></td><td></td><td>Artist</...

Python parsing bracketed blocks

What would be the best way in Python to parse out chunks of text contained in matching brackets? "{ { a } { b } { { { c } } } }" should initially return: [ "{ a } { b } { { { c } } }" ] putting that as an input should return: [ "a", "b", "{ { c } }" ] which should return: [ "{ c }" ] [ "c" ] [] ...
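
The behaviour described above can be sketched with a simple depth counter: record where a top-level `{` opens and emit the stripped contents when its matching `}` closes. The function name `top_level_blocks` and the choice to raise on unbalanced input are assumptions made for illustration.

```python
def top_level_blocks(text):
    """Return the stripped contents of each top-level {...} group."""
    blocks = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i + 1          # contents begin after the brace
            depth += 1
        elif ch == "}":
            if depth == 0:
                raise ValueError("unbalanced '}' at index %d" % i)
            depth -= 1
            if depth == 0:             # closed a top-level group
                blocks.append(text[start:i].strip())
    if depth != 0:
        raise ValueError("unbalanced '{'")
    return blocks

first = top_level_blocks("{ { a } { b } { { { c } } } }")
second = top_level_blocks(first[0])
```

Calling the function again on each returned string peels off one more nesting level, and text without braces yields an empty list.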
