questions about parsing | ansaurus

parsing

What is this date format called and how do I parse it?

I have a weird date format in some files I'm parsing. Here are some examples: 1954203 2012320 2010270 The first four digits are the year and the next three digits are day of year. For example, the first date is the 203rd day of 1954, or 7/22/1954. My questions are: What's this date format called? Is there a pre-canned way to parse ...
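
A year followed by a three-digit day of year is usually called an ordinal date (ISO 8601 writes it as YYYY-DDD or YYYYDDD); it is also often loosely called a "Julian date". The question does not name a language, so the sketch below assumes Python, and `parse_ordinal_date` is a hypothetical helper name. It relies on the `%j` (day of year) directive of `strptime`.

```python
from datetime import date, datetime


def parse_ordinal_date(text: str) -> date:
    """Parse a 7-character YYYYDDD string (year + day of year) into a date."""
    # %Y is the four-digit year, %j is the day of the year (001-366).
    return datetime.strptime(text, "%Y%j").date()


if __name__ == "__main__":
    # The three samples from the question.
    for sample in ("1954203", "2012320", "2010270"):
        print(sample, "->", parse_ordinal_date(sample).isoformat())
```

The first sample comes out as 1954-07-22, matching the 7/22/1954 given in the question.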

How on earth does Google Reader parse RSS?

I'm pulling my hair out; I might pull a tooth out next, that's how frustrated I am. I have deleted (for the purpose of proving a point) ALL the RSS files on my WordPress site, http://baked-beans.tv. No matter what I edit, Google Reader reads what it wants, i.e. the posts and all their content! So how on earth am I supposed to edit the conten...

How to write fast XML parsers in Ruby?

I am using Nokogiri, which works well for small documents. But for a 180 KB HTML file I have to increase the process stack size (via ulimit -s), and the parsing takes a long time, let alone XPath queries. Are there faster alternatives available using a stock Ruby 1.8 distribution? I am getting used to XPath, but the alternative does not ...

What goes in between { and } when writing BNF?

I'm having some trouble with BNF. I can't tell what the standard way of doing things is (if there is one), or whether there are types like char or int already built in. However, my main problem is not understanding how the part of the BNF in curly braces works. Given something like: exp : term ...
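
In the EBNF-style notations that use curly braces, `{ X }` conventionally means "zero or more repetitions of X". The question's own rule is cut off, so the sketch below uses a made-up rule, `sum : NUMBER { "+" NUMBER }`, purely as an illustration; `parse_sum` is a hypothetical name. The point is that the braces turn into a loop in a hand-written parser.

```python
from typing import Sequence


def parse_sum(tokens: Sequence[str]) -> int:
    """Evaluate tokens matching the made-up rule: sum : NUMBER { "+" NUMBER }"""
    pos = 0

    def number() -> int:
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise SyntaxError(f"expected a number at token {pos}")
        value = int(tokens[pos])
        pos += 1
        return value

    total = number()  # the mandatory first NUMBER
    # The braces become a loop: repeat the bracketed part zero or more times.
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        total += number()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return total
```

A single `["1"]` is accepted because the braced part may repeat zero times; `["1", "+", "2", "+", "3"]` runs the loop twice.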

parser-generator

How to fetch custom Google search results with Simple HTML DOM Parser?

Hi, I have a custom Google search included on an HTML page, like http://www.*.com/search.htm?cx=partner-pub--00000000000-c77&cof=FORID%3A10&ie=ISO-8ds3-1&q=software&sa=Search&siteurl=www.*.com%2#1342 When I use the same URL in a browser, I get results. But when I call it with Simple HTML DOM Parser, it is returning blank...

Converting BibTeX files to HTML with Python (maybe pybtex?)

Hi, I want to parse a BibTeX publications file, sort by specific fields (e.g. year), and filter certain content, to then put it on a website. I came across pybtex, which works as far as reading and parsing the BibTeX file goes, but it is basically undocumented and I can't figure out how to sort the entries. Is pybtex the way to go (how c...

How can I remove the function call ambiguity from a Lemon grammar?

I have the following lemon grammar (simplified from the real grammar): %right ASSIGN . %nonassoc FN_CALL . program ::= expression . expression ::= expression ASSIGN expression . expression ::= function_call . [FN_CALL] expression ::= IDENTIFIER . function_call ::= expression LPAREN RPAREN . [FN_CALL] I'm not able to fix the shift-r...

Preserving Escaped Characters in Python XML Parsing

Hello, I'm trying to write a Python script that takes in one or two XML files and outputs one or two new files based on the contents of the input files. I was trying to write this script using the minidom module. However, the input files contain a number of instances of the escape character inside node attributes. Unfortunately, in...

escaped-characters

IIS log analysis: IP addresses and status codes

Hi all, is there any way to find out all the status codes a host got when it tried to access a particular website? Something like: 28-10-2010 192.168.1.1 HTTP 404 http://localhost/BAC/default.aspx 28-10-2010 192.168.1.10 HTTP 200 //localhost/BAC/default2.aspx1 I tried using some free log analysers like: IIS Log Analyser,I...
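
If the lines really look like the two samples in the question (date, client IP, protocol, status, URL, separated by whitespace), a few lines of Python can tally status codes per host. This is only a sketch under that assumption: real IIS W3C logs declare their column order in a `#Fields:` header, so the field positions would need adjusting, and `statuses_by_host` is a hypothetical name.

```python
from collections import Counter, defaultdict


def statuses_by_host(lines):
    """Count HTTP status codes per client IP.

    Assumes each line looks like the sample in the question:
    <date> <ip> <protocol> <status> <url>
    """
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _date, ip, _protocol, status = fields[:4]
        counts[ip][status] += 1
    return counts


# The two sample lines quoted in the question.
sample = [
    "28-10-2010 192.168.1.1 HTTP 404 http://localhost/BAC/default.aspx",
    "28-10-2010 192.168.1.10 HTTP 200 //localhost/BAC/default2.aspx1",
]
for ip, codes in statuses_by_host(sample).items():
    print(ip, dict(codes))
```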

XSLT parse text node as XML?

In the middle of an XML document I'm transforming, there is a CDATA node which I know itself is composed of XML. I would like to have that "recursively parsed" as XML so that I can transform it too. Upon searching, I think my question is very similar to http://stackoverflow.com/questions/1927522/handling-node-with-inner-xml-in-xslt. T...

What is the purpose of Parse::CPAN::Authors?

What is the purpose of the Parse::CPAN::Authors module? use Parse::CPAN::Authors; # must have downloaded my $p = Parse::CPAN::Authors->new("01mailrc.txt.gz"); # either a filename as above or pass in the contents of the file my $p = Parse::CPAN::Authors->new($mailrc_contents); my $author = $p->author('LBROCARD'); # $a is ...

Problem with skipping empty cells while importing data from an .xlsx file in an ASP.NET C# application

Hi to all. I have a problem with reading .xlsx files in an ASP.NET MVC 2.0 application, using C#. The problem occurs when reading an empty cell from the .xlsx file. My code simply skips this cell and reads the next one. For example, if the contents of the .xlsx file are: FirstName LastName Age John 36 They will be read as: First...

Please extract a bit of info from this string (without regex so that I can understand it)

Hi, on my web app, I take a look at the current URL, and if the current URL is of a form like this: http://www.domain.com:11000/invite/abcde16989/root/index.html -> All I need is to extract the ID, which consists of 5 letters and 5 numbers (abcde16989), into another variable for further use. So I need this: var current_url = "the whole p...
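
The question's snippet is JavaScript, but the same regex-free idea can be sketched in Python: split the URL into parts, then take the path segment after `invite`. The assumptions here are that the ID always directly follows `/invite/` and that "5 letters and 5 numbers" means five letters followed by five digits, as in the example; `extract_invite_id` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_invite_id(url: str) -> Optional[str]:
    """Return the path segment that follows 'invite', or None if absent."""
    # urlsplit separates scheme, host:port and path, so the port is not in the way.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if "invite" not in segments:
        return None
    index = segments.index("invite") + 1
    if index >= len(segments):
        return None
    candidate = segments[index]
    # Check the shape assumed from the example: 5 letters, then 5 digits.
    if len(candidate) == 10 and candidate[:5].isalpha() and candidate[5:].isdigit():
        return candidate
    return None
```

Called with the URL from the question, this returns `abcde16989`; URLs without an `invite` segment return `None`.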

Error Parsing XML for android:drawable

Hi, I am facing a problem: I want my application to pick up resources from the framework. Here is my XML code snippet. To achieve this, the following changes were made in attrs.xml and themes.xml at the framework level: @android:drawable/btn_minus_ss The drawable btn_minus_ss.png is added to the drawable-hdpi folder at ...

How do I get around not being able to parse a table name into a python sqlite query?

I have a Python 3 program that I'm making which uses a SQLite database with several tables. I want to create a selector module to allow me to choose which table to pull data from. I have found out that I can't use parameter substitution for a table name as shown below, so I'm looking for some alternative methods to accomplish this. c.ex...
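
One common workaround, sketched here under assumptions, is to check the requested table name against a fixed set of known names and only then build it into the SQL text. `?` placeholders in `sqlite3` bind values, not identifiers, which is why the table name cannot be passed that way. `ALLOWED_TABLES` and `fetch_all` are hypothetical names, and the table names are invented for the example.

```python
import sqlite3

# Hypothetical table names; replace with the tables your database really has.
ALLOWED_TABLES = {"books", "authors"}


def fetch_all(conn: sqlite3.Connection, table: str) -> list:
    """Select every row from `table`, which must be a known table name."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # Safe to interpolate: the name was checked against a fixed whitelist.
    # Values in WHERE clauses should still go through ? placeholders.
    return conn.execute(f'SELECT * FROM "{table}"').fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT)")
    conn.execute("INSERT INTO books VALUES (?)", ("Dune",))
    print(fetch_all(conn, "books"))
```

Rejecting anything outside the whitelist keeps user input from ever reaching the SQL string unchecked.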

HtmlAgilityPack: Convert parsed JavaScript string to JSON

Hello! So, I am using HtmlAgilityPack (http://htmlagilitypack.codeplex.com/) to parse a script node, and then I use regular expressions to parse out an object definition. The string I end up with is plain JavaScript that defines an object. Here is the sample JavaScript I am trying to parse: <...

htmlagilitypack

How to efficiently implement an immutable graph of heterogeneous immutable objects in C++?

I am writing a programming language text parser, out of curiosity. Say I want to define an immutable (at runtime) graph of tokens as vertices/nodes. These are naturally of different types: some tokens are keywords, some are identifiers, etc. However, they all share the common trait that each token in the graph points to another. This pro...

compile-time-constant

Parsing semi-structured data - can I use any classifiers?

I've got a set of documents which have a semi-regular format. Rows are typically separated by newline characters, and the main components of each row are separated by spaces. Some examples are a set of furniture assembly instructions, a set of tables of contents, a set of recipes and a set of bank statements. The problem is that each s...

semi-structured

Parse content with PHP

Hello. I'm trying to use the uTorrent WebUI API. I think this is a pretty n00b question, but there's little documentation about this API on the web, sorry. My server uses file_get_contents($url) and I get the data I want, but in a format I do not understand. For example: { "build": BUILD NUMBER (integer), "label": [ [ ...

Programmatically rip text from a PDF file (by hand) - missing some text

Sample PDF file that I cannot parse (2.6MB Zip File) Note: I am not interested in using a parsing library. This is for my own entertainment. I've been experimenting with ripping text out of PDF files for a search gizmo, but am unable to extract text from some pdf files. Note that this is a much easier problem than straight up parsing...

language-agnostic
