parsing

best way to parse plain text file with a nested information structure

The text file has hundreds of these entries (the format is an MT940 bank statement): {1:F01AHHBCH110XXX0000000000}{2:I940X N2}{3:{108:XBS/091502}}{4: :20:XBS/091202/0001 :25:5887/507004-50 :28C:140/1 :60F:C0914CHF7789, :61:0912021202D36,80NTRFNONREF//0887-1202-29-941 04392579-0 LUTHY + xxx, ZUR :86:6034?60LUTHY + xxxx, ZUR vom 01.12.0...
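
A minimal sketch of one way to split the body of a `{4: ...}` block into tagged fields, written in Python since the question names no language. The tag shape (two digits plus an optional letter, wrapped in colons) is an assumption read off the sample above, and `split_fields` is a hypothetical helper name, not part of any MT940 library.

```python
import re
from typing import List, Tuple

# Matches a field tag such as ":20:" or ":28C:" at the start of the text or
# after whitespace. The tag shape is an assumption based on the sample entry.
TAG_RE = re.compile(r"(?:^|\s):(\d{2}[A-Z]?):")


def split_fields(block: str) -> List[Tuple[str, str]]:
    """Return (tag, value) pairs in the order they appear in one block body."""
    parts = TAG_RE.split(block)
    # parts = [text before the first tag, tag1, value1, tag2, value2, ...]
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts) - 1, 2)]


sample = ":20:XBS/091202/0001 :25:5887/507004-50 :28C:140/1 :60F:C0914CHF7789,"
fields = split_fields(sample)
for tag, value in fields:
    print(tag, "->", value)
```

Keeping the pairs in a list rather than a dict preserves repeated tags such as :61: and :86:, which occur once per transaction.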

How to parse an Excel file using PHP

Can anybody point me to a reference or just explain how to parse an Excel file in general? ...

Prevent backslash from being parsed by JavaScript for a string

A Flash AS3 IRC application sends a string like "f\reak" to my JavaScript. IRC allows \ in usernames, which poses a problem when the string is passed to JavaScript: "f\reak" becomes "feak" because the \r is turned into a carriage return. Is there a way to read the literal value of the string instead of having \r parsed as a carriage return? These don...

Good grammar for date data type for recursive descent parser LL(1)

I'm building a custom expression parser and evaluator for a production environment to provide a limited DSL to the users. The parser itself, like the DSL, needs to be simple. The parser is going to be built in an exotic language that neither supports dynamic expression parsing nor has any parser generator tools available. My current decision ...

Using Regular Expressions to parse functions from a string in PHP

Is there a logical way to strip out function names and their arguments using Regular Expressions in PHP? Currently I have it splitting line by line so that I am able to have each function on each line for easier markup. So: doSomeFunction(arg, moreargs, stuff); breakSomething(); ini_set(include_path, /home/htdocs/); becomes array([0...
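
A rough sketch of the regex idea, shown in Python rather than PHP (the pattern itself should carry over to `preg_match_all`). It assumes calls are flat, that is, no nested parentheses and no commas inside arguments; `extract_calls` is a hypothetical helper name.

```python
import re

# name ( anything without parentheses ) ;
CALL_RE = re.compile(r"(\w+)\s*\(([^()]*)\)\s*;")


def extract_calls(source):
    """Return a list of (function_name, [arguments]) for each flat call."""
    calls = []
    for name, raw_args in CALL_RE.findall(source):
        # Split on commas and drop the empty string produced by "()".
        args = [arg.strip() for arg in raw_args.split(",") if arg.strip()]
        calls.append((name, args))
    return calls


code = "doSomeFunction(arg, moreargs, stuff); breakSomething(); ini_set(include_path, /home/htdocs/);"
calls = extract_calls(code)
for name, args in calls:
    print(name, args)
```

Anything beyond flat calls (nested calls, quoted strings containing commas) is usually better served by a small tokenizer than by a single regular expression.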

JSON parsing problems

Hi, I have the following JSON: "{\"doc\":{\"info\":{\"allowDistribution\":\"true\",\"allowSearch\":\"true\",\"calaisRequestID\":\"67a02f61-7e45-cfc4-1276-e123c5f7422f\",\"externalID\":\"\",\"id\":\"http://id.opencalais.com/dBo1YRiQeqS-kfO-m9UeWA\",\"docId\":\"http://d.opencalais.com/dochash-1/8edabb36-eece-3f67-b187-ab64cd885ecb\",\...

Trying to parse XML, but XmlDocument.LoadXml() is trying to download?

I have a string input and I do not know whether or not it is valid XML. I think the simplest approach is to wrap new XmlDocument().LoadXml(strINPUT); in a try/catch. The problem I'm facing is that sometimes strINPUT is an HTML file, and if the header of this file contains <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "ht...

Comprehensive and well-maintained wiki syntax parser for PHP

I'm looking for a comprehensive and well-maintained wiki syntax parser for PHP. Does anybody know of one? I can find some really good parsers for Markdown and BBCode but am having trouble finding a decent wiki parser. I prefer Markdown myself, but I'm writing post functions for a CMS and I'd like to give end users a choice. I thou...

XML parsing with SAX | how to handle special characters?

We have a Java application that pulls data from SAP, parses it, and renders it to the users. The data is pulled using the JCo connector. Recently it threw an exception: org.xml.sax.SAXParseException: Character reference "&#00" is an invalid XML character. So, we are planning to write a new level of indirection where ALL special/ill...
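
A sketch of the kind of pre-filter described above, in Python for brevity (the same regular expression works with `java.util.regex`). It drops numeric character references whose code point falls outside the `Char` production of XML 1.0; the function names are hypothetical, and whether to drop or replace such references is an assumption the application would have to decide.

```python
import re

# Decimal (&#0;) or hexadecimal (&#x1F;) numeric character references.
CHAR_REF_RE = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def is_valid_xml_char(code_point):
    """True if the code point is allowed by the XML 1.0 Char production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def strip_invalid_char_refs(text):
    """Remove character references that an XML 1.0 parser would reject."""

    def replace(match):
        body = match.group(1)
        code_point = int(body[1:], 16) if body.startswith("x") else int(body)
        return match.group(0) if is_valid_xml_char(code_point) else ""

    return CHAR_REF_RE.sub(replace, text)


cleaned = strip_invalid_char_refs("a&#00;b&#65;c&#x1F;d")
print(cleaned)
```

Valid references such as `&#65;` are left untouched so the SAX parser still decodes them itself.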

Parsing atom/rss feed containing multiple <link> tags with Haml on RoR

So, firstly, here's an Atom feed snippet which I am trying to parse: // http://somelink.com/atom <feed xmlns="http://www.w3.org/2005/Atom"> <entry> <title>Title Here</title> <link href="http://somelink.com/link1&amp;ref=rss" rel="alternate" /> <link href="http://somelink.com/link2&amp;re...
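
For comparison, a sketch of picking one `<link>` out of several by its `rel` attribute, using Python's standard library instead of the Rails/Haml stack in the question. The feed text below is a hypothetical completion of the truncated snippet: the second link's `rel="related"` value is an assumption, as is the helper name `alternate_links`.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so lookups must be namespace-aware.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed; the second link's rel value is assumed for illustration.
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Title Here</title>
    <link href="http://somelink.com/link1&amp;ref=rss" rel="alternate" />
    <link href="http://somelink.com/link2&amp;ref=rss" rel="related" />
  </entry>
</feed>"""


def alternate_links(xml_text):
    """Return the href of every entry link whose rel is 'alternate'."""
    root = ET.fromstring(xml_text)
    links = root.findall("atom:entry/atom:link[@rel='alternate']", ATOM)
    return [link.get("href") for link in links]


hrefs = alternate_links(feed_xml)
print(hrefs)
```

Selecting on the attribute rather than on position keeps the result stable if the feed reorders its links.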

parsing issue with comma separated csv file

I am trying to extract the 4th column from a CSV file (comma separated, skipping the first 2 header lines) using this command: awk 'NR <2 {next}{FS =","}{print $4}' filename.csv | more However, it doesn't work because the first column contains a comma, so the 4th column is not really the 4th. Below is an example of a row: "sdfsdfsd, sfsdf", 454,...
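
Splitting on every comma cannot work once a quoted field contains one; a quote-aware CSV reader can. A minimal sketch with Python's `csv` module follows. The sample rows are made up (the original row is truncated), and `skipinitialspace=True` is an assumption that fields are separated by a comma followed by a space, as in the example row.

```python
import csv
import io


def fourth_column(lines, header_lines=2):
    """Return the 4th field of each data row, honouring quoted commas."""
    reader = csv.reader(lines, skipinitialspace=True)
    values = []
    for index, row in enumerate(reader):
        if index < header_lines:
            continue  # skip the header rows
        if len(row) >= 4:
            values.append(row[3])
    return values


# Hypothetical file contents: two header lines, then one data row.
sample = io.StringIO(
    "header line 1\n"
    "header line 2\n"
    '"sdfsdfsd, sfsdf", 454, abc, wanted\n'
)
column = fourth_column(sample)
print(column)
```

With a real file, pass `open("filename.csv", newline="")` instead of the `StringIO` object.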

How to read several lines from text file into memory?

I have a file structured like this: http://gamedev.pastebin.com/8iESYTVY but it's much bigger, 233MB. How can I read blocks of lines into memory, enough lines to represent about 10MB at a time, so I don't have to read in the whole file? ...
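
One way to do this, sketched in Python since the question names no language: `readlines()` accepts a size hint and stops once the lines read so far exceed it, so each block holds whole lines totalling roughly the requested size. `read_line_blocks` is a hypothetical helper; the demo uses a small temporary file and a tiny hint in place of the 233MB file and 10MB blocks.

```python
import os
import tempfile


def read_line_blocks(path, approx_bytes=10 * 1024 * 1024):
    """Yield lists of whole lines, each list roughly approx_bytes in size."""
    with open(path, "r", encoding="utf-8") as handle:
        while True:
            # The hint is approximate: reading stops after the line that
            # pushes the running total past it.
            lines = handle.readlines(approx_bytes)
            if not lines:
                break
            yield lines


# Demo on a small temporary file with a deliberately tiny block size.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.writelines(f"line {n}\n" for n in range(100))

blocks = list(read_line_blocks(tmp.name, approx_bytes=50))
os.remove(tmp.name)
print(len(blocks), "blocks")
```

Because the function is a generator, only one block is held in memory at a time when it is consumed in a `for` loop.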

Ruby Nokogiri CSS HTML parsing

I'm having some problems trying to get the code below to output the data in the format that I want. What I'm after is the following: CCC1-$5.00 CCC1-$10.00 CCC1-$15.00 CCC2-$7.00 where $7 belongs to CCC2 and the others to CCC1, but I can only manage to get the data in this format: CCC1-$5.00 CCC1-$10.00 CCC1-$15.00 ...

How to parse POST data in a CGI script with Bash scripting?

Hi everybody, I have a CGI script written in Bash and I have to read a POST variable sent to it. I am not good at Bash scripting, so I really need help. From a PHP script I send a POST variable named log_message to this CGI script, but I don't know how to parse the POST variable from the header. Any help? ...
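
Under CGI, POST data arrives in the request body on standard input rather than in a header, and the `CONTENT_LENGTH` environment variable says how many bytes to read. The sketch below shows that flow in Python rather than Bash. It assumes a URL-encoded form body (`application/x-www-form-urlencoded`); the helper names and the sample value are hypothetical.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def parse_post_body(body):
    """Decode a URL-encoded body into a {name: first value} dict."""
    parsed = parse_qs(body, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


def read_post_fields(environ=os.environ, stream=sys.stdin):
    """Read CONTENT_LENGTH characters from the stream and decode them."""
    # For an ASCII, URL-encoded body the byte count equals the character count.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return parse_post_body(stream.read(length))


# Simulated request: the environment and stdin a web server would provide.
body = "log_message=disk+full%21"
fields = read_post_fields({"CONTENT_LENGTH": str(len(body))}, io.StringIO(body))
print(fields["log_message"])
```

In Bash the same two steps apply: read `$CONTENT_LENGTH` bytes from stdin, then split on `&` and `=` and URL-decode each value.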

How to get Nokogiri to ignore HTML elements that don't exist

Any idea how I can get the code below to produce this output? 1 - 2 - B I'm getting the error "undefined method `text' for nil:NilClass (NoMethodError)", I think because table 1 does not have the element 'td class=r2' in it. require 'rubygems' require 'nokogiri' require 'open-uri' doc = Nokogiri::HTML.parse(<<-eohtml) <table cla...

How to implement a graph-structured stack?

Ok, so I would like to make a GLR parser generator. I know there exist such programs better than what I will probably make, but I am doing this for fun/learning so that's not important. I have been reading about GLR parsing and I think I have a decent high level understanding of it now. But now it's time to get down to business. The gr...
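
As a starting point, a graph-structured stack can be sketched as nodes keyed by (input position, parser state), each holding a set of predecessor nodes. Stacks that reach the same state at the same position then share a single top node. The Python below is an illustrative sketch under that assumption, not a complete GLR driver; the class and function names are hypothetical, and edge labels for parse-tree construction are omitted.

```python
class GSSNode:
    """One stack node: a parser state at an input position (level)."""

    def __init__(self, state, level):
        self.state = state
        self.level = level
        self.preds = set()  # nodes directly below this one on some stack


class GSS:
    """Graph-structured stack: merges nodes with equal (level, state)."""

    def __init__(self):
        self.nodes = {}  # (level, state) -> GSSNode

    def push(self, state, level, pred=None):
        """Return the node for (level, state), linking it above pred."""
        key = (level, state)
        node = self.nodes.get(key)
        if node is None:
            node = self.nodes[key] = GSSNode(state, level)
        if pred is not None:
            node.preds.add(pred)
        return node


def ancestors(node, depth):
    """Nodes reached by following exactly depth edges down from node.

    A reduction by a rule with depth symbols on its right-hand side
    resumes from each of these nodes.
    """
    if depth == 0:
        return {node}
    found = set()
    for pred in node.preds:
        found |= ancestors(pred, depth - 1)
    return found


# Two stacks that diverge at level 1 and merge again at level 2.
gss = GSS()
bottom = gss.push(0, 0)
left = gss.push(3, 1, bottom)
right = gss.push(4, 1, bottom)
top = gss.push(7, 2, left)
same_top = gss.push(7, 2, right)
print(top is same_top, len(gss.nodes))
```

The merge in `push` is what keeps the structure a graph rather than a forest of separate stacks.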

Parsing number value from SEF url in PHP

I have SEF URLs like /sitename/section/12-news-title, where the number 12 is the ID of the news item. I need to get that number 12 in my PHP code. Is there any out-of-the-box method in PHP for this? ...
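
A short regular expression is usually enough here. The sketch below is in Python, but the same pattern should work with PHP's `preg_match`. It assumes the ID is the run of digits at the start of the last path segment, followed by a hyphen; `news_id` is a hypothetical helper name.

```python
import re

# Last path segment of the form "<digits>-<slug>", with an optional trailing slash.
ID_RE = re.compile(r"/(\d+)-[^/]*/?$")


def news_id(url_path):
    """Return the numeric ID from the last path segment, or None."""
    match = ID_RE.search(url_path)
    return int(match.group(1)) if match else None


print(news_id("/sitename/section/12-news-title"))
```

Returning `None` for paths without an ID lets the caller decide whether to show a listing page or a 404.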

If an HTML file has no ending "/tr" or "/td" tag, HTML Agility Pack does not read that information correctly

I am using HTML Agility Pack to parse HTML content and extract table information. It works, but if a row or cell has no ending "/tr" or "/td" tag, the information in that row or cell is not parsed correctly. Like <html> <head> <meta name="generator" content= "HTML Tidy for...

Interpret a rule applying multiple xpath queries on multiple XML documents

Hi, I need to build a component which would take a few XML documents as input and check the following kind of rules: XML1:/bookstore/book[price>35.00] != null and (XML2:/city/name = 'Montreal' or XML3://customer[@language] contains 'en') Basically my component should be able to: substitute the XML tokens with the correspondin...

C# - Google-like query engine

Guys, hope you are fine. I have to make a very simple web project with a DB that has 2 tables. One table has 2 fields. From the web page I need a Google-like search query. For example, the table has Movie Title and Movie Review fields, and I need to be able to search those 2 fields like this: "Best Movie" + Action I will need to...