parsing

Parsing HTML document: Regular expression or LINQ?

I'm trying to parse an HTML document and extract some elements (any links to text files). The current strategy is to load the HTML document into a string, then find all instances of links to text files. It could be any file type, but for this question, it's a text file. The end goal is to have an IEnumerable list of string objects. That par...

Using the TParser in the Classes unit to parse a filter string

I want to parse a filter string similar to the following: ((Field1 = 'red') and (field2 = 2)) or (Field3 between 1 and 5) or (field4 in ['up', 'down']) I'd like to use the TParser in the Classes unit, but there does not seem to be much documentation or examples on it. ...

PHP explode() not finding delimiter

I'm building a blog that should parse BBCode tags like this: Input: <youtube=http://www.youtube.com/watch?v=VIDEO_ID&feature=channel&gt; Output: <object width="400" height="245"> <param name="movie" value="http://www.youtube-nocookie.com/v/VIDEO_ID&hl=en&fs=1&rel=0&showinfo=0"></param> <param name="allowFullScreen" value=...

How to parse a large HTML file with Java HTMLParser library

I have some HTML files created by a FileMaker export. Each file is basically a huge HTML table. I want to iterate through the table rows and populate them into a database. I have tried to do it with HTMLParser as follows: String inputHTML = readFile("filemakerExport.htm","UTF-8"); Parser parser = new Parser(); parser.setInputHTML(inputHTM...

Problem with a shift-reduce conflict in my grammar

I'm trying to write a small parser with Irony. Unfortunately I get a "shift-reduce conflict". Grammars are not my strong point, and I only need to get this one small thingy done. Here's the reduced grammar that produces the error: ExpressionTerm := "asd" LogicalExpression := ExpressionTerm | LogicalExpression "AND" LogicalExpres...

Parsing Huge XML Files in PHP

I'm trying to parse the DMOZ content/structure XML files into MySQL, but all existing scripts to do this are very old and don't work well. How can I go about opening a large (1 GB+) XML file in PHP for parsing? ...

How do I easily parse a URL with parameters in a Rails test?

I have some code that embeds a return_to URL into a redirect (like OpenID) that I want to test: def test_uses_referrer_for_return_to expected_return_to = 'http://test.com/foo' @request.env['HTTP_REFERER'] = expected_return_to get :fazbot # @response.redirected_to looks like http://service.com?...&return_to=[URI-encoded ver...

Design strategy for a simple code parser

I'm attempting to write an application to extract properties and code from proprietary IDE design files. The file format looks something like this: HEADING { SUBHEADING1 { PropName1 = PropVal1; PropName2 = PropVal2; } SUBHEADING2 { { 1 ; PropVal1 ; PropValue2 } { 2 ; PropVal1 ; PropValue2 ; OnEvent1=BEGIN ...

What's the most easy-to-parse format for PHP?

In my PHP file, I'm reading out a bulk of information using a query like the one below: SELECT GROUP_CONCAT(CONCAT('<comment><body><![CDATA[',body,']]></body>','<replier>',if(screen_name is not null and !anonymous,screen_name,''),'</replier>','<created>',created,'</created></comment>') SEPARATOR '') FROM idiscussion LEFT JOIN ...

Rails library to process an RSS/ATOM feed?

What's a good solution to parse an RSS/ATOM feed and present the content in a Rails view? ...

Speed and XML Parsing in .NET - Serialization vs XML DOM vs ?

I have done XML parsing before but never on a massive scale. If I'm working with many documents similar to this format: <?xml version="1.0" ?> <items comment="something..."> <uid>6523453</uid> <uid>94593453</uid> </items> What is the fastest way to parse these documents? 1) XML DOM 2) XML Serialize - Rehydrate to a .NET Object 3) ...

How to parse badly formed XML in Java?

I have XML that I need to parse but have no control over the creation of. Unfortunately it's not very strict XML and contains things like: <mytag>This won't parse & contains an ampersand.</mytag> The javax.xml.stream classes don't like this at all, and rightly error with: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[...

Learning parser in Python

I recall reading about a parser that you just have to feed some sample lines for it to know how to parse some text. It determines the difference between two lines to work out what the variable parts are. I thought it was written in Python, but I'm not sure. Does anyone know what library that was? ...

Parsing strangely formatted files

I need to parse a file, but the data is in a strange format that I'm not familiar with parsing. The data is always formatted like this: the field name is to the left of the "=" and the data is to the right, and all fields are always in this order. File Data: Report 1 of 1 job_name = JOBNAME job_no = JOB99999 job_id = 6750 rprt_id = 27811 rprt_name ...
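A minimal sketch of reading the `name = value` layout described above. It assumes each field sits on its own line (the excerpt is flattened, so the real line layout is a guess) and that lines without an "=" such as "Report 1 of 1" are headers to skip. The function name `parse_report` and the `sample` text are hypothetical.

```python
def parse_report(text):
    """Map each field name to its value, keeping the file's field order."""
    fields = {}
    for line in text.splitlines():
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = line.partition("=")
        if not sep:
            continue  # e.g. "Report 1 of 1" carries no field (assumption)
        fields[name.strip()] = value.strip()
    return fields


# Hypothetical sample built from the fields visible in the excerpt.
sample = (
    "Report 1 of 1\n"
    "job_name = JOBNAME\n"
    "job_no = JOB99999\n"
    "job_id = 6750\n"
    "rprt_id = 27811\n"
)

if __name__ == "__main__":
    for name, value in parse_report(sample).items():
        print(name, value)
```

All values come back as strings; converting `job_id` or `rprt_id` to integers would be a separate step once the field types are known.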

Parsing Source Code - Unique Identifiers for Different Languages?

Hello, I'm building an application that receives source code as input and analyzes several aspects of the code. It can accept code from many common languages, e.g. C/C++, C#, Java, Python, PHP, Pascal, SQL, and more (however many languages are unsupported, e.g. Ada, Cobol, Fortran). Once the language is known, my application knows what ...

Parse a String in C

Using just C, I would like to parse a string and a) count the occurrences of a character in the string (e.g. count all the e's in a passed-in string) b) once counted (or even as I am counting), replace the e's with 3's. Thanks. ...

How to parse Windows INF files in Python?

Hi, please help me. Example INF file: ;============================================================================= ; ; Copyright (c) Intel Corporation (2002). ; ; INTEL MAKES NO WARRANTY OF ANY KIND REGARDING THE CODE. THIS CODE IS ; LICENSED ON AN "AS IS" BASIS AND INTEL WILL NOT PROVIDE ANY SUPPORT, ; ASSISTANCE, INSTALLATION, TR...

XmlSlurper - list text and regular nodes of xhtml document

I am using Groovy's XmlSlurper to parse an XHTML document (or a pseudo-XHTML one), and I'm trying to get to the text nodes of the document but can't figure out how. Here is the code: import groovy.util.* xmlText = ''' <TEXTFORMAT INDENT="10" LEADING="-5"> <P ALIGN="LEFT"> <FONT FACE="Garamond Premr Pro" SIZE="20" COLOR="#001200" LETTERSPA...

Parsing really big log files (>1 GB, <5 GB)

Hello, I need to parse very large log files (>1 GB, <5 GB) - actually I need to split the data into objects so I can store them in a DB. The log file is sequential (no line breaks), like: TIMESTAMP=20090101000000;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;TIMESTAMP=20090101000100;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;TIMESTAMP=2...
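One possible approach to a log of this shape, sketched under assumptions: the file is read in fixed-size chunks so it never has to fit in memory, pairs are split on ";", and a new record starts whenever a `TIMESTAMP` key appears. The names `iter_pairs` and `iter_records` and the chunk size are hypothetical, and values are assumed never to contain ";" or line breaks.

```python
import io


def iter_pairs(stream, chunk_size):
    """Yield raw 'KEY=value' strings, reading the stream chunk by chunk."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last ";" is complete; keep the tail for later.
        *complete, buffer = buffer.split(";")
        yield from complete
    if buffer:
        yield buffer  # final pair when the file does not end with ";"


def iter_records(stream, chunk_size=64 * 1024):
    """Yield one dict per record; a TIMESTAMP key opens a new record."""
    record = {}
    for pair in iter_pairs(stream, chunk_size):
        key, sep, value = pair.partition("=")
        if not sep:
            continue  # skip empty or malformed fragments
        if key == "TIMESTAMP" and record:
            yield record
            record = {}
        record[key] = value
    if record:
        yield record


# Sample taken from the two complete records visible in the excerpt.
sample = (
    "TIMESTAMP=20090101000000;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;"
    "TIMESTAMP=20090101000100;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;"
)

if __name__ == "__main__":
    for rec in iter_records(io.StringIO(sample)):
        print(rec)
```

For a real file, the `io.StringIO` object would be replaced by a file opened in text mode; because records are yielded one at a time, they can be written to the database in batches without holding the whole log in memory.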

Code Golf: Evaluating Mathematical Expressions

Challenge Here is the challenge (of my own invention, though I wouldn't be surprised if it has previously appeared elsewhere on the web). Write a function that takes a single argument that is a string representation of a simple mathematical expression and evaluates it as a floating point value. A "simple expression" may in...