parsing

Difference between RegexpParsers, StandardTokenParsers and JavaTokenParsers in Scala

I am learning parser combinators in Scala and seeing different ways of parsing. I mainly see three different kinds of parsers, i.e. RegexpParsers, StandardTokenParsers and JavaTokenParsers. I am new to parsing and don't understand how to choose the suitable parser for a given requirement. Can anyone please explain how these differ...

XML Parsing C++

Hi all, I am using MSXML2 for XML parsing. I am using IXMLDOMNodePtr, IXMLDOMDocumentPtr and so on for XML parsing. I have observed that when my application ends, the Private Bytes usage of the EXE is, say, 30 MB and it keeps on reducing. Previously I was thinking that there might be some memory leaks there. I have different tools for the same a...

NSXMLParser, Issue with ASCII Character Set

Hi all <Feeds> <channel> <ctitle>YouTube</ctitle> <cdescription>YouTube - Recently added videos</cdescription> <items> <recentlyAdded> <item> <serverItemId>1</serverItemId> <title>Fan Video CARS</title> <author>mikar1</author> <guid isPermaLink='false'></guid> <link>http://www....

Javascript BBCode Parser recognizes only first list element

I have a really simple JavaScript BBCode parser for client-side live preview (don't want to use Ajax for that). The problem is, this parser only recognizes the first list element: function bbcode_parser(str) { search = new Array( /\[b\](.*?)\[\/b\]/, /\[i\](.*?)\[\/i\]/, /\[img\](.*?)\[\/img\]/, /\[url\="?(.*?...

Linux: shell builtin string matching

I am trying to become more familiar with using the builtin string matching stuff available in shells in Linux. I came across this guy's post, and he showed an example: a="abc|def" echo ${a#*|} # will yield "def" echo ${a%|*} # will yield "abc" I tried it out and it does what it's advertised to do, but I don't understand what the...
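For comparison, the two expansions in the example can be mirrored outside the shell. `${a#*|}` removes the shortest prefix matching `*|` (everything up to and including the first `|`), and `${a%|*}` removes the shortest suffix matching `|*` (the last `|` and everything after it). Below is a minimal Python sketch of that behaviour for a literal separator; the helper names are hypothetical and not part of any shell or library.

```python
def strip_shortest_prefix(value: str, sep: str) -> str:
    """Mimic ${value#*sep}: drop everything up to and including the first sep."""
    _head, found, tail = value.partition(sep)
    # If sep is absent the shell leaves the value unchanged, so do the same.
    return tail if found else value


def strip_shortest_suffix(value: str, sep: str) -> str:
    """Mimic ${value%sep*}: drop the last sep and everything after it."""
    head, found, _tail = value.rpartition(sep)
    return head if found else value


a = "abc|def"
print(strip_shortest_prefix(a, "|"))  # def
print(strip_shortest_suffix(a, "|"))  # abc
```

The doubled forms `##` and `%%` remove the longest match instead; they would correspond to swapping `partition` and `rpartition` in the two helpers.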

Java parsing Error

I was trying to parse the string: Portfolio1[{Exchange:NASDAQ-Symbol:INFY-Full Name:Infosys Technologies Limited (ADR)-Share Count:100.0-Percent Gain:388.2258065-The position cost is:1240.0 USD-This position made today:-46.9997 USD-This position has a total gain of:4814.0 USD-This position is worth:6054.0 USD}--{Exchange:NASDAQ-Symbo...

XML Parsing in Groovy strips attribute new lines

I'm writing code where I retrieve XML from a web api, then parse that XML using Groovy. Unfortunately, it seems that both XmlParser and XmlSlurper for Groovy strip newline characters from the attributes of nodes when .text() is called. How can I get at the text of the attribute including the newlines? Sample code: def xmltest = ''' ...

Resources for character and text processing (encoding, regular expressions, NLP)

I'd like to learn the foundations of encodings, characters and text. Understanding these is important for dealing with a large set of text, whether that is log files or text sources for building algorithms for collective intelligence. My current knowledge is pretty basic: something like "As long as I use UTF-8, I'm okay." I don't say I need ...

How to parse phpDoc style comment block with php?

Please consider the following code with which I'm trying to parse only the first phpDoc style comment (not using any other libraries) in a file (file contents put in $data variable for testing purposes): $data = " /** * @file A lot of info about this file * Could even continue on the next line * @author [email protected] *...

ANTLR - Embedding Java code, evaluate before or after?

Hello all, I'm writing a simple scripting language on top of Java/JVM, where you can also embed Java code using the {} brackets. The problem is, how do I parse this in the grammar? I have two options: Allow everything to be in it, such as: [a-z|a-Z|0-9|_|$], and go on Get an extra java grammar and use that grammar to parse that small ...

How should I handle searching through byte arrays in Java?

Preliminary: I am writing my own httpclient in Java. I am trying to parse out the contents of chunked encoding. Here is my dilemma: since I am trying to parse out chunked HTTP transfer encoding with a gzip payload, there is a mix of ASCII and binary. I can't just take the HTTP response content and convert it to a string and make use of Strin...

Need some ideas on how to accomplish this in Java (parsing strings)

Sorry I couldn't think of a better title, but thanks for reading! My ultimate goal is to read a .java file, parse it, and pull out every identifier, then store them all in a list. Two preconditions are that there are no comments in the file, and all identifiers are composed of letters only. Right now I can read the file, parse it by spaces,...

Need to open an image in web browser

The byte.eml file contains a base64-encoded image value, and I am trying to open it in a browser, but this is not populating the image file. Please help me out. This is the code: Dim oFile As System.IO.File Dim orEAD As System.IO.StreamReader orEAD = oFile.OpenText("E:\mailbox\P3_hemantd.mbx\byte.eml") Dim content As String content = "" ''...

Which RDFa parser for Java supports currently used RDFa attributes?

I am building an app in Java using Jena for semantic information scraping. I am looking for an RDFa parser that would allow me to correctly extract all the RDFa statements. Specifically, one that extracts info about namespaces used and, presuming that the RDFa tags in the page are correct, produces correct triples, ones that distinguish between...

removing phone number from a document.

Hi, I've got a challenge that I am hoping the SO community is able to help me with. I am trying to parse a lot of HTML documents in my PHP application to remove personal details, such as names, addresses and phone numbers. I can remove most of these details without too much trouble; however, the phone number is a real problem for me....

Parsing returned array in javascript

I'm making a call to PayPal's credit card processor, and after a successful/unsuccessful transaction it returns me a string that looks like this: DoDirectPayment failed: Array ( [TIMESTAMP] => 2010%2d05%2d02T23%3a33%3a28Z [CORRELATIONID] => 8c503f5c6c861 [ACK] => Failure [VERSION] => 51%2e0 [BUILD] => 1268624 [L_ERRORCODE0] => 10527 [L_...
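The returned string looks like PHP `print_r` output with percent-encoded values, so one way to pick it apart is to match each `[KEY] => value` pair and URL-decode the value. Below is a minimal sketch, written in Python rather than JavaScript and assuming no value contains spaces or square brackets; `parse_print_r` is a hypothetical helper name, and the sample is cut where the excerpt is cut.

```python
import re
from urllib.parse import unquote

# One "[KEY] => value" pair; the value stops at whitespace or a bracket.
PAIR = re.compile(r"\[(\w+)\]\s*=>\s*([^\s\[\]]*)")


def parse_print_r(text: str) -> dict:
    """Turn flat print_r-style text into a dict with URL-decoded values."""
    return {key: unquote(value) for key, value in PAIR.findall(text)}


sample = (
    "DoDirectPayment failed: Array ( "
    "[TIMESTAMP] => 2010%2d05%2d02T23%3a33%3a28Z "
    "[CORRELATIONID] => 8c503f5c6c861 [ACK] => Failure "
    "[VERSION] => 51%2e0 [BUILD] => 1268624 [L_ERRORCODE0] => 10527 )"
)

result = parse_print_r(sample)
print(result["ACK"])        # Failure
print(result["TIMESTAMP"])  # 2010-05-02T23:33:28Z
```

Values that contain spaces, such as a long error message, would need a different stopping rule, for example reading up to the next `[KEY] =>` marker.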

In Ruby, compare 2 lines in a log file which BOTH contain the SAME "WORD" but ONLY print out the line that was written LAST

here are sample lines Apr 9 11:53:26 skip [2244]: [2244] ab-cd-ef:cc [INFO] A recoverable error has occurred some other log lines .. .... Apr 9 12:53:26 skip [2244]: [2244] ab-cd-ef:cc [INFO] A recoverable error has occurred now the LATEST line would have to be one with the latest Date String, and THAT is the one that needs to be pr...

Display using QtWebKit, whilst parsing xml

I wish to use QtWebKit to load a URL for display, but that's the easy part; I can do that. What I wish to do is record / log XML as I go. My intention here is to record certain details and store them in a database on the fly. My problem is how to do all this on the fly, without requesting the same URL from the server twice...

String Parsing in C#

What is the most efficient way to parse a C# string in the form of "(params (abc 1.3)(sdc 2.0)(www 3.05)....)" into a struct in the form struct Params { double abc,sdc,www....; } Thanks EDIT The structure always has the same parameters (same names, only doubles, known at compile time), but the order is not guaranteed.. only one s...
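The order-independent part of the problem can be handled by collecting the `(name value)` pairs into a map first and only then filling the fixed structure. Below is a minimal sketch of that idea in Python rather than C#, assuming only the three parameter names visible in the question (the elided ones are left out) and plain decimal numbers; `Params` and `parse_params` are illustrative names.

```python
import re
from dataclasses import dataclass


@dataclass
class Params:
    # Only the names shown in the question; the real struct has more.
    abc: float
    sdc: float
    www: float


# One "(name number)" group; the outer "(params" does not match because
# it is not followed by a number.
PAIR = re.compile(r"\(\s*(\w+)\s+([-+]?\d+(?:\.\d+)?)\s*\)")


def parse_params(text: str) -> Params:
    """Collect the pairs in any order, then build the fixed structure."""
    values = {name: float(number) for name, number in PAIR.findall(text)}
    # Raises TypeError if a name is missing or unexpected.
    return Params(**values)


print(parse_params("(params (www 3.05)(abc 1.3)(sdc 2.0))"))
# Params(abc=1.3, sdc=2.0, www=3.05)
```

This favours clarity over raw speed; a hand-written single-pass scanner would avoid the regular expression and the intermediate map.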

get JSON object attribute name

I know that I can retrieve "session" by using item.fields.name, but what if I don't know in advance that the attribute is called "name"? How can I retrieve the list of attribute names in fields first? [ { "pk": 2, "model": "auth.group", "fields": { "name": "session" } } ] ...
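Listing the keys of the nested object gives the attribute names without knowing them in advance. Below is a minimal sketch in Python using the JSON from the question; `field_names` is a hypothetical helper. In JavaScript the analogous step would be iterating over the keys of `item.fields`.

```python
import json

payload = '[{"pk": 2, "model": "auth.group", "fields": {"name": "session"}}]'


def field_names(item: dict) -> list:
    """Return the attribute names found under the item's "fields" object."""
    return list(item["fields"].keys())


items = json.loads(payload)
for item in items:
    for name in field_names(item):
        # Look each value up by the name discovered at runtime.
        print(name, "=", item["fields"][name])  # name = session
```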