questions about parsing | ansaurus

parsing

What is a regular expression?

I know this question seems stupid, but it isn't: what exactly is a regular expression? I have a fair understanding of the parsing problem. I know BNF/EBNF, and I've written grammars to parse simple context-free languages in one of my college courses. I just never met regular expressions before! The only thing that I remember about it is that context-fre...

How to test a CSS parser?

I'm writing a parser to parse CSS. I started by modifying the CSS reference grammar, to use whichever grammar and lexer syntax are supported by the 3rd-party parser generator tool which I'm using. I think that I've finished coding the grammar: the parser-generator is able now to generate state transition tables for/from my grammar. Th...

The correct JavaScript Date.parse(...) format string?

What is a culture-invariant way of constructing a string such that the JavaScript Date() constructor can parse it and create the proper date object? I have tried these format strings, which don't work (using C# to generate the strings): clientDate.ToString(); // gives: "11/05/2009 17:35:23 +00:00" clientDate.ToString("MMM' 'dd', 'yyyy'...

string-formatting

How to replace URLs of links using Java HTMLParser (org.htmlparser)

I am using htmlparser (htmlparser.org) to rewrite all the links in an input String. All I need to do is iterate over all the link tags (<a href=...) that appear in the input String, grab their values, apply some regexes to determine how they should be manipulated, and then update each link's href, target and onclick values accordingly. ...

Parsing XML in Cocoa

Hi everyone: Today I am looking into how to make a simple XML parser in Cocoa (for the desktop). I am thinking of using NSXMLParser to parse the data, but am not quite sure where to start. The XML file on the web doesn't have much data in it, just a simple listing with a few things that I need to save into a variable. Does anyone...

How can I parse marked up text for further processing?

See updated input and output data at Edit-1. What I am trying to accomplish is turning + 1 + 1.1 + 1.1.1 - 1.1.1.1 - 1.1.1.2 + 1.2 - 1.2.1 - 1.2.2 - 1.3 + 2 - 3 into a python data structure such as [{'1': [{'1.1': {'1.1.1': ['1.1.1.1', '1.1.1.2']}, '1.2': ['1.2.1', '1.2.2']}, '1.3'], '2': {}}, ['3',]] I've looked ...

Do I need HtmlDocument even if I only want to parse an HTML string using HtmlAgilityPack?

Hi everyone, I'm working in C#. I'm trying to extract the first instance of an img tag from an HTML string (which is actually post data). This is my code: private string GrabImage(string htmlContent) { String firstImage; HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); htmlDoc.LoadHtml(htmlConten...

Parsing wikimedia markup - are EBNF-based parsers poorly suited?

I am attempting to parse (in Java) Wikimedia markup as found on Wikipedia. There are a number of existing packages out there for this task, but I have not found any to fit my needs particularly well. The best package I have worked with is the Mathclipse Bliki parser, which does a decent job on most pages. This parser is incomplete, how...

Parsing source of a webpage with Objective-C

Is there a way to parse a website's source on the iPhone to get the URLs of photos on that page? If so, how would you do that? Thanks ...

How do I use Perl to parse Twitter XML?

I'm using cURL to get the XML file for my Twitter friend's timeline. (API here.) Currently (though I'd be open for more suggestions) I am using Perl to parse the XML. This is my first time using Perl and I really don't know what I am doing. Currently this is my code: #!/usr/bin/perl # use module use XML::Simple; use Data::Dumper; # ...

Print XML tag names and values in Java

I have an XML document, and I want to print the tag names and values (of leaf nodes) of all tags in the document. For example, for the XML: <library> <bookrack> <book> <name>Book1</name> <price>$10</price> </book> <book> <name>Book2</name> <price>$15</price> </book> </bookrack> </library> T...
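The question above asks for Java; as a language-neutral illustration of the traversal itself, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It treats an element with no child elements as a leaf. The helper name leaf_items and the "tag: value" print format are assumptions made for this sketch, since the expected output in the excerpt is cut off.

```python
import xml.etree.ElementTree as ET

# Sample document taken from the question excerpt.
XML = """<library>
  <bookrack>
    <book><name>Book1</name><price>$10</price></book>
    <book><name>Book2</name><price>$15</price></book>
  </bookrack>
</library>"""


def leaf_items(xml_text):
    """Return (tag, text) pairs for every element that has no child elements."""
    root = ET.fromstring(xml_text)
    # root.iter() walks the tree in document order; len(el) counts child elements.
    return [(el.tag, (el.text or "").strip()) for el in root.iter() if len(el) == 0]


if __name__ == "__main__":
    for tag, value in leaf_items(XML):
        print(f"{tag}: {value}")
```

The same idea carries over to a DOM walk in Java: visit every element and print only those without element children.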

Regexkit lite and iPhone parsing

I've followed the suggestion of some posts here that recommend regexkit lite for a problem I am having with extracting a particular URL from a string. The problem is that I'm very lost with the syntax of using it, and I'm hoping someone who has used it can give me a hand. The string I'm trying to parse looks something like this: <a> bl...

JavaScriptSerializer.Deserialize - how to change field names

Summary: How do I map a field name in JSON data to a field name of a .Net object when using JavaScriptSerializer.Deserialize ? Longer version: I have the following JSON data coming to me from a server API (Not coded in .Net) {"user_id":1234, "detail_level":"low"} I have the following C# object for it: [Serializable] public class Dat...

javascriptserializer

Term extraction: Generating tags out of text

How to get the same results as http://developer.yahoo.com/search/content/V1/termExtraction.html This question has been asked quite a few times before. http://stackoverflow.com/questions/1078766/best-approach-to-analyze-text-in-php http://stackoverflow.com/questions/711062/what-is-a-good-keyword-extraction-web-service http://stackoverf...

Where does the compiler spend most of its time during parsing?

I read in Sebesta's book that the compiler spends most of its time lexing source code, so optimizing the lexer is a necessity, unlike the syntax analyzer. If this is true, why does the lexical analysis stage take so much time compared to syntax analysis in general? By syntax analysis I mean the derivation process. ...

language-agnostic

lexical-analysis

How should I parse a fixed length record file in Ruby?

Hi there, I was wondering if anyone had any advice on parsing a file with fixed length records in Ruby. The file has several sections, each section has a header, n data elements and a footer. For example (This is total nonsense - but has roughly similar content) 1923 000-230SomeHeader 0303030 209231-231992395 MoreData 2938...

A "smart" (forgiving) date parser?

I have to migrate a very large dataset from one system to another. One of the "source" columns contains a date but is really a string with no constraint, while the destination system mandates a date in the format yyyy-mm-dd. Many, but not all, of the source dates are formatted as yyyymmdd. So to coerce them to the expected format, I do (...
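One common way to approach this kind of coercion is to try a short list of known formats in order and flag anything that matches none of them. The Python sketch below does that with datetime.strptime. The format list and the helper name coerce_date are assumptions for illustration; only yyyymmdd (source) and yyyy-mm-dd (target) come from the question.

```python
from datetime import datetime

# Candidate source formats, tried in order. Only "%Y%m%d" is stated in the
# question; the others are hypothetical examples of what else might appear.
CANDIDATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%d/%m/%Y", "%Y/%m/%d")


def coerce_date(raw):
    """Return raw as 'yyyy-mm-dd', or None if no candidate format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # not this format, try the next one
    return None
```

Returning None rather than guessing keeps the unparseable rows visible, so they can be reviewed by hand instead of being silently migrated with a wrong date.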

Parse multimedia files out of an HTML page (any language)

Given an HTML page I would like to get all the 'x' files that are embedded in the HTML file or are linked by it, where 'x' equals: Images (JPG,PNG,GIF...) Documents (Word, PowerPoint, PDF...) Flash (.flv, .swf) How do I do this? So images are easy to extract because they are either linked to with a link ending in a (.png|.jpg|....)...

embedded-resource

StreamReader.ReadLine() starting from the end of the stream

Hi, I'm working in C#/.NET and I'm parsing a file to check if one line matches a particular regex. Actually, I want to find the last line that matches. To get the lines of my file, I'm currently using the System.IO.StreamReader.ReadLine() method, but as my files are very large, I would like to optimize the code a bit and start from the...
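The question is about C#, but the underlying technique (seek to the end, read fixed-size chunks backwards, and split them into lines) is language-independent. Here is a minimal Python sketch of it. The function name last_matching_line, the chunk_size default, and the assumption of UTF-8 text with "\n" or "\r\n" line endings are all choices made for this sketch, not details from the question.

```python
import os
import re


def last_matching_line(path, pattern, chunk_size=4096):
    """Return the last line of the file that matches pattern, or None."""
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        tail = b""  # incomplete line carried over from the chunk read before
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            f.seek(position)
            chunk = f.read(read_size) + tail
            lines = chunk.split(b"\n")
            # Unless we are at the start of the file, the first piece may be
            # only the end of a line, so hold it back for the next iteration.
            tail = lines.pop(0) if position > 0 else b""
            for raw in reversed(lines):
                line = raw.decode("utf-8", errors="replace").rstrip("\r")
                if regex.search(line):
                    return line
    return None
```

Because the scan stops at the first match found from the end, only the tail of a large file is read when the match is near the bottom.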

Parsing CSV files with escaped newlines in Ruby?

How do I parse CSV files with escaped newlines in Ruby? I don't see anything obvious in CSV or FasterCSV. Here is some example input: "foo", "bar" "rah", "baz \ and stuff" "green", "red" In Python, I would do this: csvFile = "foo.csv" csv.register_dialect('blah', escapechar='\\') csvReader = csv.reader(open(csvFile), "blah") ...
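For reference, the Python approach quoted in the excerpt can be written out as a self-contained sketch like the one below. It is not a Ruby answer. The dialect name "escaped", the helper read_rows, and the use of skipinitialspace (to tolerate the space after each comma in the sample) are assumptions added here.

```python
import csv
import io

# Sample input from the question: a backslash escapes the newline inside
# the quoted field on the second record.
SAMPLE = r'''"foo", "bar"
"rah", "baz \
and stuff"
"green", "red"
'''

# escapechar makes the reader treat backslash + next character as literal data.
csv.register_dialect("escaped", escapechar="\\", skipinitialspace=True)


def read_rows(text):
    """Parse CSV text with the 'escaped' dialect and return a list of rows."""
    return list(csv.reader(io.StringIO(text, newline=""), "escaped"))


if __name__ == "__main__":
    for row in read_rows(SAMPLE):
        print(row)
```

The second record comes back as a single row whose last field spans the line break, which is the behaviour the question is looking for in Ruby.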
