I know this question seems stupid, but it isn't. I mean, what is it exactly? I have a fair understanding of the parsing problem. I know BNF/EBNF, and I've written grammars to parse simple context-free languages in one of my college courses. I just never met regular expressions before! The only thing that I remember about it is that context-fre...
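Here is a small illustration in Python, assuming the usual textbook definitions. A regular language can be recognized by a classical regular expression (equivalently, a finite automaton). A language like aⁿbⁿ needs the extra memory that a context-free grammar provides. The function names are made up for this sketch. Python's `re` module also supports some non-regular extensions such as backreferences, so it is used here only with plain regular operators.

```python
import re

# Regular language: any number of 'a's followed by any number of 'b's.
# A classical regular expression (or finite automaton) is enough for this.
A_STAR_B_STAR = re.compile(r"a*b*")

def is_a_star_b_star(s: str) -> bool:
    return A_STAR_B_STAR.fullmatch(s) is not None

# Context-free but not regular: n 'a's followed by exactly n 'b's.
# Matching the two counts needs unbounded memory, which a finite automaton
# lacks. The grammar S -> 'a' S 'b' | ε describes it, and this function
# follows that rule directly (one recursive call per nested S).
def is_an_bn(s: str) -> bool:
    if s == "":
        return True  # S -> ε
    return len(s) >= 2 and s[0] == "a" and s[-1] == "b" and is_an_bn(s[1:-1])

print(is_a_star_b_star("aabbb"), is_an_bn("aabbb"))  # True False
```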
I'm writing a parser to parse CSS.
I started by modifying the CSS reference grammar to use whichever grammar and lexer syntax are supported by the third-party parser-generator tool that I'm using.
I think that I've finished coding the grammar: the parser generator is now able to generate state transition tables for/from my grammar.
Th...
What is a culture-invariant way of constructing a string such that the JavaScript Date() constructor can parse it and create the proper date object?
I have tried these format strings, which don't work (using C# to generate the strings):
clientDate.ToString();
// gives: "11/05/2009 17:35:23 +00:00"
clientDate.ToString("MMM' 'dd', 'yyyy'...
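One option that avoids locale-dependent month names is an ISO 8601 UTC string such as `2009-11-05T17:35:23Z`, which the Date Time String Format in the ECMAScript 5+ specification covers. Another is to send milliseconds since the Unix epoch and call `new Date(ms)` on the client. Below is a minimal Python sketch that produces both on the server side. The question itself uses C#, so treat this as an illustration of the target formats rather than a drop-in replacement. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def _as_utc(dt):
    # Assumption of this sketch: naive datetimes already hold UTC values.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)

def to_iso8601_utc(dt):
    """ISO 8601 in UTC, e.g. '2009-11-05T17:35:23Z'."""
    return _as_utc(dt).strftime("%Y-%m-%dT%H:%M:%SZ")

def to_epoch_millis(dt):
    """Whole milliseconds since 1970-01-01T00:00:00Z."""
    return (_as_utc(dt) - EPOCH) // timedelta(milliseconds=1)
```

On the JavaScript side, `new Date("2009-11-05T17:35:23Z")` or `new Date(1257442523000)` should then describe the same instant whatever the browser's locale. Engines older than ES5 may not parse the ISO form, in which case the millisecond value is the safer choice.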
I am using htmlparser (htmlparser.org) to rewrite all the links in an input String.
All I need to do is iterate over all the link tags (<a href=...) that appear in the input String, grab their values, perform some regex to determine how they should be manipulated, and then update each link's href, target and onclick values accordingly.
...
Hi Everyone:
Today I am looking into how to make a simple XML parser in Cocoa (for the desktop). I am thinking of using NSXMLParser to parse the data, but am not quite sure where to start. The XML file on the web doesn't have much data in it, just a simple listing with a few things that I need to save into a variable. Does anyone...
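NSXMLParser is event-driven. It calls delegate methods when an element starts, when character data is found, and when an element ends. A rough Python analogue of that callback pattern, using the standard library's expat bindings, might look like the sketch below. The element name you pass in is hypothetical, since the structure of the feed isn't shown.

```python
import xml.parsers.expat

def collect_element_text(xml_data, tag):
    """Return the text of every <tag> element, gathered via parse events."""
    values = []
    buffer = []
    inside = False

    def start(name, attrs):  # ~ parser:didStartElement:...
        nonlocal inside
        if name == tag:
            inside = True
            buffer.clear()

    def chars(data):  # ~ parser:foundCharacters:
        # Character data can arrive in several chunks, so accumulate it.
        if inside:
            buffer.append(data)

    def end(name):  # ~ parser:didEndElement:...
        nonlocal inside
        if name == tag:
            values.append("".join(buffer).strip())
            inside = False

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_data, True)  # True = this is the final chunk
    return values
```

In Cocoa, the same state (the "inside" flag and the text buffer) would typically live in instance variables of the delegate object.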
See updated input and output data at Edit-1.
What I am trying to accomplish is turning
+ 1
  + 1.1
    + 1.1.1
      - 1.1.1.1
      - 1.1.1.2
  + 1.2
    - 1.2.1
    - 1.2.2
  - 1.3
+ 2
- 3
into a Python data structure such as
[{'1': [{'1.1': {'1.1.1': ['1.1.1.1', '1.1.1.2']}, '1.2': ['1.2.1', '1.2.2']}, '1.3'], '2': {}}, ['3',]]
I've looked ...
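This minimal sketch assumes each child line is indented more deeply than its parent; the indentation width doesn't matter. It builds a simpler nested-dict shape than the mixed list/dict structure asked for. `+` items map to a dict of their children and `-` items map to `None`. `parse_outline` is a hypothetical name.

```python
def parse_outline(text):
    """Turn an indented '+'/'-' outline into nested dicts.

    '+' items become a dict of their children; '-' items become None.
    Assumes every child is indented further than its parent.
    """
    root = {}
    stack = [(-1, root)]  # (indentation, dict that receives children)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip())
        marker, label = line.split(None, 1)
        label = label.strip()
        # Climb back up until the top of the stack is this line's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if marker == "+":
            children = {}
            parent[label] = children
            stack.append((indent, children))
        else:
            parent[label] = None
    return root
```

With the sample outline indented two spaces per level, the result is `{'1': {'1.1': {'1.1.1': {'1.1.1.1': None, '1.1.1.2': None}}, '1.2': {'1.2.1': None, '1.2.2': None}, '1.3': None}, '2': {}, '3': None}`. Converting that into a list-based shape afterwards is a separate, simpler step.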
Hi everyone,
I'm working in C#. I'm trying to extract the first img tag from an HTML string (which is actually post data).
This is my code:
private string GrabImage(string htmlContent)
{
String firstImage;
HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();
htmlDoc.LoadHtml(htmlConten...
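As a language-neutral illustration of the same idea (parse the markup instead of matching it with a regex, then stop at the first `<img>`), here is a sketch using Python's built-in `html.parser`. It is not HtmlAgilityPack code, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser

class FirstImageFinder(HTMLParser):
    """Remember the src of the first <img> tag in the document."""

    def __init__(self):
        super().__init__()
        self.found = False
        self.src = None

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; attrs is a list of
        # (name, value) pairs. <img ... /> also arrives here by default.
        if tag == "img" and not self.found:
            self.found = True
            self.src = dict(attrs).get("src")

def first_image_src(html):
    finder = FirstImageFinder()
    finder.feed(html)
    finder.close()
    return finder.src
```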
I am attempting to parse (in Java) Wikimedia markup as found on Wikipedia. There are a number of existing packages out there for this task, but I have not found any to fit my needs particularly well. The best package I have worked with is the Mathclipse Bliki parser, which does a decent job on most pages.
This parser is incomplete, how...
Is there a way to parse a website's source on the iPhone to get the URLs of photos on that page? If so, how would you do that?
Thanks
...
I'm using cURL to get the XML file for my Twitter friends' timeline. (API here.)
Currently (though I'd be open to other suggestions) I am using Perl to parse the XML. This is my first time using Perl and I really don't know what I am doing. This is my code so far:
#!/usr/bin/perl
# use module
use XML::Simple;
use Data::Dumper;
# ...
I have an XML document, and I want to print the tag names and values (of leaf nodes) of all tags in the document.
For example, for the XML:
<library>
<bookrack>
<book>
<name>Book1</name>
<price>$10</price>
</book>
<book>
<name>Book2</name>
<price>$15</price>
</book>
</bookrack>
</library>
T...
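One way to do this, sketched with Python's `xml.etree.ElementTree`, is to walk every element in document order and report only the ones with no child elements. `leaf_values` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def leaf_values(xml_text):
    """Yield (tag, text) for every element that has no child elements."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # document order, root included
        if len(elem) == 0:  # no children -> leaf node
            yield elem.tag, (elem.text or "").strip()

sample = """<library><bookrack>
  <book><name>Book1</name><price>$10</price></book>
  <book><name>Book2</name><price>$15</price></book>
</bookrack></library>"""

for tag, value in leaf_values(sample):
    print(f"{tag}: {value}")
```

For the `<library>` example this yields `name: Book1`, `price: $10`, `name: Book2`, `price: $15`. Leaf elements with attributes but no text come out with an empty value.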
I've taken the advice of some posts here that recommend RegexKitLite for a problem I'm having: extracting a particular URL from a string. The problem is that I'm lost with its syntax, and I'm hoping someone who has used it can give me a hand.
The string I'm trying to parse looks something like this:
<a> bl...
Summary: How do I map a field name in JSON data to a field name of a .NET object when using JavaScriptSerializer.Deserialize?
Longer version: I have the following JSON data coming to me from a server API (not written in .NET):
{"user_id":1234, "detail_level":"low"}
I have the following C# object for it:
[Serializable]
public class Dat...
How can I get the same results as this service: http://developer.yahoo.com/search/content/V1/termExtraction.html
This question has been asked quite a few times before.
http://stackoverflow.com/questions/1078766/best-approach-to-analyze-text-in-php
http://stackoverflow.com/questions/711062/what-is-a-good-keyword-extraction-web-service
http://stackoverf...
I read in Sebesta's book that the compiler spends most of its time lexing source code, so optimizing the lexer is a necessity, unlike the syntax analyzer.
If this is true, why does the lexical analysis stage take so much time compared to syntax analysis in general?
By syntax analysis I mean the derivation process.
...
Hi there,
I was wondering if anyone had any advice on parsing a file with fixed-length records in Ruby. The file has several sections; each section has a header, n data elements and a footer. For example (this is total nonsense, but has roughly similar content):
1923 000-230SomeHeader 0303030
209231-231992395 MoreData
2938...
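Fixed-length records can be split by slicing each line at known column offsets. Here is a minimal sketch in Python; in Ruby the same idea maps onto `String#[]` or `String#unpack`. The layout below is entirely hypothetical, guessed from the sample header line. The real offsets and field names have to come from the file's specification. Telling header, data and footer records apart would also need whatever record-type marker the real format uses, which isn't shown here.

```python
# Hypothetical layout guessed from the sample header line:
# (field name, start column, width). Replace with the real specification.
HEADER_LAYOUT = [
    ("record_id", 0, 4),
    ("code", 5, 7),
    ("title", 12, 10),
    ("checksum", 23, 7),
]

def parse_fixed_width(line, layout):
    """Slice one fixed-length record into a dict of whitespace-stripped fields."""
    line = line.rstrip("\r\n")
    # Slicing past the end of a short line just gives '' rather than an error.
    return {name: line[start:start + width].strip() for name, start, width in layout}
```

Keeping the layout as data (one table per record type) means header, data and footer lines can share the same slicing function.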
I have to migrate a very large dataset from one system to another. One of the "source" columns contains a date but is really an unconstrained string, while the destination system mandates a date in the format yyyy-mm-dd.
Many, but not all, of the source dates are formatted as yyyymmdd. So to coerce them to the expected format, I do (...
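Here is a sketch of one defensive approach in Python. It tries a list of known formats in order and emits `yyyy-mm-dd` on the first match. Otherwise it returns `None`, so unparseable values can be logged for manual review instead of being silently guessed. Only `yyyymmdd` comes from the description above; the other formats are hypothetical placeholders. `%d/%m/%Y` versus `%m/%d/%Y` in particular is ambiguous and has to be settled by looking at the data.

```python
from datetime import datetime

# Formats to try, in order. Only "%Y%m%d" comes from the description above;
# the others are hypothetical examples of what else might turn up.
CANDIDATE_FORMATS = ["%Y%m%d", "%Y-%m-%d", "%d/%m/%Y"]

def to_iso_date(raw):
    """Return raw as 'yyyy-mm-dd', or None if no candidate format fits."""
    value = (raw or "").strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            # strptime also rejects impossible dates such as 20090231.
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```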
Given an HTML page, I would like to get all the 'x' files that are embedded in the HTML file or linked from it, where 'x' is any of:
Images (JPG,PNG,GIF...)
Documents (Word, PowerPoint, PDF...)
Flash (.flv, .swf)
How do I do this?
So images are easy to extract because they are either linked to with a link ending in a (.png|.jpg|....)...
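Here is a sketch of a parse-then-filter approach in Python. It collects every `src`, `href` and `data` attribute, which covers `<img>`, `<a>`, `<embed>` and `<object>`. Each value is resolved against the page URL, then classified by the extension of the URL path so that query strings don't interfere. The extension table is an assumption to adjust. Resources referenced only from CSS or JavaScript, or served without a file extension, won't be found this way.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extension -> category; an assumed table, extend as needed.
CATEGORIES = {
    ".jpg": "image", ".jpeg": "image", ".png": "image", ".gif": "image",
    ".doc": "document", ".docx": "document", ".ppt": "document",
    ".pptx": "document", ".pdf": "document",
    ".flv": "flash", ".swf": "flash",
}

class ResourceCollector(HTMLParser):
    """Collect URLs from src/href/data attributes, grouped by file category."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = {}

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href", "data") and value:
                url = urljoin(self.base_url, value)  # resolve relative links
                ext = posixpath.splitext(urlparse(url).path)[1].lower()
                category = CATEGORIES.get(ext)
                if category:
                    self.found.setdefault(category, []).append(url)

def collect_resources(html, base_url):
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.found
```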
Hi,
I'm working in C#/.NET and I'm parsing a file to check if one line matches a particular regex. Actually, I want to find the last line that matches.
To get the lines of my file, I'm currently using the System.IO.StreamReader.ReadLine() method, but as my files are very large, I would like to optimize the code a bit and start from the...
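A common way to avoid scanning a huge file from the start is to seek to the end and read fixed-size blocks backwards. Each block is split on newlines, and the possibly incomplete first fragment is carried into the next read. Below is a minimal Python sketch of that idea; the same approach should carry over to `FileStream.Seek` in .NET. It assumes UTF-8 or ASCII text with LF line endings, and `last_matching_line` and `block_size` are hypothetical names.

```python
import os
import re

def last_matching_line(path, pattern, block_size=64 * 1024):
    """Return the last line in `path` that matches `pattern`, or None.

    Reads the file backwards in blocks of `block_size` bytes. Assumes UTF-8
    (or ASCII) text with LF line endings; a trailing CR is stripped.
    """
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        if pos == 0:
            return None  # empty file
        # Ignore one final newline so it doesn't produce an empty "last line".
        f.seek(pos - 1)
        if f.read(1) == b"\n":
            pos -= 1
        tail = b""  # fragment of a line whose start lies earlier in the file
        while pos > 0:
            read_size = min(block_size, pos)
            pos -= read_size
            f.seek(pos)
            pieces = (f.read(read_size) + tail).split(b"\n")
            tail = pieces.pop(0)  # may be incomplete; finish it next round
            for raw in reversed(pieces):
                line = raw.decode("utf-8").rstrip("\r")
                if regex.search(line):
                    return line
        line = tail.decode("utf-8").rstrip("\r")  # the file's first line
        return line if regex.search(line) else None
```

Because bytes are only decoded once a whole line has been assembled, a multi-byte UTF-8 character that straddles a block boundary is never split during decoding.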
How do I parse CSV files with escaped newlines in Ruby? I don't see anything obvious in CSV or FasterCSV.
Here is some example input:
"foo", "bar"
"rah", "baz \
and stuff"
"green", "red"
In Python, I would do this:
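import csv
csvFile = "foo.csv"
# skipinitialspace: read '"foo", "bar"' as two quoted fields despite the space after the comma
csv.register_dialect('blah', escapechar='\\', skipinitialspace=True)
csvReader = csv.reader(open(csvFile, newline=''), "blah")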
...