parsing

Python extract domain name from URL

Hello, how would you extract the domain name from a URL, excluding any subdomains? My initial simplistic attempt was: '.'.join(urlparse.urlparse(url).netloc.split('.')[-2:]). This works for http://www.foo.com, but not for http://www.foo.com.au. Is there a way to do this properly without using special knowledge about valid TLDs or country...
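The difficulty described above is that the number of labels in a suffix varies ("com" versus "com.au"), so some table of suffixes is generally needed. Below is a minimal Python 3 sketch of the lookup, assuming a suffix set is available. `SAMPLE_SUFFIXES` and `registered_domain` are hypothetical names, and the sample set is deliberately tiny; a real version would presumably load the full Public Suffix List.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny sample of public suffixes (an assumption
# for illustration); a real implementation would load the full list.
SAMPLE_SUFFIXES = {"com", "org", "net", "au", "com.au", "uk", "co.uk"}


def registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return the suffix plus one label to its left, or None if unknown."""
    host = urlsplit(url).hostname
    if not host:
        return None
    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest, so that
    # "com.au" is matched before "au".
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # i == 0 means the host itself is a bare suffix.
            return None if i == 0 else ".".join(labels[i - 1:])
    return None


print(registered_domain("http://www.foo.com"))
print(registered_domain("http://www.foo.com.au"))
```

Trying the longest candidate first is the design choice that matters here: it is what keeps "foo.com.au" from being cut down to "com.au".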

User input parsing - city / state / zipcode / country

I'm looking for advice on parsing input from a user in multiple combinations of City / State / Zip Code / Country. A common example would be what Google maps does. Some examples of input would be: "City, State, Country" "City, Country" "City, Zip Code, Country" "City, State, Zip Code" "Zip Code" What would be an efficient and corre...

How To Parse XML With Invalid Characters in Node Name?

So I'm trying to parse some XML, the creation of which is not under my control. The trouble is, they've somehow got nodes that look like this: <ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(MORNINGSTAR) /> <ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(QUARTERSTAFF) /> <ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(SCYTHE) /> <ID_INTERNAL_FEAT_FOCUSED_EXPERTISE_(TR...

Executing Constantly Changing Logic

Hi, I am writing dynamic HTML parser functionality. I will want to modify existing parsers and also add more parsers (I expect parsers will be modified as sites are remodeled, and new parsers will be needed for new sites). I started writing generic functionality which uses an XML file with conditions and rules for each site but ...

Parsing a string within a string?

I have a function which accepts a string parameter such as: "var1=val1 var2=val2 var3='a list of vals'"; I need to parse this string and pick out the var/val combinations. That is easy enough until introducing something like var3='a list of vals'. Obviously I can't explode the string into an array using a whitespace delimiter which ...
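One way to handle the quoted value is to tokenize with shell-style quoting rules rather than splitting on whitespace. The sketch below uses Python's standard `shlex` module for illustration (the question itself appears to be about PHP, so this is a swapped-in language, and `parse_pairs` is a hypothetical helper name).

```python
import shlex


def parse_pairs(text):
    """Split text into key=value pairs, keeping quoted values intact."""
    pairs = {}
    # shlex.split honours single and double quotes, so the quoted value
    # 'a list of vals' stays inside one token and its quotes are removed.
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError("token without '=': %r" % token)
        pairs[key] = value
    return pairs


print(parse_pairs("var1=val1 var2=val2 var3='a list of vals'"))
```

The same idea carries over to other languages: scan the string once, track whether the scanner is inside quotes, and only treat whitespace as a delimiter outside them.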

Parse Plist (NSString) into NSDictionary

So I have a plist-structured string that I get dynamically (not from the file system). How would I convert this string to an NSDictionary? I've tried converting it to NSData and then to an NSDictionary with NSPropertyListSerialization, but it returns "[NSCFString objectAtIndex:]: unrecognized selector sent to instance 0x100539f40" when I att...

What is speculative parsing?

I've read that Firefox 3.5 has a new feature in its parser: "Improvements to the Gecko layout engine, including speculative parsing for faster content rendering." Could you explain that in simple terms? ...

php parse cxml

Hi, I'm looking to parse some CXML in PHP. Basically, all I'm looking to do is get the values of tags and attributes within it. How can this be done? Thanks, ...

Scala: match and parse an integer string?

I'm looking for a way to match a string that may contain an integer value and, if so, parse it. I'd like to write code similar to the following: def getValue(s: String): Int = s match { case "inf" => Integer.MAX_VALUE case Int(x) => x case _ => throw ... } The goal is that if the string equals "inf", return In...

Working with foreign symbols in python

I'm parsing a JSON feed in Python and it contains this character, causing it not to validate. Is there a way to handle these symbols? Can they be converted, or is there a tidy way to remove them? I don't even know what this symbol is called or what causes them, otherwise I would research it myself. EDIT: Stack Overflow is strippin...

What do people do with parsers like ANTLR and JavaCC?

Out of curiosity, I wonder what people can do with parsers, how they are applied, and what people usually create with them. I know they're widely used in the programming language industry, but I think this is just a tiny portion of it, right? ...

Parsing XML file using NSXMLParser

Hello, I have to parse an XML file using NSXMLParser. There are so many HTML tags in it that when I try to parse it, it stores the string up to a tag, then goes to the found characters method again and starts appending. My code is: if (Bio_CResults) { [BioResults appendString: string]; [Info appendString:string]; [stringar...

Advanced SAX Parser in C#.

Below is the XML arch. I want to display it row/column-wise. What I need is to convert this XML file to a Hashtable like: {"form" : {"attrs" : { "string" : " Partners" } {"child1": { "group" : { "attrs" : { "col" : "6", "colspan":"1" } } { "child1": { "field" : { "attrs" : { "nam...

Header parsing + MIME

Hi, While parsing MIME using Erlang, I'm able to extract header, body and attachment. So now I have to parse all these parts separately. Header structure: Header-tag : header-value\n Example: Delivered-To: [email protected]\nReceived: by 1.gnu.geodesic.net (fdm 1.5, account "mail");\n\tFri, 03 Jul 2009 16:56:03 +0530\n so from abo...

Need to access a single div in an html field loaded into a variable in PHP

Hi guys, here is the situation. I'm retrieving a page using curl into a variable, so I now have all the HTML in one snug variable. However, I need to access a certain DIV node's contents in code. Actually, it's like this: there is one div node on the page with the ID of 'image', and it's kind of like this: <html> <body> .......... ...

Encoding error when using HTML Agility Pack

Hi, I'm trying to parse an HTML doc using some code I found on this very site, but I keep getting a parsing error: HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument(); // There are various options, set as needed htmlDoc.OptionFixNestedTags = true; // filePath is a path to a file containin...

Parsing: Lazy initialization and mutually recursive monads in F#

I've been writing a little monadic parser-combinator library in F# (somewhat similar to FParsec) and have now tried to implement a small parser for a programming language. I first implemented the code in Haskell (with Parsec), which ran perfectly well. The parsers for infix expressions are mutually recursive by design. parseInfixOp :: Parser ...

PHP and RegEx: Split a string by commas that are not inside brackets (and also nested brackets)

Two days ago I started working on a code parser and I'm stuck. How can I split a string by commas that are not inside brackets? Let me show you what I mean. I have this string to parse: one, two, three, (four, (five, six), (ten)), seven I would like to get this result: array( "one"; "two"; "three"; "(four, (five, six), (ten...
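Because the brackets can nest to any depth, a plain regular-expression split tends to struggle; a depth counter handles it in a single pass. Here is a minimal sketch in Python (not PHP, which the question asks about), where `split_top_level` is a hypothetical helper name.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
        if ch == sep and depth == 0:
            # A separator at depth 0 ends the current piece.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    parts.append("".join(current).strip())
    return parts


print(split_top_level("one, two, three, (four, (five, six), (ten)), seven"))
```

The loop translates almost line for line into PHP: iterate over the characters, adjust the depth on "(" and ")", and cut only when the depth is zero.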

How can I build a Truth Table Generator?

I'm looking to write a Truth Table Generator as a personal project. There are several web-based online ones here and here. (Example screenshot of an existing Truth Table Generator) I have the following questions: How should I go about parsing expressions like: ((P => Q) & (Q => R)) => (P => R) Should I use a parser generator like A...
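For a grammar this small, a hand-written recursive-descent parser is one option alongside a parser generator. The Python sketch below is an illustration under stated assumptions: only `=>` and `&` appear in the question, so the `|` and `~` operators, the precedence order (`=>` loosest and right-associative, then `|`, then `&`, then `~`), and all function names are hypothetical choices.

```python
import itertools
import re

# One token: "=>", a single-character operator or bracket, or a name.
TOKEN = re.compile(r"\s*(=>|[()&|~]|[A-Za-z]\w*)")


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected input at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(tokens):
    """Build a nested-tuple syntax tree from the token list."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError("unexpected token %r" % (tok,))
        pos += 1
        return tok

    def implication():  # loosest binding, right-associative
        left = disjunction()
        if peek() == "=>":
            take()
            return ("=>", left, implication())
        return left

    def disjunction():
        node = conjunction()
        while peek() == "|":
            take()
            node = ("|", node, conjunction())
        return node

    def conjunction():
        node = unary()
        while peek() == "&":
            take()
            node = ("&", node, unary())
        return node

    def unary():
        if peek() == "~":
            take()
            return ("~", unary())
        if peek() == "(":
            take()
            node = implication()
            take(")")
            return node
        tok = take()
        if not tok[0].isalpha():
            raise ValueError("unexpected token %r" % (tok,))
        return ("var", tok)

    tree = implication()
    if peek() is not None:
        raise ValueError("unexpected token %r" % (peek(),))
    return tree


def evaluate(node, env):
    op = node[0]
    if op == "var":
        return env[node[1]]
    if op == "~":
        return not evaluate(node[1], env)
    a, b = evaluate(node[1], env), evaluate(node[2], env)
    if op == "&":
        return a and b
    if op == "|":
        return a or b
    return (not a) or b  # "=>" is material implication


def variables(node):
    if node[0] == "var":
        return {node[1]}
    return set().union(*(variables(child) for child in node[1:]))


def truth_table(text):
    """Return (sorted variable names, [(assignment tuple, result), ...])."""
    tree = parse(tokenize(text))
    names = sorted(variables(tree))
    rows = []
    for values in itertools.product([False, True], repeat=len(names)):
        rows.append((values, evaluate(tree, dict(zip(names, values)))))
    return names, rows


names, rows = truth_table("((P => Q) & (Q => R)) => (P => R)")
for values, result in rows:
    print(dict(zip(names, values)), result)
```

Each precedence level gets its own function, which keeps the grammar readable and makes it straightforward to add operators later; a parser generator would mostly pay off once the expression language grows well beyond this.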

Looking for ideas on a computer science course project.

Hey. I'm taking a course titled Principles of Programming Languages, and I need to decide on a project to do this summer. Here is a short version of what the project needs to accomplish: "The nature of the project is language processing. Writing a Scheme/Lisp processor is a project of this type. A compiler for a language like C or Pasca...