parsing

Json to Map

Hi, what is the best way to convert JSON such as this: { "data" : { "field1" : "value1", "field2" : "value2"}} into a Java Map in which the keys are (field1, field2) and the values for those fields are (value1, value2)? Any ideas? Should I use Json-lib for that? Or would it be better to write my own parser? Thanks in advance. ...

Parsing Ambiguous Dates (Language Independent)

I am curious what would be the best way to handle an ambiguous date string in any given language. When pre-validating your user input isn't an option, how should MM/dd/YYYY dates be parsed? How would you parse the following ambiguous date and for what reason (statistical, cultural, etc)? '1111900' as Jan 11, 1900 [M/dd/YYYY] or Nov 1,...
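
One way to make the ambiguity explicit before choosing a policy is to enumerate every valid reading of the string. The sketch below is in Python and rests on assumptions that are not in the question: the year is always the last four digits, and the month and day each take one or two digits. The function name `candidate_dates` is hypothetical.

```python
from datetime import date

def candidate_dates(digits):
    """Return every valid month/day/year reading of an unseparated digit string."""
    year = int(digits[-4:])   # assumption: the year is always the last four digits
    rest = digits[:-4]        # month and day digits with no separator between them
    results = []
    for cut in range(1, len(rest)):
        month_text, day_text = rest[:cut], rest[cut:]
        if len(month_text) > 2 or len(day_text) > 2:
            continue  # a month or day never has more than two digits
        try:
            results.append(date(year, int(month_text), int(day_text)))
        except ValueError:
            pass  # not a real calendar date, e.g. month 13 or day 0
    return results

# Both Jan 11, 1900 and Nov 1, 1900 come back for the string in the question.
print(candidate_dates("1111900"))
```

When more than one candidate is returned, the caller still has to pick one (by locale, by statistics, or by rejecting the input); the sketch only surfaces the choice.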

How can I remove middle initial with a dot at the end?

I've got a bunch of first names in a field that carry a middle initial with a '.' at the end. I need a regex to convert this example: Kenneth R. into Kenneth. I was trying to build my own and found this useful site btw: http://www.gskinner.com/RegExr/ but I'm new to Perl & regular expressions and could only get "...$" - which is useless when t...

Using XML parser to create XHTML code on fly

Hi all. Developing server-side code, I finally got my eyes X-crossed trying to write - and then understand, of course - forms or other HTML code in which text strings (attributes) within double quotes must occur in tagged strings (markup) opening and closing properly; but often JavaScript text within apostrophes must be instantiated, quite ...

Using C# Regular expression to replace XML element content

I'm writing some code that handles logging xml data and I would like to be able to replace the content of certain elements (eg passwords) in the document. I'd rather not serialize and parse the document as my code will be handling a variety of schemas. Sample input documents: doc #1: <user> <userid>jsmith</userid> <p...

How can I best parse this comma delimited text file?

Hi, I am trying to figure out the best way to parse this comma delimited text file. Here is an excerpt: bldgA, fred, lunch bldgA, sally, supper bldgB, bob, parking lot bldgB, frank, rooftop ... What I am trying to do is read "bldgA" and then I want the person (2nd column), "fred" for example. But I don't want to parse the file look...
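
The excerpt is cut off, so the exact goal is a guess. Assuming the aim is to read the file once and then look people up by building, a minimal Python sketch could group the second column under the first. The function name `people_by_building` and the in-memory sample are hypothetical; only the four rows come from the question.

```python
import csv
from collections import defaultdict
from io import StringIO

def people_by_building(lines):
    """Group the person column (2nd) under the building column (1st)."""
    groups = defaultdict(list)
    # skipinitialspace drops the blank that follows each comma in the sample
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        building, person = row[0].strip(), row[1].strip()
        groups[building].append(person)
    return dict(groups)

# Stand-in for the real file: the four rows shown in the question.
sample = StringIO(
    "bldgA, fred, lunch\n"
    "bldgA, sally, supper\n"
    "bldgB, bob, parking lot\n"
    "bldgB, frank, rooftop\n"
)
print(people_by_building(sample))
```

Passing an open file object instead of the `StringIO` stand-in should work the same way, since `csv.reader` accepts any iterable of lines.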

How can I split a url string up into separate parts in Python?

Hi all. I decided that I'll learn Python tonight :) I know C pretty well (wrote an OS in it) so I'm not a noob in programming, and everything in Python seems pretty easy, but I don't know how to solve this problem: let's say I have this address: http://example.com/random/folder/path.html Now how can I create two strings from this, one c...
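
The question is truncated, so which two strings are wanted is an assumption here: the sketch below returns the base URL up to the last slash and the trailing file name. It uses `urllib.parse.urlsplit` from the Python 3 standard library; the helper name `split_url` is hypothetical.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Split a URL into (base up to the last slash, trailing file name)."""
    parts = urlsplit(url)
    # rpartition cuts the path at its last "/" into directory and file name
    directory, _, filename = parts.path.rpartition("/")
    base = f"{parts.scheme}://{parts.netloc}{directory}/"
    return base, filename

base, filename = split_url("http://example.com/random/folder/path.html")
print(base)      # http://example.com/random/folder/
print(filename)  # path.html
```

Query strings and fragments are dropped by this sketch; `urlsplit` exposes them as `parts.query` and `parts.fragment` if they are needed.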

Which Java YAML library should I use?

There are at least 4 YAML implementations listed at yaml.org. Which one of these (or another) would you recommend, and why? There are two ways you could answer this question, either by voting for one of the 4, or by giving a good answer that compares them or strongly justifies one of them. I'll add the 4 mentioned so people can vote, b...

Best cross platform XML parsers for Python

On a project I'm working on we're using LibXML2 (import lxml) because it has Objectify. But we're finding that getting it to work in OSX is an incredibly involved process. Does anyone have suggestions for clean cross platform xml libraries that don't have excessive dependencies on C libraries? ...

Parsing command output in .NET

I want to connect to a database server in my .NET app and execute a database command that produces a series of database statistics. The problem is that it doesn't return the stats in a structured format; it returns them in plain text (like a df -k command in UNIX). I can capture the output and parse it, but I was wondering if there's a...

Best way to parse string of email addresses

So I am working with some email header data, and for the to:, from:, cc:, and bcc: fields the email address(es) can be expressed in a number of different ways: First Last <[email protected]> Last, First <[email protected]> [email protected] And these variations can appear in the same message, in any order, all in one comma-separated string:...

Best 3rd Party Resume Parser Tool

We are working on a hiring application and need the ability to easily parse resumes. Before trying to build one, we were wondering what resume parsing tools are available out there and which is the best one, in your opinion. We need to be able to parse both Word and TXT files. ...

What is the best way to handle a bad link given to BeautifulSoup?

I'm working on something that pulls in urls from delicious and then uses those urls to discover associated feeds. However, some of the bookmarks in delicious are not html links and cause BS to barf. Basically, I want to throw away a link if BS fetches it and it does not look like html. Right now, this is what I'm getting. trillian:D...

Is there a good, re-usable parser that converts a string into a hierarchy of lists?

I'd like to take a string such as this: [One, Two[A, B[i, ii, iii, iv], C], Three] And convert it into a hierarchy of lists, so that if I execute code such as the following: Console.Write(myList[1][1][2]); The output will be: iii I'm hoping that this is a common enough requirement that there's some simple parsing code written in...
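
A small recursive-descent parser is one way to get this shape. The question's own code is C#; the sketch below is in Python and is an illustration only. The `Node` class and `parse` function are hypothetical names, and the grammar is an assumption read off the single example: a list is a bracketed, comma-separated sequence of names, and a name may be followed directly by a nested list. Empty lists are not handled.

```python
import re

class Node:
    """A named item whose children are reachable by index, e.g. node[1][2]."""
    def __init__(self, name, children):
        self.name = name
        self.children = children

    def __getitem__(self, index):
        return self.children[index]

    def __str__(self):
        return self.name

def parse(text):
    # Tokens are "[", "]", "," or a run of other characters (a name).
    tokens = [t.strip() for t in re.findall(r"[\[\],]|[^\[\],\s][^\[\],]*", text)]
    tokens.append(None)  # sentinel so running off the end is detected
    pos = 0

    def parse_list():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError("expected '['")
        pos += 1
        items = []
        while True:
            name = tokens[pos]
            if name in ("[", "]", ",", None):
                raise ValueError("expected a name")
            pos += 1
            # A name directly followed by "[" owns that nested list.
            children = parse_list() if tokens[pos] == "[" else []
            items.append(Node(name, children))
            if tokens[pos] == ",":
                pos += 1
            elif tokens[pos] == "]":
                pos += 1
                return items
            else:
                raise ValueError("expected ',' or ']'")

    result = parse_list()
    if tokens[pos] is not None:
        raise ValueError("unexpected text after the closing ']'")
    return result

my_list = parse("[One, Two[A, B[i, ii, iii, iv], C], Three]")
print(my_list[1][1][2])  # iii
```

Indexing a `Node` returns its children, which is what lets the `[1][1][2]` chain from the question work while each item keeps its own name.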

How to turn a token stream into a parse tree

Hello all. I have a lexer built that streams out tokens from an input, but I'm not sure how to build the next step in the process - the parse tree. Does anybody have any good resources or examples on how to accomplish this? ...

Reading HTML file to DOM tree using Java

Is there a nice parser/library which is able to read an HTML document into a DOM tree using Java? I'd like to use the standard DOM/XPath API that Java provides. But all libraries I can find only seem to have custom APIs to solve this task. Furthermore, the conversion from HTML to XML-DOM seems unsupported by most of the available parsers. A...

BeautifulSoup 3.1 parser breaks far too easily

I was having trouble parsing some dodgy HTML with BeautifulSoup. Turns out that the HTMLParser used in newer versions is less tolerant than the SGMLParser used previously. Does BeautifulSoup have some kind of debug mode? I'm trying to figure out how to stop it borking on some nasty HTML I'm loading from a crabby website: <HTML> <...

Writing a subshell parsing rule on ANTLR

I'm trying to create a simple Bash-like grammar in ANTLRv3 but haven't been able to parse (and check) input inside subshell commands. Further explanation: I want to parse the following input: $(command parameters*) `command parameters` "some text $(command parameters*)" And be able to check its contents as I would with simple in...

Improving/Fixing a Regex for C style block comments

I'm writing (in C#) a simple parser to process a scripting language that looks a lot like classic C. On one script file I have, the regular expression that I'm using to recognize /* block comments */ is going into some kind of infinite loop, taking 100% CPU for ages. The Regex I'm using is this: /\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*...
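
Nested quantifiers over overlapping alternatives are a common cause of this kind of runaway backtracking. A widely used alternative is the "unrolled" block-comment pattern, shown here in Python rather than C# and not taken from the question; the pattern syntax used is plain enough that it should carry over, but that is an assumption worth testing in the target engine. The name `BLOCK_COMMENT` is hypothetical.

```python
import re

# Unrolled pattern: each character can only be consumed one way, so the
# engine has no overlapping alternatives to retry on a failed match.
BLOCK_COMMENT = re.compile(r"/\*[^*]*\*+(?:[^/*][^*]*\*+)*/")

source = "int a; /* first */ int b; /* multi\n * line **/ int c; /* unterminated"

# The unterminated comment at the end is simply not matched.
print(BLOCK_COMMENT.findall(source))
```

The negated character classes match newlines as well, so multi-line comments need no extra flags.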

How To Parse a URL in J2ME

I'm trying to extract the query's name-value pairs from a URL using J2ME, but it doesn't make it easy. J2ME doesn't have the java.net.URL class nor does String have a split method. Is there a way to extract name-value pairs from a URL using J2ME? Any open source implementations would be welcome too. ...