parsing

Excel parsing and converting text

I need to be able to convert cells from one format to another according to the following rules:
Property Description        --enter as--   Folio Identifier
----------------------------------------------------------
Lot 23 DP789678                            23/789678
Lot 7 Section 12 DP6789                    7/12/6789
Lot 1 SP 45676 ...
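A minimal sketch of the two rules that are fully visible in the excerpt above, written in Python rather than as an Excel formula. The function name `to_folio_identifier` and the regular expression are assumptions made for illustration; the rule for "SP" entries is cut off in the excerpt, so such inputs are deliberately left unconverted.

```python
import re

# Hypothetical pattern covering only the two visible rules:
#   "Lot <lot> DP<plan>"                   -> "<lot>/<plan>"
#   "Lot <lot> Section <section> DP<plan>" -> "<lot>/<section>/<plan>"
_DP_PATTERN = re.compile(
    r"^\s*Lot\s+(?P<lot>\w+)"
    r"(?:\s+Section\s+(?P<section>\w+))?"
    r"\s+DP\s*(?P<plan>\d+)\s*$",
    re.IGNORECASE,
)


def to_folio_identifier(description):
    """Return the folio identifier, or None if the text matches neither rule."""
    match = _DP_PATTERN.match(description)
    if match is None:
        return None
    parts = (match.group("lot"), match.group("section"), match.group("plan"))
    # Drop the section when it is absent, then join what is left with "/".
    return "/".join(part for part in parts if part is not None)


print(to_folio_identifier("Lot 23 DP789678"))
print(to_folio_identifier("Lot 7 Section 12 DP6789"))
```

Returning `None` for anything unrecognized keeps unexpected cells visible instead of silently producing a wrong identifier.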

TryParsing Enums

I am parsing some enum values from a text file. In order to simplify things I am using functions like the following: (The sample code here uses C++/CLI but answers in C# are also welcome.) bool TryParseFontStyle(String ^string, FontStyle% style){ try { FontStyle ^refStyle = dynamic_cast<FontStyle^>( Enum::Parse(FontStyle...

Mailing Address tokenization/elementization to individual components (street, city, etc.)

I need to parse international addresses into their individual components (street, city, etc.). After a reasonable survey, I found that HMMs/CRFs are the way to go. Has anybody had any success using an open source implementation of HMMs or CRFs for the address tokenization problem? If yes, which ones? Also, do any implementations provide...

How can I convert a URL query string into a list of tuples using Python?

I am struggling to convert a URL to a nested tuple. # Convert this string str = 'http://somesite.com/?foo=bar&key=val' # to a tuple like this: [(u'foo', u'bar'), (u'key', u'val')] I assume I need to be doing something like: url = 'http://somesite.com/?foo=bar&key=val' url = url.split('?') get = () for param in url[1].spl...
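One possible approach, sketched with the standard library instead of manual splitting. It assumes Python 3, where these functions live in `urllib.parse` and strings are already Unicode, so the `u''` prefixes from the question are not needed. The helper name `query_pairs` is made up for this example.

```python
from urllib.parse import parse_qsl, urlsplit


def query_pairs(url):
    """Return the query string of `url` as a list of (key, value) tuples."""
    query = urlsplit(url).query  # the part after '?', e.g. 'foo=bar&key=val'
    # keep_blank_values=True keeps parameters such as 'flag=' instead of dropping them.
    return parse_qsl(query, keep_blank_values=True)


print(query_pairs('http://somesite.com/?foo=bar&key=val'))
# [('foo', 'bar'), ('key', 'val')]
```

`parse_qsl` also decodes percent-escapes and `+` signs, which a hand-written `split('&')` loop would have to handle separately.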

How do I parse YAML with nil values?

I apologize for the very specific issue I'm posting here, but I hope it will help others who may also run across it. I have a string that is formatted as follows: [[,action1,,],[action2],[]] I would like to translate this to valid YAML so that it can be parsed, which would look like this: [['','action1','',''],['ac...

Split a huge XML file (several GB), retaining the header and footer and the same structure

Hi all, my program will be receiving an XML file of up to 8 GB to 10 GB in size with the following structure: <?xml version="1.0" encoding="UTF-8"?> <!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" ""> <gsafeed> <header> <datasource>Name</datasource> <feedtype>incremental</feedtype> </header> <group> <record url="" action="add" mimetype="t...

How to parse dict output in a user friendly way in PHP?

Hi, I am trying to implement a dictionary-type service. I send a request with php using cURL to dict.org with the dict protocol. This is my code (which on its own works and may be helpful for future readers): $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, "dict://dict.org/define:(hello):english:exact"); curl_setopt($ch, CURLOPT_RETURN...

C# Library to parse human readable time spans

Is there a library that will parse human-readable time spans into a .NET TimeSpan? I need something that will parse strings like 30 days 1 week 5 hours Does such a thing exist? It's probably not too hard to write myself, but if something is out there, it would be so much easier! I currently don't need functionality like "3...

Writing a Z80 assembler - Lexing ASM and building a parse tree using composition?

Hi guys, I'm very new to the concept of writing an assembler and even after reading a great deal of material, I'm still having difficulties wrapping my head around a couple of concepts. 1) What is the process to actually break up a source file into tokens? I believe this process is called lexing and I've searched high and low for a real...

JavaScript substring help

I have a string "2500 - SomeValue". How can I remove everything before the 'S' in 'SomeValue'? var x = "2500 - SomeValue"; var y = x.substring(x.lastIndexOf(" - "), // this is where I'm stuck, I need the rest of the string starting from here. Thanks for any help. ~ck ...

C#: TryParse vs Convert

Today I read an article saying that we should always use TryParse(string, out MMM) for conversion rather than Convert.ToMMM(). I agree with the article, but I got stuck on one scenario: when there will always be some valid value for the string, and hence we can also use Convert.ToMMM() because we don't get any excepti...

Splitting strings in JavaScript

Hi guys, I have a variable named s in JavaScript. It contains the value '40&lngDesignID=1'. I want to split it on & to get 40 and lngDesignID=1. How can I do that in JavaScript? Can anybody help? ...

Data in XML files: One large file or multiple small ones?

I am currently working on an XML-based CMS that saves data in chunks called "items". These can be used on the website to display content. At the moment I have one separate XML file for every item. Since most pages on that website use about three to four of these items, a rather small website with e.g. 20 pages has about 100 differe...

How to make a logical boolean parser for text input?

Hello friends, I need to make a parser that can extract the logical structure from a text input in order to construct a query for a web service. I tried to use regular expressions, but it gets really complicated to handle nested logic, so I decided to ask for help; maybe I am doing it the wrong way. ex: ( (foo1 and bar) or (f...

What is the set of valid first characters in an XML document?

I'm working on some code to determine the character encoding of an XML document being returned by a web server (an RSS feed in this particular case). Unfortunately, sometimes the web server lies and tells me that the document is UTF-8 when in fact it's not, or the boilerplate XML generation code on the server has <?xml encoding='UTF-8'?...

Workaround for PrimitiveType.TryParse

Hello all, I have become accustomed to using TryParse for attempting to parse unknown types: Dim b As Boolean Dim qVal As Boolean = If(Boolean.TryParse(Request.QueryString("q").Trim(), b), b, False) or bool b; bool qVal = Boolean.TryParse(Request.QueryString("q").Trim(), out b) ? b : false; So, just curious if someone knows a bet...

What is the best programming language to write parsers and compilers?

Please, I need some resources to get started (I am a CS student) ...

Good Language for Spider and Indexer

I love Ruby and its framework, but I don't think that Ruby on Rails is the best choice for developing a feed parser and indexer. Maybe Python or Java are better choices. What language do you suggest? ...

HTML DOM Validator - PHP or JavaScript

Hello, I am posting a form with a textarea in it. I allow HTML to be posted. Now I wish to check whether the user has closed the tags in the HTML they posted. When I display that HTML, broken tags such as divs and tables spoil the whole page display. Is there any way to check for proper tag usage in PHP or JavaScript? Any...

Parsing Custom Tags with Ruby

I am trying to do pretty much the same thing as here, but in Ruby: http://stackoverflow.com/questions/1201778/parsing-custom-tags-with-php ...