parsing

What's the best way to write a parser by hand?

We've used ANTLR to create a parser for a SQL-like grammar, and while the results are satisfactory in most cases, there are a few edge cases that we need to fix; and since we didn't write the parser ourselves we don't really understand it well enough to be able to make sensible changes. So, we'd like to write our own parser. What's the...

How can you parse simple C++ typedef instructions?

Hi, I'd like to parse simple C++ typedef instructions such as typedef Class NewNameForClass; typedef Class::InsideTypedef NewNameForTypedef; typedef TemplateClass<Arg1,Arg2> AliasForObject; I have written the corresponding grammar that I'd like to see used in parsing. Name <- ('_'|letter)('_'|letter|digit)* Type <- Name Type <- Type...

Representing optional syntax and repetition with OcamlYacc / FsYacc

I'm trying to build up some skills in lexing/parsing grammars. I'm looking back on a simple parser I wrote for SQL, and I'm not altogether happy with it -- it seems like there should have been an easier way to write the parser. SQL tripped me up because it has a lot of optional tokens and repetition. For example: SELECT * FROM t1 INNER...

What are good "real" programming examples for a beginning programmer?

I've been browsing Bjarne Stroustrup's new introductory programming book, Programming: Principles and Practice Using C++. It's meant for first-year university computer science and engineering students. Early on in the book he works through an interesting extended example of creating a desktop calculator where he ends up implementing an ...

Coding a Gmail style "hide quoted text" for web based mailing list archive

Hi all, I'm working on a web application that parses and displays email messages in a threaded format (among other things). Emails may come from any number of different mail clients, and in either text or HTML format. Given that most people have a tendency to top post, I'd like to be able to hide the duplicated message in an email rep...

TouchXML unable to parse YQL result XML on an iPhone

Problem 1: Has anyone worked with TouchXML? I am having a problem parsing an RSS feed that has characters like & or even &. The parser takes the URL as input and doesn’t seem to parse the XML content. NSXMLParser has no such problem for the same feed URL. Problem 2: Another problem with NSXMLParser is when the foundCharacter() method finds “\n” ...

C++ string parsing (python style)

I love how in Python I can do something like:
    points = []
    for line in open("data.txt"):
        a,b,c = map(float, line.split(','))
        points += [(a,b,c)]
Basically it's reading a list of lines where each one represents a point in 3D space; the point is represented as three numbers separated by commas. How can this be done in C++ without...

XML Parsing with C#?

I'm working on a project for school that involves a heavy amount of XML parsing. I'm coding in C#, but I have yet to find a "suitable" method of parsing this XML out. There are several different ways I've looked at, but I haven't gotten it right yet, so I have come to you. Ideally, I'm looking for something kind of similar to Beautiful Soup ...

ASP.Net Mapping Values Lookup.

Currently in my ASP.Net application's web.config I have an application setting that stores a comma delimited list of mapping values, like the one below. In the code behind I need to perform a lookup on this data based on input values 1, 2, 3 etc. I can either string split it and loop until I find a match, or use Regex to pull the value fr...

Parsing huge data with C++

In my job, I need to parse different kinds of data files from different data sources. Sometimes I parse them by writing C++ code directly (with the help of Qt and Boost :D), sometimes manually with a helper program. I must note that the data types are so different from each other that it is hard to create a common interface for all of them. But I...

Read data with varying formats in C++

I'm creating my first real binary parser (a tiff reader) and have a question regarding how to allocate memory. I want to create a struct within my TiffSpec class for the IFD entries. These entries will always be 12 bytes, but depending upon the type specified in that particular entry, the values at the end could be of different types (or...

How do I parenthesize an expression programmatically?

I have an idea for a simple program to make that will help me with operator precedence in languages like C. The most difficult part of this is parenthesizing the expression. For example, I want this: *a.x++ = *b.x++ Converted to this: ((*(((a).(x))++)) = (*(((b).(x))++))) Which I did manually in these steps: *a.x++ = *b...

Which Wiki Parser?

Does anyone know of a parser that can take Wiki formatted text as input and produce a tree of entities, in the same way that an XML parser produces an entity tree? To clarify, I'm looking for something that would take text like: -Intro- Textual stuff in ''italics'' --Subhead-- Yet more text and produce a tree rooted at Intro with ...

What is the best way to filter URLs for input?

I have a form that is accepting URLs from users in PHP. What characters should I allow or disallow? Currently I use $input= preg_replace("/[^a-zA-Z0-9-\?\:#.()\,\/\&\'\\"]/", "", $string); $input=substr($input,0,255); So, it's trimmed to 255 chars and only can include letters, numbers, and ? - _ : # ( ) , & ' " / Anything I should b...

Parsing "From" addresses from email text

I'm trying to extract email addresses from plain text transcripts of emails. I've cobbled together a bit of code to find the addresses themselves, but I don't know how to make it discriminate between them; right now it just spits out all email addresses in the file. I'd like to make it so it only spits out addresses that are preceded b...

How can I retrieve a connectionString from a web.config file?

I am writing a client application in C# that is supposed to change ConnectionString settings in a web.config file from another application I wrote. How can I achieve this goal? Is there a way to load the web.config file in my application and read/change its data in an object-oriented way? Or do I need to parse it as if it were a complete ...

What's the best way to make a time from "Today" or "Yesterday" and a time in Python?

Python has pretty good date parsing but is the only way to recognize a datetime such as "Today 3:20 PM" or "Yesterday 11:06 AM" by creating a new date today and doing subtractions? ...
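
A minimal sketch of one possible approach, using only the standard `datetime` module. The helper name `parse_relative`, the `DAY_OFFSETS` table, and the assumed input shape (a day word followed by a 12-hour clock time) are hypothetical and chosen for illustration; this still builds on today's date, but keeps the subtraction in one small place.

```python
from datetime import date, datetime, timedelta

# Hypothetical mapping from the leading word to a day offset (an assumption;
# the question only mentions "Today" and "Yesterday").
DAY_OFFSETS = {"today": 0, "yesterday": -1}

def parse_relative(text, today=None):
    """Parse strings such as "Today 3:20 PM" into a datetime.

    `today` can be injected for testing; it defaults to the current date.
    """
    if today is None:
        today = date.today()
    # Split off the first word ("Today"/"Yesterday") from the clock time.
    word, _, clock = text.strip().partition(" ")
    offset = DAY_OFFSETS[word.lower()]  # raises KeyError for unknown words
    # %I is the 12-hour clock hour and %p the AM/PM marker
    # (AM/PM matching depends on the current locale).
    time_part = datetime.strptime(clock.strip(), "%I:%M %p").time()
    return datetime.combine(today + timedelta(days=offset), time_part)
```

Passing `today` explicitly keeps the function deterministic, which makes it easier to test than a version that always calls `date.today()`.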

Simple XML parsing

What is the simplest way to parse the lat and long out of the following XML fragment? There is no namespace etc. It is in a string variable, not a stream. <poi> <city>stockholm</city> <country>sweden</country> <gpoint> <lat>51.1</lat> <lng>67.98</lng> </gpoint> </poi> Everything I have read so...
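
The question does not name a language, so the following is only a sketch in Python, assuming the fragment is held in a string as described. It uses the standard library's `xml.etree.ElementTree`; the function name `extract_lat_lng` and the variable `xml_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# The fragment from the question, held in a string rather than a stream.
xml_text = """<poi>
  <city>stockholm</city>
  <country>sweden</country>
  <gpoint>
    <lat>51.1</lat>
    <lng>67.98</lng>
  </gpoint>
</poi>"""

def extract_lat_lng(text):
    """Return (lat, lng) as floats from a <poi> fragment."""
    root = ET.fromstring(text)  # root is the <poi> element
    # findtext takes a simple path relative to the root element.
    lat = float(root.findtext("gpoint/lat"))
    lng = float(root.findtext("gpoint/lng"))
    return lat, lng
```

If either element is missing, `findtext` returns `None` and `float` raises a `TypeError`, so real input may need an explicit check.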

QTP: Get list of all links in an E-mail

I'm "developing" a test plan in Mercury/HP QuickTest Pro 9.1, in which I must extract a list of all links in an E-mail and perform logic against each of them. In this case, I am using Webmail, so the message will appear as a web page; though I hope to use Outlook later to replicate a more realistic UX. I am a Developer, not a Tester. C...

Is inheritance possible in JFlex?

I'm fairly new to JFlex and JSyntaxPane, although I have managed to hack together a lexer for XPath. The problem I find myself in is that I'm working on a project that supports a subset of XPath with a few proprietary features. Nasty, I know. If this were a regular Java problem I'd turn to inheritance but it doesn't seem possible to ach...