lexical-analysis

How to write a Python lexical analyser?

I'm trying to write a C module to lexically analyse Python code. How can I do it? ...
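
This is not the C module the question asks about, but a minimal sketch of the token stream such an analyser would need to reproduce, using the standard library's `tokenize` module. The helper name `lex` is made up for illustration.

```python
import io
import tokenize

def lex(source):
    """Return (token type name, token text) pairs for a Python source string."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]

tokens = lex("x = 1 + 2\n")
for kind, text in tokens:
    print(kind, repr(text))
```

Comparing a C implementation's output against this stream on the same input could serve as a sanity check.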

Lexical analysis or a series of regular expressions to parse unstructured text into structured form

I am trying to write some code that will function like Google Calendar's Quick Add feature. You know, the one where you can input any of the following: 1) 24th sep 2010 , Johns Birthday 2) John's Birthday , 24/9/10 3) 24 September 2010 , Birthday of John Doe 4) 24-9-2010 : John Does Birthday 5) John Does Birthday 24th of September 2010 ...
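
A rough sketch of the regular-expression route, covering only the five sample inputs listed in the question. The function name `quick_add`, the day-first ordering of numeric dates, and the rule that two-digit years mean 20xx are all assumptions made for illustration, not part of the question.

```python
import re
from datetime import date

MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# "24th sep 2010", "24 September 2010", "24th of September 2010"
WORDY = re.compile(
    r"\b(\d{1,2})(?:st|nd|rd|th)?\s+(?:of\s+)?([A-Za-z]{3,9})\s+(\d{4})\b", re.I)
# "24/9/10", "24-9-2010" (assumed to be day/month/year)
NUMERIC = re.compile(r"\b(\d{1,2})[/-](\d{1,2})[/-](\d{4}|\d{2})\b")

def quick_add(text):
    """Split free text into (date, title); returns (None, text) if no date is found."""
    for pattern in (WORDY, NUMERIC):
        m = pattern.search(text)
        if not m:
            continue
        day, month, year = m.groups()
        if pattern is WORDY:
            month = MONTHS.get(month[:3].lower())
            if month is None:  # the word was not a month name
                continue
        year = int(year)
        if year < 100:
            year += 2000  # assumption: two-digit years are in this century
        when = date(year, int(month), int(day))
        # Whatever is left after cutting out the date is treated as the title.
        title = (text[:m.start()] + text[m.end():]).strip(" ,:")
        return when, title
    return None, text.strip()

print(quick_add("John's Birthday , 24/9/10"))
```

Each new input style needs another pattern, which is the usual argument for moving to a real lexer once the list of formats grows.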

How can I find only 'interesting' words from a corpus?

I am parsing sentences. I want to know the relevant content of each sentence, defined loosely as "semi-unique words" in relation to the rest of the corpus. Something similar to Amazon's "statistically improbable phrases", which seem to (often) convey the character of a book through oddball strings of words. My first pass was to start ma...
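
One common way to approximate "semi-unique words" is a TF-IDF style score: a word counts for more the fewer sentences it appears in. The sketch below is a standard-library illustration of that idea, not necessarily what the asker tried; the function name `interesting_words` and the crude tokenisation are assumptions.

```python
import math
import re
from collections import Counter

def interesting_words(sentences, top_n=3):
    """Return, for each sentence, its top_n words ranked by a TF-IDF style score."""
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    # Number of sentences in which each word occurs at least once.
    doc_freq = Counter(word for words in tokenized for word in set(words))
    total = len(tokenized)
    results = []
    for words in tokenized:
        counts = Counter(words)
        scores = {w: c * math.log(total / doc_freq[w]) for w, c in counts.items()}
        # Highest score first; ties are broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_n])
    return results

corpus = ["the cat sat on the mat", "the dog sat on the log", "the zebra danced"]
print(interesting_words(corpus, top_n=2))
```

Words that occur in every sentence, such as "the" here, score zero and drop to the bottom of each ranking.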

How to detect vulnerable/personal information in CVs programmatically (by means of syntax analysis/parsing etc...)

To make matters more specific: How to detect people's names (seems like a simple case of named entity extraction?) How to detect addresses: my best guess is to find the postcode (regexes), country and town names, and take some text around them. As for phones and emails, they could probably be caught by various regexes plus preprocessing. Don't care about...

How to combine Regexp and keywords in Scala parser combinators

I've seen two approaches to building parsers in Scala. The first is to extend RegexParsers and define your own lexical patterns. The issue I see with this is that I don't really understand how it deals with keyword ambiguities. For example, if my keywords match the same pattern as idents, then it processes the keywords as idents....
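
A language-neutral illustration of one usual way around the keyword/ident ambiguity: match the longest identifier first, then reclassify it if it is in a reserved set. This is a plain regex lexer in Python, not the Scala parser combinator API; the keyword set and the token names are hypothetical.

```python
import re

KEYWORDS = {"if", "else", "while"}  # hypothetical reserved words

TOKEN_RE = re.compile(
    r"\s*(?:(?P<IDENT>[A-Za-z_]\w*)|(?P<NUMBER>\d+)|(?P<OP>[-+*/=()]))")

def lex(text):
    """Tokenise text, turning identifiers that are reserved words into KEYWORD tokens."""
    text = text.rstrip()
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        # The identifier pattern also matches keywords, so reclassify after matching.
        if kind == "IDENT" and value in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, value))
        pos = m.end()
    return tokens

print(lex("if iffy = 1"))
```

Because the whole identifier is consumed before the lookup, `iffy` stays an identifier even though it starts with the keyword `if`.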

Matching lexeme variants with Antlr3

I'm trying to match measurements in English input text, using Antlr 3.2 and Java 1.6. I've got lexical rules like the following: fragment MILLIMETRE : 'millimetre' | 'millimetres' | 'millimeter' | 'millimeters' | 'mm' ; MEASUREMENT : MILLIMETRE | CENTIMETRE | ... ; I'd like to be able to accept any combinat...

Ruby/Python - generating and parsing C/C++ code

Hi, I need to generate C structs and arrays from data stored in a DB table, and alternately parse similar info. I use both Ruby and Python for this task, and was wondering if anyone has heard of a module/lib that handles this for either/both languages? I could do this on my own with some string processing, but wanted to check if there's a k...