natural-language

Natural Language Processing in Ruby

I'm looking to do some sentence analysis (mostly for Twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby? Similar to http://stackoverflow.com/questions/870460/java-is-there-a-good-natural-language-processing-library but for Ruby. I'd prefer somethi...

Very basic English grammar parser

I'm writing a very basic parser (mostly just to better understand how they work) that takes a user's input of a select few words, detects whether the sentence structure is OK or not OK, and outputs the result. The grammar is: Sentence: Noun Verb Article Sentence Sentence Conjunction Sentence Conjunction: "and" "or" "but" Noun: "birds...

Looking for any free tagged English corpus(es)

Does anyone know of any free (licensed free for commercial use) tagged English corpus(es) that can be used to train a part of speech (POS) tagger? The only ones I have seen online seem to start in the thousands for commercial use. Any help would be appreciated, thanks. ...

How to implement a SIMPLE "You typed ACB, did you mean ABC?"

I know this is not a straight up question, so if you need me to provide more information about the scope of it, let me know. There are a bunch of questions that address almost the same issue (they are linked here), but never the exact same one with the same kind of scope and objective - at least as far as I know. Context: I have a M...

How can I create relative/approximate dates in Perl?

I'd like to know if there are any libraries (preferably DateTime-esque) that can take a normal date time and create an appropriate relative human readable date. Essentially the exact opposite of the more common question: How can I parse relative dates with Perl?. Obviously, the exact wording/interpretation is up to the actual implementa...

Parsing Meaning from Text

I realize this is a broad topic, but I'm looking for a good primer on parsing meaning from text, ideally in Python. As an example of what I'm looking to do, if a user makes a blog post like: "Manny Ramirez makes his return for the Dodgers today against the Houston Astros", what's a light-weight/ easy way of getting the nouns out of a s...

Localizing and Globalization of WinForms applications

Hi, We've developed a WinForms application (targeting .NET 2.0 with VS2008), and we've just found out that we need to localize it for use in another language (other than English) :( What are the guidelines for developing multi-lingual languages in .NET? Another application borrows Paint.NET's idea of globalization (using resources) but I w...

Scope ambiguity in natural language

I am curious about natural language processing and have the following questions. What is meant by scope ambiguity in natural language? How can statistical resolution of scope ambiguity be done? Which language is best for the statistical resolution? ...

How to check if a string looks randomized, or human generated and pronounceable?

For the purpose of identifying [possible] bot-generated usernames. Suppose you have a username like "bilbomoothof" .. it may be nonsense, but it still contains pronounceable sounds and so appears human-generated. I accept that it could have been randomly generated from a dictionary of syllables, or word parts, but let's assume for a mom...

Automatically determine the natural language of a website page given its URL

I'm looking for a way to automatically determine the natural language used by a website page, given its URL. In Python, a function like: def LanguageUsed (url): #stuff Which returns a language specifier (e.g. 'en' for English, 'jp' for Japanese, etc...) Summary of Results: I have a reasonable solution working in Python using cod...

CLI grammar checker for determining tense

I like to use the present tense in my Git logs (for example, "Add feature" instead of "Added feature"). Currently, I have an extremely naive Git hook that aborts the commit if the first word of the log message ends in 'ed', but I'd like a more robust solution (where 'more robust' means 'not totally lame'). Is there a grammar checker th...
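The naive check described in this excerpt can be sketched in a few lines of Python. This is only an illustration of the "first word ends in 'ed'" heuristic, not the asker's actual hook; the function name `first_word_looks_past_tense` is hypothetical, and wiring it into a Git `commit-msg` hook is left out.

```python
def first_word_looks_past_tense(message: str) -> bool:
    """Return True if the first word of the log message ends in 'ed'.

    Deliberately naive: it only inspects the suffix of the first word.
    """
    words = message.strip().split()
    # An empty message has no first word, so it is never flagged.
    return bool(words) and words[0].lower().endswith("ed")
```

The weakness is easy to see: a present-tense verb such as "Embed" is flagged, while an irregular past form such as "Wrote" passes, which is why the question asks for something more robust.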

Natural language dates in ruby/rails?

I need to show natural dates like "few seconds ago" or "21 minutes ago". Is there something built into Rails? Or maybe a third-party library? This is not hard to implement, but I do not want to reinvent the wheel. ...

Natural language statistics query to SQL query converter

We would like to include a facility in an ASP.NET web application that will allow a user to type in a natural language (or reasonably close to natural) question about a SQL data set (SQL Server) and get useful information in return. The sort of results required is to include min, max, std deviation, top 10, total for a column, and anythi...

Identify tense in PHP

Hi, I'm looking for a way to analyze a string of text and find out in which tense it was written, for example: "I'm going to the store" == current, "I bought a car" == past, etc. Any tips on how I could get this done? ...

Processing English Statements

Any recommendations for languages/libraries to convert a sentence like: "X bumped Y, who in turn kicked Z." to X: Bumped Y: Was bumped, kicked Z ...

How would you interpret these dates?

I need to interpret relative date strings like: last Friday, this Tuesday, next Wednesday. The "Last Friday" form is easy (take the most recent Friday that is not today) but what about "this" vs. "next"? Could "this Wednesday" be yesterday on a Thursday? Could "this" and "next" Friday be the same day in some cases and a week apart in oth...
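The "last Friday" rule stated in this excerpt (the most recent Friday that is not today) can be sketched in Python as follows. The helper name `last_weekday` and the `FRIDAY` constant are hypothetical, and the sketch takes no position on the open "this" vs. "next" question.

```python
from datetime import date, timedelta

FRIDAY = 4  # date.weekday() numbering: Monday is 0, Sunday is 6


def last_weekday(today: date, weekday: int) -> date:
    """Return the most recent date before `today` that falls on `weekday`."""
    days_back = (today.weekday() - weekday) % 7
    if days_back == 0:
        # Today is the requested weekday; "last" means a full week ago.
        days_back = 7
    return today - timedelta(days=days_back)
```

Under this rule, calling `last_weekday(date.today(), FRIDAY)` on a Friday returns the Friday one week earlier rather than today.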

How to strip headers/footers from Project Gutenberg texts?

I've tried various methods to strip the license from Project Gutenberg texts, for use as a corpus for a language learning project, but I can't seem to come up with an unsupervised, reliable approach. The best heuristic I've come up with so far is stripping the first twenty eight lines and the last 398, which worked for a large number of...

Using the Python NLTK (2.0b5) on the Google App Engine

I have been trying to make the NLTK (Natural Language Toolkit) work on the Google App Engine. The steps I followed are: Download the installer and run it (a .dmg file, as I am using a Mac). Copy the nltk folder out of the Python site-packages directory and place it as a sub-folder in my project folder. Create a Python module in the fo...

How can I correctly prefix a word with "a" and "an"?

I have a .NET application where, given a noun, I want it to correctly prefix that word with "a" or "an". How would I do that? Before you think the answer is to simply check if the first letter is a vowel, consider phrases like: an honest mistake a used car ...

How to determine subject, object and other words?

I'm trying to implement an application that can determine the meaning of a sentence by dividing it into smaller pieces. So I need to know which words are the subject, object, etc., so that my program knows how to handle the sentence. ...