natural-language

Practical examples of NLTK use

I'm playing about with the Natural Language Toolkit (NLTK). The documentation (Book and HOWTO) is a little heavy going. Are there any good but basic examples of the use of NLTK? I'm thinking of things like the NLTK articles on the Stream Hacker blog. ...

Sentiment analysis for Twitter in Python

I'm looking for an open source implementation, preferably in Python, of textual sentiment analysis (http://en.wikipedia.org/wiki/Sentiment_analysis). Is anyone familiar with such an open source implementation that I can use? I'm writing an application that searches Twitter for some search term, say "youtube", and counts "happy" tweets vs. "sad"...
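
The simplest approach to the happy-vs-sad count is lexicon-based scoring: count positive and negative words in each tweet and compare. The sketch below is only that baseline. The word lists POSITIVE and NEGATIVE are hypothetical placeholders, not a published lexicon, and nothing here calls the Twitter API. It assumes the matching tweets have already been fetched as plain strings.

```python
import re

# Hypothetical, deliberately tiny word lists; a real system would use a
# published sentiment lexicon or a trained classifier instead.
POSITIVE = {"love", "great", "happy", "awesome", "good", "fun"}
NEGATIVE = {"hate", "awful", "sad", "terrible", "bad", "boring"}


def tweet_polarity(text):
    """Return 'happy', 'sad' or 'neutral' by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "happy"
    if score < 0:
        return "sad"
    return "neutral"


def count_moods(tweets):
    """Tally polarities for tweets that matched a search term."""
    counts = {"happy": 0, "sad": 0, "neutral": 0}
    for tweet in tweets:
        counts[tweet_polarity(tweet)] += 1
    return counts


print(count_moods(["I love this youtube video",
                   "youtube is so boring today",
                   "watching youtube"]))
```

Word counting ignores negation ("not good") and sarcasm, so it is a starting point for comparison, not an accurate classifier.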

Natural language date parser for Ruby/Rails

Does anybody know of something similar to Date.js in Ruby? Something that would be able to return a date object from something like: "two weeks from today". The Remember the Milk webapp incorporates this feature into their system and it is incredibly easy to use. I would use the Date.js library itself but because it is on the client sid...
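
To show what such a parser does internally, here is a minimal server-side sketch, written in Python rather than Ruby. It covers only a hypothetical mini-grammar ("today", "tomorrow", and "<n> day(s)/week(s) from today|now"). Months and years are left out because their lengths vary, which is where a real library earns its keep.

```python
import re
from datetime import date, timedelta

# Hypothetical mini-grammar, not the grammar of Date.js or any Ruby gem.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNIT_DAYS = {"day": 1, "week": 7}
RELATIVE = re.compile(r"^(\w+)\s+(day|week)s?\s+from\s+(?:today|now)$")


def parse_relative_date(phrase, today=None):
    """Turn a small set of English phrases into a datetime.date."""
    if today is None:
        today = date.today()
    phrase = phrase.strip().lower()
    if phrase == "today":
        return today
    if phrase == "tomorrow":
        return today + timedelta(days=1)
    match = RELATIVE.match(phrase)
    if match is None:
        raise ValueError(f"unrecognised date phrase: {phrase!r}")
    count, unit = match.groups()
    # Accept either digits ("3") or a spelled-out number ("two").
    n = int(count) if count.isdigit() else NUMBER_WORDS.get(count)
    if n is None:
        raise ValueError(f"unknown number word: {count!r}")
    return today + timedelta(days=n * UNIT_DAYS[unit])


print(parse_relative_date("two weeks from today", date(2009, 1, 1)))
```

Passing `today` explicitly keeps the function deterministic and easy to test.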

Finding related words (specifically physical objects) to a specific word

I am trying to find words (specifically physical objects) related to a single word. For example: Tennis: tennis racket, tennis ball, tennis shoe; Snooker: snooker cue, snooker ball, chalk; Chess: chessboard, chess piece; Bookcase: book. I have tried to use WordNet, specifically the meronym semantic relationship; however, this method is...

Natural language parsing, practical example

I am looking to use a natural language parsing library for a simple chat bot. I can get the part-of-speech (POS) tags, but I always wonder: what do you do with the POS tags? If I know the parts of speech, what then? I guess it would help with the responses. But what data structures and architecture could I use? ...

Why do some countries use a dot as the decimal separator and others a comma?

Why do some countries use a comma as the decimal separator and others a dot? Does anyone know the reason for this? It's very annoying to have to check every time which one to use. ...

Parsing expressions with an undefined number of arguments

I'm trying to parse a string in a self-made language into a sort of tree, e.g.: # a * b1 b2 -> c * d1 d2 -> e # f1 f2 * g should result in: # a * b1 b2 -> c * d1 d2 -> e # f1 f2 * g Here #, * and -> are symbols, and a, b1, etc. are texts. At the moment the only method I know for evaluating expressions is RPN, and my current solution...

Non-regular context-free language and infinite regular sublanguages

I had a university assignment which basically said: "Demonstrate that the non-regular language L = {0^n 1^n : n natural} has no infinite regular sublanguages." I demonstrated this by contradiction. I basically said that there is a language S which is a sublanguage of L and is regular. Since the possible Regular expre...
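
For comparison, the standard route to this result uses the pumping lemma rather than an argument over the possible regular expressions. A proof sketch, with p the pumping length of the assumed regular sublanguage S:

```latex
\begin{proof}
Suppose $S \subseteq L = \{0^n 1^n : n \in \mathbb{N}\}$ is infinite and regular,
and let $p \ge 1$ be a pumping length for $S$.
Since $L$ contains exactly one word of each even length, only finitely many
words of $S$ have $n < p$, so $S$ contains some $w = 0^n 1^n$ with $n \ge p$.
By the pumping lemma, $w = xyz$ with $|xy| \le p$, $|y| \ge 1$, and
$x y^i z \in S$ for all $i \ge 0$.
Because $|xy| \le p \le n$, the pumped part lies in the block of zeros:
$y = 0^k$ with $1 \le k \le p$. Hence
\[
  x y^2 z = 0^{\,n+k} 1^{n} \notin L \supseteq S,
\]
a contradiction. Therefore $L$ has no infinite regular sublanguage.
\end{proof}
```

The key step is that every sufficiently long word of S can be pumped. Pumping always unbalances the counts of 0s and 1s, so no infinite subset of L can be regular.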

Compare many text files that contain duplicate "stubs" from the previous and next file and remove duplicate text automatically

I have a large number of text files (1000+) each containing an article from an academic journal. Unfortunately each article's file also contains a "stub" from the end of the previous article (at the beginning) and from the beginning of the next article (at the end). I need to remove these stubs in preparation for running a frequency an...

[PHP] How to combine the words of a sentence into composed terms?

Hello! I have a sentence, for example "John Doe moved to New York last year." Now I split the sentence into single words and get: array('John', 'Doe', 'moved', 'to', 'New', 'York', 'last', 'year') That's quite easy. But then I want to combine the single words to get all the composed terms. It doesn't matter if the composed term...
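
Assuming "composed terms" means every contiguous run of two or more words (word n-grams), the combinations can be generated with two nested loops: one over the run length and one over the start position. The sketch is in Python rather than PHP. `min_len` and `max_len` are hypothetical parameters added for illustration.

```python
def composed_terms(words, min_len=2, max_len=None):
    """Return every contiguous run of min_len..max_len words, shortest first."""
    n = len(words)
    max_len = n if max_len is None else min(max_len, n)
    terms = []
    for size in range(min_len, max_len + 1):
        # A run of `size` words can start at positions 0 .. n - size.
        for start in range(n - size + 1):
            terms.append(" ".join(words[start:start + size]))
    return terms


words = ["John", "Doe", "moved", "to", "New", "York", "last", "year"]
print(composed_terms(words, max_len=2))
```

For n words there are n(n-1)/2 runs of length two or more (28 for this sentence). Picking out the meaningful ones such as "John Doe" or "New York" needs a dictionary or a named-entity recognizer on top.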

Shorten a text and only keep important sentences

Hello! The German website nandoo.net lets you shorten a news article. If you change the percentage value with a slider, the text changes and some sentences are left out. You can see that in action here: http://www.nandoo.net/read/article/299925/ The news article is on the left side and tags are marked. The slider...
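
One common technique for this kind of slider is extractive summarization: score each sentence, keep the top fraction, and print the survivors in their original order. The sketch below scores a sentence by the average frequency of its content words across the whole article. It is not known to be how nandoo.net works. The stop-word list and the `keep_ratio` parameter (standing in for the slider percentage) are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; without one, "the" and "and" dominate the scores.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "was", "it"}


def shorten(text, keep_ratio=0.5):
    """Keep roughly keep_ratio of the sentences, chosen by word-frequency score."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tokens = [[w for w in re.findall(r"\w+", s.lower()) if w not in STOP_WORDS]
              for s in sentences]
    freq = Counter(w for sentence in tokens for w in sentence)

    def score(i):
        # Average article-wide frequency of the sentence's content words.
        words = tokens[i]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, math.ceil(len(sentences) * keep_ratio))
    best = sorted(range(len(sentences)), key=score, reverse=True)[:keep]
    # Re-sort the chosen indices so the summary reads in article order.
    return " ".join(sentences[i] for i in sorted(best))


article = "Cats are great. Cats like fish. Dogs bark loudly."
print(shorten(article, keep_ratio=0.3))
```

The regex sentence splitter breaks on abbreviations such as "Dr.", so a real implementation would use a proper sentence tokenizer.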

How to analyze simple English sentences

Is there any library that can be used for analyzing (NLP) simple English text? For example, it would be perfect if it could do this: Input: "I am going"; Output: I, go, present continuous tense ...

Rhyme in PHP

I am having a hard time finding a way to detect whether two words rhyme in English. It doesn't have to be the same syllabic ending, but something closer to phonetic similarity. I can't believe that in 2009 the only way of doing it is using those old-fashioned rhyme dictionaries. Do you know any resources (PHP would be a plus) to ...

PHP text parsing and / or make your own language?

Been Googling around without finding much at all, so does anyone know of a class or library that helps you parse any sort of language, like a Domain Specific Language (I'm creating one, so I'm flexible in what the syntax and format can be), into either PHP code or some helpful struct or a class hierarchy or ... ? Anything goes at this poin...

How to turn plural words singular?

I'm preparing some table names for an ORM, and I want to turn plural table names into singular entity names. My only problem is finding an algorithm that does it reliably. Here's what I'm doing right now: if a word ends with -ies, I replace the ending with -y; if a word ends with -es, I remove this ending. This doesn't always work, however...
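
The two rules above break on words like "houses" (which becomes "hous"). A slightly safer heuristic strips "-es" only after sibilant endings (-ss, -x, -z, -ch, -sh), strips a plain "-s" otherwise, and checks an exceptions table first. This is a sketch of that heuristic only, not a complete inflection library. The IRREGULAR table is hypothetical and would grow with your schema.

```python
# Hypothetical exceptions table; extend it as your table names require.
IRREGULAR = {"people": "person", "children": "child", "men": "man",
             "women": "woman", "mice": "mouse"}


def singularize(word):
    """Heuristically turn an English plural table name into a singular one."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if lower.endswith("ies") and len(lower) > 3:
        return word[:-3] + "y"          # categories -> category
    if lower.endswith(("sses", "xes", "zes", "ches", "shes")):
        return word[:-2]                # addresses -> address, boxes -> box
    if lower.endswith("s") and not lower.endswith("ss"):
        return word[:-1]                # houses -> house, users -> user
    return word                         # already singular, e.g. "class"


for name in ["Categories", "addresses", "boxes", "houses", "people", "class"]:
    print(name, "->", singularize(name))
```

English has enough irregular plurals that any rule list eventually needs an exceptions table. Keeping the table as data makes it easy to extend without touching the rules.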

Appropriate article (a/an) in String.Format

I'm looking for a culturally-sensitive way to properly insert a noun into a sentence while using the appropriate article (a/an). It could use String.Format, or possibly something else if the appropriate way to do this exists elsewhere. For example: Base Sentence: "You are looking at a/an {0}" This should format to: "You are looking at...
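
For English alone, the usual approach is a first-letter heuristic plus exception lists. The choice between "a" and "an" depends on the first sound, not the first letter ("an hour", "a university"). The sketch uses Python's `str.format` as a stand-in for `String.Format`. The two exception sets are hypothetical and far from complete. Other languages, where articles depend on gender or case, are not addressed.

```python
# Heuristic only: hypothetical exception lists for words whose first sound
# differs from their first letter.
VOWEL_SOUND_EXCEPTIONS = {"hour", "honest", "honor", "honour", "heir"}      # silent h -> "an"
CONSONANT_SOUND_EXCEPTIONS = {"university", "unicorn", "user", "one", "european"}  # "yoo"/"w" -> "a"


def with_article(noun):
    """Prefix noun with "a" or "an" based on its first word."""
    first = noun.split()[0].lower()
    if first in VOWEL_SOUND_EXCEPTIONS:
        article = "an"
    elif first in CONSONANT_SOUND_EXCEPTIONS:
        article = "a"
    else:
        article = "an" if first[0] in "aeiou" else "a"
    return f"{article} {noun}"


print("You are looking at {0}".format(with_article("apple")))
```

A robust version would look up pronunciations in a phonetic dictionary instead of relying on hand-maintained exception lists.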

Natural programming language: what would you like to see?

I am looking at writing a compiler and after I complete something in a "C" style I am looking at adapting it to other models. What are some syntactical constructs you would expect to see in a "natural" programming language? The target platform for this compiler will be the CLR and I am currently using Oslo+MGrammar for the lexer/pars...

Is it possible to guess a user's mood based on the structure of text?

I assume a natural language processor would need to be used to parse the text itself, but what suggestions do you have for an algorithm to detect a user's mood based on text that they have written? I doubt it would be very accurate, but I'm still interested nonetheless. EDIT: I am by no means an expert on linguistics or natural language...

English translation of the STTS tagset

The most common part-of-speech tagset for German is the STTS tagset. I need an English translation of the explanations for each tag. Not being a linguist, I don't feel comfortable (let alone qualified) translating this myself. Google turned up nothing, so any help is appreciated. ...

Finding bigram in a location index

I have a table that indexes the locations of words in a set of documents. I want to identify the most common bigrams in the set. How would you do this in MSSQL 2008? The table has the following structure: LocationID -> DocID -> WordID -> Location. I have thought about trying to do some kind of complicated join... and I'm just doing...
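
The join the question hints at is a self-join. Pair each row with the row that has the same DocID and a Location one greater, then group by the two WordIDs and count. The Python sketch below mirrors that logic on in-memory rows so the idea is easy to test. It assumes Location values are consecutive integers within each document. The function name `top_bigrams` and the row tuple layout are illustrative choices.

```python
from collections import Counter


def top_bigrams(rows, n=10):
    """rows: iterable of (location_id, doc_id, word_id, location) tuples."""
    # Index each word by (document, position) so the next word is a dict lookup,
    # the in-memory equivalent of joining on DocID and Location + 1.
    word_at = {(doc_id, loc): word_id for _, doc_id, word_id, loc in rows}
    counts = Counter()
    for (doc_id, loc), word_id in word_at.items():
        next_word = word_at.get((doc_id, loc + 1))
        if next_word is not None:
            counts[(word_id, next_word)] += 1
    # Equivalent to GROUP BY the two WordIDs, ORDER BY COUNT(*) DESC.
    return counts.most_common(n)


rows = [(1, 1, 10, 0), (2, 1, 20, 1), (3, 1, 10, 2), (4, 1, 20, 3),
        (5, 2, 10, 0), (6, 2, 30, 1)]
print(top_bigrams(rows, 1))
```

In SQL Server, an index on (DocID, Location) should make the self-join efficient. That layout matches the dictionary key used here.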