natural-language

Algorithm to determine how positive or negative a statement/text is

I need to implement sentiment analysis. Can anyone point me to examples/reference implementations? ...

How to determine the (natural) language of a document?

I have a set of documents in two languages: English and German. There is no usable meta-information about these documents; a program can look only at the content. Based on that, the program has to decide which of the two languages the document is written in. Is there any "standard" algorithm for this problem that can be implemented in a...
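
One common baseline for a two-language decision like the one described above is to count very frequent function words (stopwords) from each language and pick the language with more hits. The following is only a minimal sketch of that idea, not a reference implementation: the function name `guess_language` and the two tiny word lists are hypothetical choices made for illustration, and a real system would use larger lists or character n-gram statistics.

```python
import re

# Small, hand-picked stopword lists (an assumption for this sketch;
# real lists would be much longer).
ENGLISH_STOPWORDS = {"the", "and", "of", "to", "is", "that", "it", "with", "for", "on"}
GERMAN_STOPWORDS = {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "mit", "für"}


def guess_language(text):
    """Return 'english', 'german', or 'unknown' based on stopword counts."""
    # Extract runs of letters (including umlauts), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    english_hits = sum(word in ENGLISH_STOPWORDS for word in words)
    german_hits = sum(word in GERMAN_STOPWORDS for word in words)
    if english_hits == german_hits:
        return "unknown"  # no evidence either way, or a tie
    return "english" if english_hits > german_hits else "german"


if __name__ == "__main__":
    print(guess_language("The cat is sitting on the roof of the house."))
    print(guess_language("Die Katze sitzt auf dem Dach und das ist nicht gut."))
```

On very short or mixed-language texts this heuristic can tie or misfire, which is why the sketch returns "unknown" rather than guessing.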

Natural Language date and time parser for Java

Hey guys, I am working on a natural language parser which examines a sentence in English and extracts some information like name, date, etc. For example: "Lets meet next tuesday at 5 PM at the beach." So the output will be something like: "Lets meet 15/09/2009 at 1700 hr at the beach" So basically, what I want to know is: is ther...
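
The transformation in the example above can be sketched with two regular-expression substitutions. The question asks about Java; this sketch is in Python purely for illustration. It is a toy under stated assumptions, not a general parser: the helper names (`next_weekday`, `to_24h`, `normalize`) are hypothetical, only the patterns "next <weekday>" and "<hour> AM/PM" are handled, "next tuesday" is taken to mean the first Tuesday strictly after the reference date, and the reference date of 9 September 2009 is assumed because it makes the output match the question.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def next_weekday(reference, weekday):
    """First date strictly after `reference` that falls on `weekday` (Monday=0)."""
    days_ahead = (weekday - reference.weekday() - 1) % 7 + 1
    return reference + timedelta(days=days_ahead)


def to_24h(hour, meridiem):
    """Convert a 12-hour clock hour plus AM/PM to an 'HH00' string."""
    hour = hour % 12 + (12 if meridiem.lower() == "pm" else 0)
    return f"{hour:02d}00"


def normalize(sentence, reference):
    """Replace 'next <weekday>' and '<hour> AM/PM' with absolute values."""
    def replace_day(match):
        target = WEEKDAYS.index(match.group(1).lower())
        return next_weekday(reference, target).strftime("%d/%m/%Y")

    def replace_time(match):
        return to_24h(int(match.group(1)), match.group(2)) + " hr"

    day_pattern = r"\bnext (" + "|".join(WEEKDAYS) + r")\b"
    sentence = re.sub(day_pattern, replace_day, sentence, flags=re.IGNORECASE)
    return re.sub(r"\b(\d{1,2}) ?(AM|PM)\b", replace_time, sentence,
                  flags=re.IGNORECASE)


if __name__ == "__main__":
    # Assumed reference date: Wednesday, 9 September 2009.
    print(normalize("Lets meet next tuesday at 5 PM at the beach.",
                    date(2009, 9, 9)))
```

Passing the reference date explicitly, rather than reading the system clock inside the function, keeps the result reproducible and easy to test.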

Books/resources for Natural Language Processing for non-academics

I found "Natural Language Processing with Python" today, and am wondering what other good, non-academic (the research papers tend to be too dry and/or specific to certain areas) NLP resources the SO community knows about. I'm starting out in text processing for a couple of hobby projects, and am keen to find good places to start :) ...

Finding type of break in icu::BreakIterator

I'm trying to understand how to use icu::BreakIterator to find specific words. For example, I have the following sentence: To be or not to be? That is the question... The word instance of the break iterator would put breaks there: |To| |be| |or| |not| |to| |be|?| |That| |is| |the| |question|.|.|.| Now, not every pair of break points is a...

Package to compare LSA, TFIDF, Cosine metrics and Language Models

Hi, I'm looking for a package (any language, really) that I can use on a corpus of 50 documents to perform interdocument similarity testing in various metrics, like tfidf, okapi, language models, lsa, etc. I want as a result a document similarity matrix, i.e. doc1 is x% similar to doc2, etc... This is for research purposes, not for pr...

Natural language command language

I'm interested in developing a natural language command language for a domain with existing rules. I was very impressed when Terry Winograd's SHRDLU showed the way (the conversation below is 40 years old! Astonishing). Can we do better now and if so where can I get examples? Person: Pick up a big red block. Computer: OK. Person: ...

NLTK tagging in German

I am using NLTK to extract nouns from a text-string starting with the following command: tagged_text = nltk.pos_tag(nltk.Text(nltk.word_tokenize(some_string))) It works fine in English. Is there an easy way to make it work for German as well? (I have no experience with natural language programming, but I managed to use the python nl...

Algorithms to detect phrases and keywords from text

I have around 100 megabytes of text, without any markup, divided into approximately 10,000 entries. I would like to automatically generate a 'tag' list. The problem is that there are word groups (i.e. phrases) that only make sense when they are grouped together. If I just count the words, I get a large number of really common words (is, t...

Interesting linguistics/nlp problems/projects

As I know, looking for a problem to solve (debugging, thinking up a theme for an article, whatever) is the most creative, interesting and difficult part of any problem-solving work. Or just the most difficult. But I have no idea what's going on in programming-related linguistics. I love languages and simple-for-babies-but-neither-unders...

Set of books about Natural Language processing, Semantic Analysis and Data Mining.

So I'm starting to write my master's thesis next semester (it should be done before June). I already have the theme, and I need to write the state of the art by February. The main areas are Intelligent systems, Natural Language processing, Semantic Analysis and Data Mining. I am researching the best books about Natural Language pro...

What is the default chunker for NLTK toolkit in Python?

I am using their default POS tagging and default tokenization, and it seems sufficient. I'd like their default chunker too. I am reading the NLTK book, but it does not seem like they have a default chunker? ...

Is there a fairly simple way for a script to tell (from context) whether "her" is a possessive pronoun?

I am writing a script to reverse all genders in a piece of text, so all gendered words are swapped - "man" is swapped with "woman", "she" is swapped with "he", etc. But there is an ambiguity as to whether "her" should be replaced with "him" or "his". ...

Dealing with integer-valued features for CRF in mallet

Hi, I am just starting to use the SimpleTagger class in mallet. My impression is that it expects binary features. The model that I want to implement has positive integer-valued features and I wonder how to implement this in mallet. Also, I heard that non-binary features need to be normalized if the model is to make sense. I would apprec...

Special Occasion parser in Java

Hey guys, I am working on a date parser in Java. I just wanted to know whether there is any Java library which could parse special occasions. For example, if I give input such as Christmas or New Year, it returns a date for this. Thanks in advance. Regards, Pranav ...

Database of readings for Japanese words

Does anyone know of an off-the-shelf database that provides phonetic (kana) readings for Japanese words? ...

Natural Language Toolkit equivalent in C#

Is there something in C# equivalent to Python's Natural Language Toolkit? ...

Canadian to US English

Is there something like a Canadian to US English e-dictionary which I can use in my application? ...

What is the true difference between lemmatization and stemming?

When do I use each? Also, is the NLTK lemmatization dependent upon parts of speech? Wouldn't it be more accurate if it were? ...

Which OSS can extract a synopsis from a text?

Is there an OSS tool which can compress a text to a synopsis? My goal is to build an editor for SciFi novels which can either automatically create synopses for chapters or at least make a suggestion for one. ...