nlp

Natural language date/time parser for .NET?

Does anyone know of a .NET date/time parser similar to Chronic for Ruby (handles stuff like "tomorrow" or "3pm next thursday")? Note: I do write Ruby (which is how I know about Chronic) but this project must use .NET. ...

What's a good natural language library to use for paraphrasing?

I'm looking for an existing library to summarize or paraphrase content (I'm aiming at blog posts) - any experience with existing natural language processing libraries? I'm open to a variety of languages, so I'm more interested in the abilities & accuracy. ...

Vista speech recognition in multiple languages

Hi, my primary language is spanish, but I use all my software in english, including windows; however I'd like to use speech recognition in spanish. Do you know if there's a way to use vista's speech recognition in other language than the primary os language? ...

Is there an algorithm that tells the semantic similarity of two phrases

input: phrase 1, phrase 2 output: semantic similarity value (between 0 and 1), or the probability these two phrases are talking about the same thing ...

How Do You Categorize Based On Text Content?

How does one automatically find categories for text based on content? ...

How to read values from numbers written as words?

As we all know numbers can be written either in numerics, or called by their names. While there are a lot of examples to be found that convert 123 into one hundred twenty three, I could not find good examples of how to convert it the other way around. Some of the caveats: cardinal/nominal or ordinal: "one" and "first" common spelling ...

Your favorite natural language parser?

This is just a poll on what parser you like to use for parsing sentences of natural language syntactically. I am interested in complete software toolkits/solutions. A good answer would list at least some of the following: The name of the parser (obviously) and a link to its webpage. The (programming!) language(s) it's written in. The (...

Word frequency algorithm for natural language processing

Without getting a degree in information retrieval, I'd like to know if there exists any algorithms for counting the frequency that words occur in a given body of text. The goal is to get a "general feel" of what people are saying over a set of textual comments. Along the lines of Wordle. What I'd like: ignore articles, pronouns, etc...

How do I determine if a random string sounds like English?

I have an algorithm that generates strings based on a list of input words. How do I separate only the strings that sounds like English words? ie. discard RDLO while keeping LORD. EDIT: To clarify, they do not need to be actual words in the dictionary. They just need to sound like English. For example KEAL would be accepted. ...

NLP: Qualitatively "positive" vs "negative" sentence

I need your help in determining the best approach for analyzing industry-specific sentences (i.e. movie reviews) for "positive" vs "negative". I've seen libraries such as OpenNLP before, but it's too low-level - it just gives me the basic sentence composition; what I need is a higher-level structure: - hopefully with wordlists - hopefull...

Contextual Natural Language Resources, Where Do I Start?

Where can i find some .Net or conceptual resources to start working with Natural Language where I can pull context and subjects from text. I wish not to work with word frequency algorithms. ...

A StringToken Parser which gives Google Search style "Did you mean:" Suggestions

Seeking a method to: Take whitespace separated tokens in a String; return a suggested Word ie: Google Search can take "fonetic wrd nterpreterr", and atop of the result page it shows "Did you mean: phonetic word interpreter" A solution in any of the C* languages or Java would be preferred. Are there any existing Open Libraries which...

NLP: Building (small) corpora, or "Where to get lots of not-too-specialized English-language text files?"

Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Gutenberg Project books for a working prototype, and would like to incorporate more contemporary language. A recent answer here pointed indirectly to a great archive of usenet movie reviews, whic...

What options do you recommend for language translation on content driven Web sites?

Please read the whole question. I'm not looking for an approach to managing multi-lingual content, but I'm looking for a way to actually get that multi-lingual content. This usually falls within technical recommendations on most projects I work on, and I hope someone can offer some help. We are working with a client now who has the perso...

What would the best tool to create a natural DSL in Java?

A couple of days ago, I read a blog entry (http://ayende.com/Blog/archive/2008/09/08/Implementing-generic-natural-language-DSL.aspx) where the author discuss the idea of a generic natural language DSL parser using .NET. The brilliant part of his idea, in my opinion, is that the text is parsed and matched against classes using the same n...

Methods for Geotagging or Geolabelling Text Content

What are some good algorithms for automatically labeling text with the city / region or origin? That is, if a blog is about New York, how can I tell programatically. Are there packages / papers that claim to do this with any degree of certainty? I have looked at some tfidf based approaches, proper noun intersections, but so far, no...

Theory: "Lexical Encoding"

I am using the term "Lexical Encoding" for my lack of a better one. A Word is arguably the fundamental unit of communication as opposed to a Letter. Unicode tries to assign a numeric value to each Letter of all known Alphabets. What is a Letter to one language, is a Glyph to another. Unicode 5.1 assigns more than 100,000 values to th...

Named Entity Recognition Libraries for Java

I am looking for a simple but "good enough" Named Entity Recognition library (and dictionary) for java, I am looking to process emails and documents and extract some "basic information" like: Names, places, Address and Dates I've been looking around, and most seems to be on the heavy side and full NLP kind of projects. Any recommendat...

Stemming algorithm that produces real words

I need to take a paragraph of text and extract from it a list of "tags". Most of this is quite straight forward. However I need some help now stemming the resulting word list to avoid duplicates. Example: Community / Communities I've used an implementation of Porter Stemmer algorithm (I'm writing in PHP by the way): http://tartarus.or...

Latent Dirichlet Allocation, pitfalls, tips and programs

I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice. Which program is the "best", where best is some combination of easiest to use, best prior estimation, fast How do I incorporate my intuitions about topicality. Let's say I think I know that some items in the corpus a...