I'm looking for an existing library to summarize or paraphrase content (I'm aiming at blog posts) - any experience with existing natural language processing libraries?
I'm open to a variety of languages, so I'm more interested in the abilities & accuracy.
...
Hi, my primary language is Spanish, but I use all my software in English, including Windows; however, I'd like to use speech recognition in Spanish.
Do you know if there's a way to use Vista's speech recognition in a language other than the primary OS language?
...
Are there any good APIs and public datasets (dictionaries, phrases) for working with natural languages?
Specifically, do any good ones exist for working on translation between English and Korean?
...
How does one automatically find categories for text based on content?
...
This is just a poll on what parser you like to use for parsing sentences of natural language syntactically. I am interested in complete software toolkits/solutions. A good answer would list at least some of the following:
The name of the parser (obviously) and a link to its webpage.
The (programming!) language(s) it's written in.
The (...
Without getting a degree in information retrieval, I'd like to know whether any algorithms exist for counting how often words occur in a given body of text. The goal is to get a "general feel" of what people are saying over a set of textual comments, along the lines of Wordle.
What I'd like:
ignore articles, pronouns, etc...
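For the word-frequency question above, a minimal sketch (not a full answer) is to tokenize, drop stop words, and count. The stop list below is a tiny illustrative assumption; a real one would be much longer, and the function name is made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption); extend it for real use.
STOP_WORDS = {"a", "an", "the", "i", "you", "he", "she", "it", "we", "they",
              "is", "are", "was", "of", "to", "and", "in", "this", "that"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count lowercase word occurrences, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

comments = "The service was great. I love the service, and the staff is great!"
# Print the three most frequent remaining words with their counts.
print(word_frequencies(comments).most_common(3))
```

The resulting counts can be fed straight into a Wordle-style renderer, since each word maps to a weight.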
I have an algorithm that generates strings based on a list of input words. How do I keep only the strings that sound like English words? i.e. discard RDLO while keeping LORD.
EDIT: To clarify, they do not need to be actual words in the dictionary. They just need to sound like English. For example KEAL would be accepted.
...
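One possible approach to the "sounds like English" question, sketched under assumptions: collect the letter pairs (bigrams) that occur in known English words, including word-start and word-end markers, and accept a candidate only if all of its bigrams were seen. The sample word list here is a hypothetical stand-in; in practice you would train on a large dictionary and probably score by probability instead of a hard yes/no.

```python
def bigrams(word):
    """Return the set of letter pairs in a word, with ^ and $ as boundaries."""
    padded = "^" + word.lower() + "$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def train(words):
    """Collect every bigram observed in the training words."""
    seen = set()
    for w in words:
        seen |= bigrams(w)
    return seen

def sounds_english(candidate, seen):
    """Accept the candidate only if all of its bigrams were observed."""
    return bigrams(candidate) <= seen

# Hypothetical miniature training list; use a real dictionary in practice.
SAMPLE_WORDS = ["long", "lost", "order", "word", "hard",
                "keep", "real", "deal", "key"]
seen = train(SAMPLE_WORDS)

for s in ("LORD", "RDLO", "KEAL"):
    print(s, sounds_english(s, seen))
```

With this sample list, LORD and KEAL pass while RDLO fails, because "dl" and a final "o" never occur in the training words.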
I need your help in determining the best approach for classifying industry-specific sentences (e.g. movie reviews) as "positive" vs. "negative". I've seen libraries such as OpenNLP before, but it's too low-level - it just gives me the basic sentence composition; what I need is a higher-level structure:
- hopefully with wordlists
- hopefull...
Where can I find some .NET or conceptual resources to start working with natural language, so that I can pull context and subjects from text? I do not want to work with word frequency algorithms.
...
Does anyone have a suggestion for where to find archives or collections of everyday English text for use in a small corpus? I have been using Gutenberg Project books for a working prototype, and would like to incorporate more contemporary language. A recent answer here pointed indirectly to a great archive of usenet movie reviews, whic...
Please read the whole question. I'm not looking for an approach to managing multi-lingual content, but I'm looking for a way to actually get that multi-lingual content. This usually falls within technical recommendations on most projects I work on, and I hope someone can offer some help. We are working with a client now who has the perso...
A couple of days ago, I read a blog entry (http://ayende.com/Blog/archive/2008/09/08/Implementing-generic-natural-language-DSL.aspx) where the author discusses the idea of a generic natural language DSL parser using .NET.
The brilliant part of his idea, in my opinion, is that the text is parsed and matched against classes using the same n...
I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.
Which program is the "best", where best is some combination of easiest to use, best prior estimation, and fast?
How do I incorporate my intuitions about topicality? Let's say I think I know that some items in the corpus a...
I mean, is there a coded language with human style coding?
For example:
Create an object called MyVar and initialize it to 10;
Take MyVar and call MyMethod() with parameters. . .
I know it's not so useful, but it can be interesting to create such a grammar.
...
I need to parse recipe ingredients into amount, measurement, item, and description as applicable to the line, such as 1 cup flour, the peel of 2 lemons and 1 cup packed brown sugar etc. What would be the best way of doing this? I am interested in using Python for the project, so I am assuming NLTK is the best bet, but I am open t...
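As a starting point for the recipe question above, here is a rough regex-based sketch that handles only the simple "amount unit item" shape. The unit list and the function name are assumptions made for illustration, and lines such as "the peel of 2 lemons" are deliberately left unparsed; those would need something smarter.

```python
import re
from fractions import Fraction

# Assumed unit vocabulary; a real list would be longer.
UNITS = {"cup", "cups", "tablespoon", "tablespoons", "teaspoon", "teaspoons",
         "ounce", "ounces", "pound", "pounds"}

# A leading whole number or simple fraction, then the rest of the line.
LINE = re.compile(r"^\s*(?P<amount>\d+(?:/\d+)?)\s+(?P<rest>.+?)\s*$")

def parse_ingredient(line):
    """Split a line into amount, unit, and item; return None if it doesn't fit."""
    m = LINE.match(line)
    if m is None:
        return None  # e.g. "the peel of 2 lemons" does not start with an amount
    words = m.group("rest").split()
    unit = words.pop(0) if words[0].lower() in UNITS else None
    return {"amount": Fraction(m.group("amount")),
            "unit": unit,
            "item": " ".join(words)}

for line in ("1 cup flour", "1 cup packed brown sugar", "the peel of 2 lemons"):
    print(line, "->", parse_ingredient(line))
```

Note that "packed" stays inside the item here; separating descriptions from items would need a word list or a tagger.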
Question
So I've recently come up with some new possible projects that would have to deal with deriving 'meaning' from text submitted and generated by users.
Natural language processing is the field that deals with these kinds of issues, and after some initial research I found the OpenNLP Hub and university collaborations like the atte...
I'm working on a project where I need to analyze a page of text and collections of pages of text to determine dominant words. I'd like to know if there is a library (preferably C# or Java) that will handle the heavy lifting for me. If not, are there one or more algorithms that would achieve my goals below?
What I want to do is similar...
TF-IDF (term frequency - inverse document frequency) is a staple of information retrieval. It's not a proper model though, and it seems to break down when new terms are introduced into the corpus. How do people handle it when queries or new documents have new terms, especially if they are high frequency? Under traditional cosine match...
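One common workaround for unseen terms, offered as a sketch and not as the standard answer, is add-one smoothing of the document frequency so that a term with zero occurrences still gets a finite IDF. The function name and the toy corpus below are assumptions for illustration.

```python
import math

def smoothed_idf(term, documents):
    """IDF with add-one smoothing: log((1 + N) / (1 + df)) + 1.

    documents is a list of sets of terms; df counts documents containing term.
    An unseen term has df = 0 and still gets a finite weight.
    """
    n_docs = len(documents)
    df = sum(1 for doc in documents if term in doc)
    return math.log((1 + n_docs) / (1 + df)) + 1.0

# Toy corpus, each document reduced to its set of terms.
docs = [{"cat", "sat"}, {"cat", "ran"}, {"dog", "ran"}]

for term in ("cat", "dog", "zebra"):
    print(term, round(smoothed_idf(term, docs), 3))
```

The unseen term ends up with the largest weight, which may or may not be what you want for high-frequency new terms; capping the weight or re-estimating IDF periodically are other options.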
I think there is a wealth of natural language data associated with sites like reddit or digg or news.google.com.
I have done a little bit of research with text mining, but can't find how I could use those tools to parse something like reddit.
What kind of applications can you come up with?
...
I need to match a string like "one. two. three. four. five. six. seven. eight. nine. ten. eleven" into groups of four sentences. I need a regular expression to break the string into a group after every fourth period. Something like:
string regex = @"(.*.\s){4}";
System.Text.RegularExpressions.Regex exp = new System.Text.Regul...
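On the four-sentence grouping question: in the pattern quoted above the dots are unescaped, so they match any character and not just a period. A minimal sketch of one way to do the grouping is below, written in Python for brevity; the pattern uses only non-capturing groups and quantifiers, which as far as I know carry over to .NET's Regex unchanged. It assumes every period ends a sentence, so abbreviations like "e.g." will break it.

```python
import re

text = "one. two. three. four. five. six. seven. eight. nine. ten. eleven"

# Up to four repetitions of: some non-period text, then a period plus
# optional whitespace, or the end of the string for a trailing fragment.
GROUP = re.compile(r"(?:[^.]+(?:\.\s*|$)){1,4}")

groups = [g.strip() for g in GROUP.findall(text)]
for g in groups:
    print(g)
```

The last group holds whatever is left over, so it can contain fewer than four sentences.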