I'm struggling to determine whether a word is a noun, a verb, etc.
I found the MIT Java WordNet Interface.
It includes sample code like the following, but when I use it I get an error saying that Dictionary is an abstract class and cannot be instantiated:
public void testDictionary() throws IOException {
// construct the URL to the Wordnet dictionary directory
S...
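A hedged guess at the error: if the file imports `java.util.Dictionary` (an abstract JDK class) instead of JWI's own `Dictionary`, javac reports exactly this kind of "is abstract; cannot be instantiated" message, so checking the import is a cheap first step. Separately, the underlying question can be answered straight from WordNet's data files: every entry line of `index.noun`, `index.verb`, `index.adj` and `index.adv` starts with the lemma, so a word's possible parts of speech are the files that list it. The sketch below is not JWI code; it assumes the file lines are already loaded into memory, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class WordNetPos {
    // Returns the POS labels whose index file lists the word as a lemma.
    // indexFiles maps a label (e.g. "noun") to the lines of the matching index.* file.
    static List<String> partsOfSpeech(String word, Map<String, List<String>> indexFiles) {
        // WordNet lemmas are lower case, with underscores in place of spaces
        String lemma = word.trim().toLowerCase(Locale.ROOT).replace(' ', '_');
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : indexFiles.entrySet()) {
            for (String line : file.getValue()) {
                // The license header lines at the top of each file start with spaces
                if (line.isEmpty() || line.startsWith(" ")) continue;
                int end = line.indexOf(' ');
                String first = end < 0 ? line : line.substring(0, end);
                if (first.equals(lemma)) {
                    result.add(file.getKey());
                    break;
                }
            }
        }
        return result;
    }
}
```

Loading the four files into a `LinkedHashMap` in a fixed order keeps the returned labels in that order; a word such as "run" would then come back as both a noun and a verb.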
Greetings everyone,
A friend and I are discussing the possibility of a new project: A translation program that will pop up a translation whenever you hover over any word in any control, even static, non-editable ones. I know there are many browser plugins to do this sort of thing on webpages; we're thinking about how we would do it sys...
Hi,
I have a fairly large document and want to do stop-word elimination and stemming on its words with Python. Does anyone know an off-the-shelf package for this?
If not, code that is fast enough for large documents is also welcome.
Thanks
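For Python specifically, NLTK ships both a stop-word list and a Porter stemmer. To show what the two steps do, here is a small sketch in Java (the language used for the other examples here): tokens pass through a stop-word set, then through a deliberately naive suffix stripper that stands in for a real stemmer. The stop-word list, the suffix rules and the class name are hypothetical placeholders, and the stripper is NOT the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class StopAndStem {
    // Hypothetical, tiny stop-word list; a real one has a few hundred entries
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "are", "was", "on", "for"));

    // Naive suffix stripping (not Porter): removes the first matching suffix
    // as long as at least three characters remain
    static String naiveStem(String word) {
        String[] suffixes = {"ing", "ed", "es", "s"};
        for (String suffix : suffixes) {
            if (word.endsWith(suffix) && word.length() - suffix.length() >= 3) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Lower-cases, splits on non-letters, drops stop words, stems the rest
    static List<String> process(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) continue;
            out.add(naiveStem(token));
        }
        return out;
    }
}
```

Both steps are a set lookup and a few string checks per token, so for a large document it is enough to feed `process` one line at a time (for example from a `BufferedReader`) instead of loading the whole file.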
...
Hi,
I downloaded a Wikipedia dump and want to convert it from wiki markup to my own object format. Is there a wiki parser available that converts it into XML?
Thank you
...
I need to do two things. First, for a given text, find the most frequently used words and word sequences (up to length n).
Example:
Lorem *ipsum* dolor sit amet, consectetur adipiscing elit. Nunc auctor urna sed urna mattis nec interdum magna ullamcorper. Donec ut lorem eros, id rhoncus nisl. Praesent sodales lorem vitae sapien volutpat et ac...
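Counting word sequences up to length n amounts to counting n-grams: for each start position, extend the sequence one word at a time and increment a counter for every prefix. A minimal sketch under the assumption that words are runs of letters and digits and that case should be ignored (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class NGramCounter {
    // Counts every word sequence of length 1..maxN (n-grams) in the text
    static Map<String, Integer> count(String text, int maxN) {
        List<String> tokens = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) tokens.add(w);
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            StringBuilder gram = new StringBuilder();
            // Extend the sequence starting at i one word at a time
            for (int n = 1; n <= maxN && i + n <= tokens.size(); n++) {
                if (n > 1) gram.append(' ');
                gram.append(tokens.get(i + n - 1));
                counts.merge(gram.toString(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // The k most frequent n-grams, highest count first, ties broken alphabetically
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(k, entries.size()));
    }
}
```

Overlapping sequences are all counted, and single words and longer sequences share one map; filtering stop words before counting is a separate choice the sketch does not make.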
EDIT: I removed the null parameter for the wordnet object and now it works perfectly.
Hi,
I just ran this sample code from the project's website:
import rita.wordnet.RiWordnet;
public class Main {
public static void main(String[] args) {
// Would pass in a PApplet normally, but we don't need to here
RiWordnet wordnet =...
I've heard that Perl is used a lot for NLP, but I can hardly find any good NLP tools for Perl. What are some good Perl NLP tools/resources? Python has NLTK. Java has OpenNLP. Does Perl have anything similar?
This is really a general question, but if someone could also specifically address chunking and POS-tagging, that would be awesom...
Hi,
I've been reading a lot of articles that explain the need for an initial set of texts classified as either 'positive' or 'negative' before a sentiment analysis system will really work.
My question is: Has anyone attempted just doing a rudimentary check of 'positive' adjectives vs 'negative' adjectives, taking into account a...
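The rudimentary check described here can be sketched as counting hits against two word lists and comparing the totals. The lists below are hypothetical placeholders, the class name is made up, and the sketch deliberately ignores negation ("not good"), intensifiers and sarcasm, which a plain count cannot capture.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class LexiconSentiment {
    // Hypothetical seed lexicons; a real system would use a much larger curated list
    static final Set<String> POSITIVE = new HashSet<>(Arrays.asList("good", "great", "excellent", "happy"));
    static final Set<String> NEGATIVE = new HashSet<>(Arrays.asList("bad", "poor", "terrible", "sad"));

    // Score = (#positive words) - (#negative words): > 0 positive, < 0 negative, 0 neutral
    static int score(String text) {
        int score = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (POSITIVE.contains(token)) score++;
            else if (NEGATIVE.contains(token)) score--;
        }
        return score;
    }
}
```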
Hi, I'm trying to use the PET parser, but its usage documentation is insufficient. Can anyone point me to a good article or tutorial on using PET? Does it support UTF-8?
...
Hi,
I am looking for a taxonomy of categories organized as a tree for my project. For example:
Organization -> (Finance, Business, Government)
Finance -> (Hedge fund, equities)
Person -> (Sports, Music, Technology)
Sports -> (Football, Soccer, Basketball)
Music -> (Rock, pop)
Is there a place where I can find this high level cate...
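Whatever source the categories come from, a parent/child map is enough to hold a tree like the one above. A minimal sketch (hypothetical class name) that stores the children of each category plus a parent link, so both the sub-categories and the root-to-leaf path can be looked up:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class Taxonomy {
    // Each category's direct sub-categories, in insertion order
    private final Map<String, List<String>> children = new LinkedHashMap<>();
    // Each category's parent (root categories have no entry)
    private final Map<String, String> parent = new LinkedHashMap<>();

    void add(String parentName, String... childNames) {
        List<String> list = children.computeIfAbsent(parentName, k -> new ArrayList<>());
        for (String child : childNames) {
            list.add(child);
            parent.put(child, parentName);
        }
    }

    // Path from the root down to the category, e.g. [Person, Sports, Soccer]
    List<String> pathTo(String category) {
        List<String> path = new ArrayList<>();
        for (String c = category; c != null; c = parent.get(c)) path.add(0, c);
        return path;
    }

    List<String> childrenOf(String category) {
        return children.getOrDefault(category, new ArrayList<>());
    }
}
```

The sketch assumes each category has a single parent and that there are no cycles; a category that belongs under several parents would need a list of parents instead.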
Hi
Here is my requirement: I want to tokenize and tag a paragraph so that it does the following:
It should identify dates and times in the paragraph and tag them as DATE and TIME.
It should identify known phrases in the paragraph and tag them as CUSTOM.
The rest of the content should be tokenized by th...
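One way to get all three behaviours in a single pass is a regular expression with one named group per tag, tried in priority order (DATE, TIME, CUSTOM, then a generic token). The date and time formats (`2010-03-15`, `15/03/2010`, `14:30`), the generic `TOKEN` tag and the class name are assumptions for illustration; real date recognition needs many more patterns or a dedicated library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RuleTagger {
    // One tagged token
    static final class Token {
        final String tag;
        final String text;
        Token(String tag, String text) { this.tag = tag; this.text = text; }
        @Override public String toString() { return text + "/" + tag; }
    }

    // Alternatives are tried left to right at each position, which sets the priority
    static Pattern buildPattern(List<String> customPhrases) {
        List<String> phrases = new ArrayList<>(customPhrases);
        // Longest phrases first, so a longer known phrase wins over its prefix
        phrases.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder custom = new StringBuilder();
        for (String p : phrases) {
            if (custom.length() > 0) custom.append('|');
            custom.append(Pattern.quote(p));
        }
        String regex = "(?<DATE>(?:\\d{4}-\\d{2}-\\d{2}|\\d{1,2}/\\d{1,2}/\\d{4})\\b)"
                + "|(?<TIME>\\d{1,2}:\\d{2}\\b)"
                + (phrases.isEmpty() ? "" : "|(?<CUSTOM>(?:" + custom + ")\\b)")
                + "|(?<TOKEN>\\w+|[^\\w\\s])";
        return Pattern.compile(regex);
    }

    static List<Token> tag(String text, List<String> customPhrases) {
        Matcher m = buildPattern(customPhrases).matcher(text);
        List<Token> tokens = new ArrayList<>();
        while (m.find()) {
            String tag;
            if (m.group("DATE") != null) tag = "DATE";
            else if (m.group("TIME") != null) tag = "TIME";
            else if (!customPhrases.isEmpty() && m.group("CUSTOM") != null) tag = "CUSTOM";
            else tag = "TOKEN";
            tokens.add(new Token(tag, m.group()));
        }
        return tokens;
    }
}
```

Known phrases are matched case-sensitively here; adding `Pattern.CASE_INSENSITIVE` or normalizing case first is a separate design decision.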
Hi,
I am totally new to the sat4j solver.
It says a CNF file should be given as input.
Is there any way to give the rule as input directly and find out whether it is satisfiable or not?
My rule will be of this kind:
Can someone help me solve this using the sat4j solver?
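SAT solvers such as sat4j work on formulas in conjunctive normal form (CNF), so the rule first has to be rewritten as a conjunction of clauses. In the DIMACS CNF file format each variable is a positive integer, a negative integer denotes its negation, and every clause line ends with 0. The sketch below is not sat4j code: it builds the DIMACS text for a list of clauses and, for tiny inputs only, checks satisfiability by brute force so the encoding can be tested without a solver. The class name is hypothetical. sat4j can also take clauses programmatically instead of from a file (via `ISolver.addClause` with a `VecInt`); check the sat4j documentation for the exact calls.

```java
import java.util.List;

final class CnfSketch {
    // Renders clauses (each an int[] of non-zero literals) in DIMACS CNF format
    static String toDimacs(int numVars, List<int[]> clauses) {
        StringBuilder sb = new StringBuilder();
        sb.append("p cnf ").append(numVars).append(' ').append(clauses.size()).append('\n');
        for (int[] clause : clauses) {
            for (int lit : clause) sb.append(lit).append(' ');
            sb.append("0\n");
        }
        return sb.toString();
    }

    // Brute force: tries all 2^numVars assignments. Only for tiny examples;
    // anything realistic needs an actual SAT solver.
    static boolean isSatisfiable(int numVars, List<int[]> clauses) {
        for (long mask = 0; mask < (1L << numVars); mask++) {
            boolean allSatisfied = true;
            for (int[] clause : clauses) {
                boolean clauseSatisfied = false;
                for (int lit : clause) {
                    // Bit (var - 1) of mask holds the value of variable var
                    boolean value = ((mask >> (Math.abs(lit) - 1)) & 1) == 1;
                    if (lit > 0 ? value : !value) { clauseSatisfied = true; break; }
                }
                if (!clauseSatisfied) { allSatisfied = false; break; }
            }
            if (allSatisfied) return true;
        }
        return false;
    }
}
```

For example, with a = 1 and b = 2, the rule "a implies b, and a" becomes the clauses (-1 2) and (1), which is satisfiable; adding the clause (-2) makes it unsatisfiable.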
...
I'm implementing a readability test and have implemented a simple algorithm for detecting syllables.
I detect sequences of vowels and count them in each word; for example, the word "should" contains one sequence of vowels, 'ou'. Before counting them, I remove suffixes like -les, -e, -ed (for example, the word "like" contains one syllable b...
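The vowel-group heuristic described above can be sketched as follows. The suffix rules here (drop a final "-es" or "-ed" from words longer than three letters, otherwise a final "-e" from words longer than two) are an assumption standing in for the truncated rule list in the question, and "y" is treated as a vowel; both choices miscount some words. The class name is hypothetical.

```java
import java.util.Locale;

final class SyllableCounter {
    // Counts syllables as groups of consecutive vowels after stripping a few
    // silent suffixes. A heuristic only: it will miscount some words.
    static int count(String word) {
        String w = word.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        // Assumed suffix rules, checked in order: "-es"/"-ed", then a final "-e"
        if (w.length() > 3 && (w.endsWith("es") || w.endsWith("ed"))) {
            w = w.substring(0, w.length() - 2);
        } else if (w.length() > 2 && w.endsWith("e")) {
            w = w.substring(0, w.length() - 1);
        }
        int groups = 0;
        boolean inVowelGroup = false;
        for (char c : w.toCharArray()) {
            boolean vowel = "aeiouy".indexOf(c) >= 0;
            if (vowel && !inVowelGroup) groups++; // a new vowel group starts here
            inVowelGroup = vowel;
        }
        // Every word has at least one syllable
        return Math.max(groups, 1);
    }
}
```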
Hey!
I want to colorize the words in a text according to their classification (category/declination etc). I have a fully working dictionary, but the problem is that there is a lot of ambiguity. foedere, for instance, can be forms of either the verb "fornicate" or the noun "treaty".
What are the general strategies for resolving these ambiguit...
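A common baseline for this kind of ambiguity is to pick, for each ambiguous form, the analysis that occurs most often in a hand-tagged corpus, and to layer context rules on top of that. The sketch assumes such counts are available as a map from analysis label to frequency; the label format (for example `noun:treaty`) and the class name are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class MostFrequentAnalysis {
    // Picks, among the dictionary's candidate analyses of a form, the one seen most
    // often in a tagged corpus; unseen analyses count as 0, ties keep the earliest candidate
    static String choose(List<String> candidates, Map<String, Integer> corpusCounts) {
        String best = null;
        int bestCount = -1;
        for (String candidate : candidates) {
            int count = corpusCounts.getOrDefault(candidate, 0);
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }
}
```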
I would be very glad if someone could clarify the example given on Wikipedia:
http://en.wikipedia.org/wiki/Earley_algorithm
Consider the grammar:
P → S # the start rule
S → S + M | M
M → M * T | T
T → number
and the input:
2 + 3 * 4
The Earley algorithm works like this:
(state no.) Production (Origin) # Comment
----...
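To follow the states in the Wikipedia table, it can help to run a small recognizer on the same grammar. The sketch below is a plain Earley recognizer (predictor, scanner, completer) hard-coded to the grammar above. It assumes tokens are separated by spaces, maps digit strings to `number`, and treats symbols starting with an upper-case letter as non-terminals; items are (rule, dot, origin) triples, matching the "Production (Origin)" columns. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class EarleyRecognizer {
    // The grammar from the question: {left side, right-side symbols...}
    static final String[][] RULES = {
        {"P", "S"},
        {"S", "S", "+", "M"}, {"S", "M"},
        {"M", "M", "*", "T"}, {"M", "T"},
        {"T", "number"},
    };

    // An Earley item: rule index, dot position in the right side, origin (start) set
    static final class Item {
        final int rule, dot, origin;
        Item(int rule, int dot, int origin) { this.rule = rule; this.dot = dot; this.origin = origin; }
        // Symbol right after the dot, or null if the item is complete
        String next() {
            String[] r = RULES[rule];
            return dot + 1 < r.length ? r[dot + 1] : null;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Item)) return false;
            Item other = (Item) o;
            return rule == other.rule && dot == other.dot && origin == other.origin;
        }
        @Override public int hashCode() { return (rule * 31 + dot) * 31 + origin; }
    }

    static boolean isNonTerminal(String symbol) { return Character.isUpperCase(symbol.charAt(0)); }

    static boolean recognize(String input) {
        // Space-separated tokens; digit strings become the terminal "number"
        List<String> tokens = new ArrayList<>();
        for (String t : input.trim().split("\\s+")) tokens.add(t.matches("\\d+") ? "number" : t);
        int n = tokens.size();
        List<List<Item>> chart = new ArrayList<>();
        List<Set<Item>> seen = new ArrayList<>();
        for (int i = 0; i <= n; i++) { chart.add(new ArrayList<>()); seen.add(new HashSet<>()); }
        add(chart, seen, 0, new Item(0, 0, 0)); // P -> . S, origin 0

        for (int i = 0; i <= n; i++) {
            List<Item> set = chart.get(i);
            for (int j = 0; j < set.size(); j++) { // the set may grow while we walk it
                Item item = set.get(j);
                String next = item.next();
                if (next == null) {
                    // Completer: advance items in the origin set that were waiting for this left side
                    String left = RULES[item.rule][0];
                    List<Item> originSet = chart.get(item.origin);
                    for (int k = 0; k < originSet.size(); k++) {
                        Item waiting = originSet.get(k);
                        if (left.equals(waiting.next())) {
                            add(chart, seen, i, new Item(waiting.rule, waiting.dot + 1, waiting.origin));
                        }
                    }
                } else if (isNonTerminal(next)) {
                    // Predictor: add every rule for the expected non-terminal, starting at i
                    for (int r = 0; r < RULES.length; r++) {
                        if (RULES[r][0].equals(next)) add(chart, seen, i, new Item(r, 0, i));
                    }
                } else if (i < n && tokens.get(i).equals(next)) {
                    // Scanner: the next token matches the expected terminal
                    add(chart, seen, i + 1, new Item(item.rule, item.dot + 1, item.origin));
                }
            }
        }
        // Accept if P -> S . spans the whole input
        return seen.get(n).contains(new Item(0, 1, 0));
    }

    static void add(List<List<Item>> chart, List<Set<Item>> seen, int i, Item item) {
        if (seen.get(i).add(item)) chart.get(i).add(item);
    }
}
```

Printing `chart.get(i)` for each i after the loop is one way to compare the item sets against the table, set by set.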
By definition, a grammar contains productions. Here is an example of a very simple grammar:
E -> E + E
E -> n
I want to implement a Grammar class in C#, but I'm not sure how to store productions; for example, how to distinguish between terminal and non-terminal symbols.
I was thinking about:
struct Production
{
String Left; // for example E...
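One way to tell terminals from non-terminals is to make it an explicit property of a symbol type rather than relying on naming conventions. The sketch is in Java to match the other examples here, but the same structure carries over to C# classes; the names `Grammar`, `Symbol` and `Production` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

final class Grammar {
    // A grammar symbol; whether it is a terminal is stored explicitly, not inferred from its name
    static final class Symbol {
        final String name;
        final boolean terminal;
        private Symbol(String name, boolean terminal) { this.name = name; this.terminal = terminal; }
        static Symbol ofTerminal(String name) { return new Symbol(name, true); }
        static Symbol ofNonTerminal(String name) { return new Symbol(name, false); }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Symbol)) return false;
            Symbol other = (Symbol) o;
            return name.equals(other.name) && terminal == other.terminal;
        }
        @Override public int hashCode() { return Objects.hash(name, terminal); }
    }

    // A production "left -> right[0] right[1] ..."; the left side must be a non-terminal
    static final class Production {
        final Symbol left;
        final List<Symbol> right;
        Production(Symbol left, Symbol... right) {
            if (left.terminal) throw new IllegalArgumentException("left side must be a non-terminal");
            this.left = left;
            this.right = Collections.unmodifiableList(Arrays.asList(right));
        }
        @Override public String toString() {
            StringBuilder sb = new StringBuilder(left.name).append(" ->");
            for (Symbol s : right) sb.append(' ').append(s.name);
            return sb.toString();
        }
    }

    final List<Production> productions = new ArrayList<>();

    // All productions whose left side is the given non-terminal
    List<Production> productionsFor(Symbol nonTerminal) {
        List<Production> result = new ArrayList<>();
        for (Production p : productions) {
            if (p.left.equals(nonTerminal)) result.add(p);
        }
        return result;
    }
}
```

With this layout the grammar above is two `Production` objects whose left side is the non-terminal `E`, and `+` and `n` are terminal symbols.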
It seems my Google-fu is failing me.
Does anyone know of a freely available dictionary that contains just the base forms of words? So, for something like strawberries, it would have strawberry. But it should NOT contain abbreviations, misspellings, or alternate spellings (like UK versus US). Anything quickly usable in Java would be good bu...
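Even with such a word list, plural forms still need to be mapped onto it. WordNet's morphological processor ("morphy") does this by combining an exception list for irregular forms with suffix-detachment rules, accepting a candidate only if it is in the dictionary. The sketch below follows that idea for nouns, using WordNet's noun rules (`ies`->`y`, `ches`->`ch`, `shes`->`sh`, `ses`->`s`, `xes`->`x`, `zes`->`z`, `men`->`man`, `s`->nothing). The base-word set and the exception map are placeholders you would load from a real word list, the class name is hypothetical, and unlike morphy it returns only the first match.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class NounBaseForm {
    // Noun detachment rules (suffix -> replacement), tried in this order
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ies", "y");
        RULES.put("ches", "ch");
        RULES.put("shes", "sh");
        RULES.put("ses", "s");
        RULES.put("xes", "x");
        RULES.put("zes", "z");
        RULES.put("men", "man");
        RULES.put("s", "");
    }

    // Returns the base form of a noun, or null if no candidate is in baseWords.
    // exceptions holds irregular forms (e.g. "mice" -> "mouse").
    static String baseForm(String word, Set<String> baseWords, Map<String, String> exceptions) {
        String w = word.toLowerCase(Locale.ROOT);
        if (exceptions.containsKey(w)) return exceptions.get(w);
        if (baseWords.contains(w)) return w;
        for (Map.Entry<String, String> rule : RULES.entrySet()) {
            String suffix = rule.getKey();
            if (w.endsWith(suffix)) {
                String candidate = w.substring(0, w.length() - suffix.length()) + rule.getValue();
                // Only accept candidates that actually exist in the word list
                if (baseWords.contains(candidate)) return candidate;
            }
        }
        return null;
    }
}
```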
Hi all,
I'm wondering whether major SQL engines out there (MS SQL, Oracle, MySQL) have the ability to understand that two words are related because they share the same root.
We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former.
But do SQL engines have functions that can mat...
Hello,
I'm working on an information retrieval task. As part of preprocessing, I want to do the following:
Stopword removal
Tokenization
Stemming (Porter Stemmer)
Initially, I skipped tokenization. As a result, I got terms like these:
broker
broker'
broker,
broker.
broker/deal
broker/dealer'
broker/dealer,
broker/dealer.
broker/dealer;
broker/deale...
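Variants like `broker,`, `broker.` and `broker/dealer;` appear when text is split on whitespace only. A minimal tokenizer sketch that lower-cases, splits on any run of characters other than letters, digits and apostrophes, then trims apostrophes left at the edges of a token, maps all of them onto `broker` and `dealer`. Whether `broker/dealer` should become two terms or stay one is a design choice; this sketch (hypothetical class name) splits it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class IrTokenizer {
    // Splits on anything that is not a letter, digit, or apostrophe,
    // then strips apostrophes left at either end of a token (e.g. "broker'")
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            String token = raw.replaceAll("^'+|'+$", "");
            if (!token.isEmpty()) tokens.add(token);
        }
        return tokens;
    }
}
```

In the pipeline this step comes first; stop-word removal and Porter stemming then operate on the clean tokens it produces.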