I have:
Rutsch is for rutterman ramping his roe
which is a phrase from Finnegans Wake. The epic riddle book is full of leitmotifs like this, such as 'take off that white hat,' and 'tip,' all of which get mutated into similar-sounding words depending on where you are in the book itself. All I want is a way to find obvious occurrences of t...
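One possible starting point, sketched under assumptions rather than taken from the question: slide a window with the same word count as the motif across the text and score each window with `difflib.SequenceMatcher` from the Python standard library. The function name `find_variants`, the 0.6 threshold, and the sample text are all hypothetical.

```python
import difflib
import re


def find_variants(motif, text, threshold=0.6):
    """Return (score, window) pairs for word windows that resemble the motif."""
    motif_words = re.findall(r"\w+", motif.lower())
    words = re.findall(r"\w+", text.lower())
    target = " ".join(motif_words)
    n = len(motif_words)
    hits = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        # Character-level similarity between the motif and this window.
        score = difflib.SequenceMatcher(None, target, window).ratio()
        if score >= threshold:
            hits.append((round(score, 2), window))
    # Best matches first.
    return sorted(hits, reverse=True)


# Made-up sample text, not a quotation from the book.
sample = "He said take off that white hat, then later: tick off that wide hit."
for score, window in find_variants("take off that white hat", sample):
    print(score, window)
```

Character-level similarity is only a rough stand-in for "sounds similar"; a phonetic encoding of each word might serve better, and the threshold would need tuning against real passages.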
I have a list of sentences generated from http://www.ywing.net/graphicspaper.php, a random computer graphics paper title generator. Some example sentences, sorted, are as follows:
Abstract Ambient Occlusion using Texture Mapping
Abstract Ambient Texture Mapping
Abstract Anisotropic Soft Shadows
Abstract Approximation
Abstract Appr...
What would be the best definition of an English word?
What cases of an English word are there other than just \w+?
Some definitions may include \w+-\w+ or \w+'\w+; some may exclude cases like \b[0-9]+\b. But I haven't seen
any general consensus on those cases.
Is there a formal definition?
Can any of you clarify?
(Edit: broaden the questi...
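To make the cases above concrete, here is a minimal sketch of one candidate definition, not a consensus one: runs of letters that may be joined internally by a hyphen or an apostrophe, with purely numeric tokens excluded. The names `WORD` and `words` are hypothetical.

```python
import re

# One candidate pattern: letter runs, optionally joined by internal hyphens
# or apostrophes. [^\W\d_] matches any Unicode letter (a word character that
# is neither a digit nor an underscore).
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def words(text):
    """Return every substring of text that matches the candidate pattern."""
    return WORD.findall(text)


print(words("The well-known editor didn't list 42 items in 2010."))
# ['The', 'well-known', 'editor', "didn't", 'list', 'items', 'in']
```

Whether hyphenated compounds, contractions, or numbers count as words is a design decision for the application, so the pattern should be read as one option among several.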
I've been playing around with natural language parse trees and manipulating them in various ways. I've been using Stanford's Tregex and Tsurgeon tools but the code is a mess and doesn't fit in well with my mostly Python environment (those tools are Java and aren't ideal for tweaking). I'd like to have a toolset that would allow for easy ...
What would be the best regular expression for tokenizing an English text?
By an English token, I mean an atom consisting of the maximum number of characters that can be meaningfully used for NLP purposes. An analogy is a "token" in any programming language (e.g. in C, '{', '[', 'hello', '&', etc. can be tokens). There is one restriction: Th...
Hi,
I want to use a Wikipedia dump for my project. I need the following information:
For a given Wikipedia entry, I want to know which other language editions contain the page.
I want the data in a downloadable CSV or other common format.
Is there a way to get this data?
Thanks
Bala
...
Hi,
Is there a partition of English words into high-level categories such as sports, basketball, etc.? I need it for my project.
Is this data available somewhere? I am okay with words overlapping across categories.
Thank you
Bala
...
Hi,
I am just starting to learn how to use the CRF++ toolkit.
I downloaded the Linux version of CRF++ 0.54.
When I try to compile example.cpp under sdk/ with the command
g++ -o example example.cpp
I get the following error:
hpl@hpl-desktop:~/Documents/CRF/CRF++-0.54$ g++ -o a example.cpp
/tmp/ccmJQgGu.o: In function `main':
exampl...
Hi,
I want to get a list of all the Wikipedia categories. I can find them here: http://en.wikipedia.org/wiki/Special:Categories Is there a way to download all of them in XML/CSV format?
Thank you
Bala
...
I'm using the Stanford Parser to parse the dependency relations between pairs of words, but I also need the part-of-speech tags of the words. However, ParseDemo.java only outputs the tagged parse tree. I need each word's tag, like this:
My/PRP$ dog/NN also/RB likes/VBZ eating/VBG bananas/NNS ./.
not like this:
(ROOT
(S
(NP (PRP$ My...
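As a workaround sketch in Python (an assumption on my part, not the parser's own API): if the bracketed tree is available as text, each preterminal of the form `(TAG word)` can be pulled out with a regular expression and rewritten as `word/TAG`. The names `LEAF` and `tagged_words` and the sample tree are hypothetical, and the sketch assumes tokens never contain literal parentheses.

```python
import re

# A preterminal looks like "(TAG word)": an opening bracket, a tag,
# a single space, one token, and a closing bracket.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)\)")


def tagged_words(tree_text):
    """Flatten a bracketed parse into space-separated word/TAG pairs."""
    return " ".join(f"{word}/{tag}" for tag, word in LEAF.findall(tree_text))


# Hypothetical bracketed parse written for this example.
tree = "(ROOT (S (NP (PRP$ My) (NN dog)) (VP (VBZ likes) (NP (NNS bananas))) (. .)))"
print(tagged_words(tree))
# My/PRP$ dog/NN likes/VBZ bananas/NNS ./.
```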
Hi, I want to use MALLET's topic modeling, but can I provide my own tokenizer or a pre-tokenized version of the text documents when I import the data into MALLET? I find MALLET's tokenizer inadequate for my usage...
...
Hi
Using NLTK and WordNet, how do I convert a verb in simple tense into its present, past, or past participle form?
For example:
I want to write a function that returns the verb in the requested form, as follows.
v = 'go'
present = present_tense(v)
print present # prints "going"
past = past_tense(v)
print past # prints "went"
Any suggestion...
Possible Duplicate:
How do you implement a "Did you mean"?
I am writing an application where I require functionality similar to Google's "did you mean?" feature used by their search engine.
Is there source code available for such a thing or where can I find articles that would help me to build my own?
...
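A minimal sketch of the idea, assuming a fixed vocabulary is available: `difflib.get_close_matches` from the Python standard library returns the known terms most similar to a query. The `VOCABULARY` list, the `did_you_mean` name, and the 0.7 cutoff are hypothetical choices for illustration and do not describe how Google's feature works.

```python
import difflib

# Hypothetical vocabulary; a real application would use its own index terms
# or query log.
VOCABULARY = ["language", "linguistics", "tokenizer", "parser", "grammar"]


def did_you_mean(query, vocabulary=VOCABULARY):
    """Return the closest vocabulary entry, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query.lower(), vocabulary, n=1, cutoff=0.7)
    return matches[0] if matches else None


print(did_you_mean("langauge"))  # language
print(did_you_mean("xyzzy"))     # None
```

This compares the query against every vocabulary entry, so for a large vocabulary an index (for example over character n-grams) would likely be needed.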
I've got this function, which I modified from material in chapter 1 of the online NLTK book. It's been very useful to me but, despite reading the chapter on Unicode, I feel just as lost as before.
def openbookreturnvocab(book):
    fileopen = open(book)
    rawness = fileopen.read()
    tokens = nltk.wordpunct_tokenize(rawness)
    nltk...
We've been working with the NLTK library in a recent project where we're
mainly interested in the named entities part.
In general we're getting good results using the NEChunkParser class.
However, we're trying to find a way to provide our own terms to the
parser, without success.
For example, we have a test document where my name ...
Hi!
I'm working on a syntactic parser for a language that relies heavily on suffix agreement. For example, in English a verb must agree with its pronoun: I/we/you do, but he/she/it/this does. In this language, a verb has a different form for each pronoun. I know that in the literature this is handled by the unification method. But I coul...
Hi,
Does anyone know of any good NLP frameworks for Ruby?
I am considering using the Java open-nlp library http://opennlp.sourceforge.net/ via JRuby.
I am reluctant to go down the JRuby route for a few reasons, mainly because I have no Java background.
Are there any Ruby frameworks or should I go down the JRuby route with open-n...
Hi,
One simple question (but I haven't quite found an obvious answer in the NLP stuff I've been reading, which I'm very new to):
I want to classify emails with a probability along certain dimensions of mood. Is there an NLP package out there specifically dealing with this? Is there an obvious starting point in the literature I start re...
I have thousands of sentences in a file. I want to keep only the valid, useful English words. Is this possible with natural language processing?
Sample Sentence:
~@^.^@~ tic but sometimes world good famous tac Zorooooooooooo
I want to extract only the English words, like
tic world good famous
Any advice on how I can achieve this? Th...
I want to split a sentence into a list of words.
For English and other European languages this is easy: just use split()
>>> "This is a sentence.".split()
['This', 'is', 'a', 'sentence.']
But I also need to deal with sentences in languages such as Chinese that don't use whitespace as a word separator.
>>> u"这是一个句子".split()
[u'\u8fd9\u662f\u...
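One classic baseline for text without whitespace is greedy forward maximum matching against a word list, sketched below in Python 3 (the session above is Python 2). The four-entry `DICTIONARY` and the `max_match` name are hypothetical; real segmenters rely on large lexicons or statistical models.

```python
# Toy dictionary for illustration only.
DICTIONARY = {"这", "是", "一个", "句子"}


def max_match(text, dictionary=DICTIONARY):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word that starts there;
    characters not covered by the dictionary become single-character words.
    """
    longest = max(len(word) for word in dictionary)
    result = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                result.append(candidate)
                i += size
                break
    return result


print(max_match("这是一个句子"))
# ['这', '是', '一个', '句子']
```

Greedy matching can pick the wrong split when words overlap, which is why it is usually treated as a baseline rather than a finished solution.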