nltk

How can I make this Python 2.6 function work with Unicode?

I've got this function, which I modified from material in chapter 1 of the online NLTK book. It's been very useful to me but, despite reading the chapter on Unicode, I feel just as lost as before. def openbookreturnvocab(book): fileopen = open(book) rawness = fileopen.read() tokens = nltk.wordpunct_tokenize(rawness) nltk...
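A minimal sketch of one way to read a file as Unicode, assuming the file is UTF-8 encoded and that the function is meant to return a sorted, lowercased vocabulary (the original body is cut off, so both points are assumptions). The name `open_book_return_vocab` and the regular expression standing in for `nltk.wordpunct_tokenize` are illustrative, chosen so the example runs without NLTK installed.

```python
import io
import re

# Stand-in for nltk.wordpunct_tokenize (an assumption for this sketch):
# runs of word characters, or runs of non-space punctuation.
_WORDPUNCT = re.compile(r"\w+|[^\w\s]+", re.UNICODE)


def open_book_return_vocab(path, encoding="utf-8"):
    """Read a text file as Unicode and return its sorted lowercase vocabulary."""
    # io.open decodes bytes to Unicode text while reading,
    # and is available from Python 2.6 onward.
    with io.open(path, encoding=encoding) as handle:
        raw = handle.read()
    tokens = _WORDPUNCT.findall(raw)
    return sorted(set(token.lower() for token in tokens))
```

The key step is decoding at the file boundary, so that everything downstream works on Unicode text rather than raw bytes.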

Improving entity naming with custom file/code in NLTK

We've been working with the NLTK library in a recent project where we're mainly interested in the named entities part. In general we're getting good results using the NEChunkParser class. However, we're trying to find a way to provide our own terms to the parser, without success. For example, we have a test document where my name ...

How to check if a word is an English word with Python?

I want to check in a Python program if a word is in the English dictionary. I believe the NLTK WordNet interface might be the way to go, but I have no clue how to use it for such a simple task. def is_english_word(word): pass # how do I implement is_english_word? is_english_word(token.lower()) In the future, I might want to check if ...
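One possible approach is a plain set lookup against a word list. The sketch below assumes such a list is available; `SAMPLE_WORDS` is a hypothetical stand-in, and `build_checker` is an illustrative helper name, not part of NLTK. A real list might come from `nltk.corpus.words.words()` once that corpus has been downloaded, or from a system dictionary file.

```python
def build_checker(words):
    """Return a function that tests membership in a lowercased word set."""
    vocabulary = set(word.lower() for word in words)

    def is_english_word(word):
        # Set lookup is constant time on average, however long the list is.
        return word.lower() in vocabulary

    return is_english_word


# Hypothetical sample data, for illustration only.
SAMPLE_WORDS = ["apple", "run", "Python", "tokenize"]
is_english_word = build_checker(SAMPLE_WORDS)
```

With this sample list, `is_english_word("Apple")` is true and `is_english_word("zzxq")` is false. The quality of the answer depends entirely on the word list supplied.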

How do I count words in an nltk plaintextcorpus faster?

I have a set of documents, and I want to return a list of tuples where each tuple has the date of a given document and the number of times a given search term appears in that document. My code (below) works, but is slow, and I'm a n00b. Are there obvious ways to make this faster? Any help would be much appreciated, mostly so that I ca...
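The asker's code is not shown, so the following is only a sketch of one common pattern: tokenize each document once, keep the counts, and answer each search with a dictionary lookup. It assumes the documents are available as `(date, text)` pairs; `build_index` and `count_term` are hypothetical names, and the `\w+` tokenizer is a simplification.

```python
import re
from collections import Counter

_WORD = re.compile(r"\w+", re.UNICODE)


def build_index(documents):
    """documents: iterable of (date, text) pairs. Tokenizes each text once."""
    index = []
    for date, text in documents:
        counts = Counter(token.lower() for token in _WORD.findall(text))
        index.append((date, counts))
    return index


def count_term(index, term):
    """Return [(date, count), ...] for one search term."""
    term = term.lower()
    # Counter returns 0 for a missing key, so absent terms need no special case.
    return [(date, counts[term]) for date, counts in index]
```

Once the index is built, further search terms cost one lookup per document rather than another pass over the text.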

nltk custom tokenizer and tagger

Hi, here is my requirement. I want to tokenize and tag a paragraph in a way that achieves the following. It should identify dates and times in the paragraph and tag them as DATE and TIME. It should identify known phrases in the paragraph and tag them as CUSTOM. And the rest of the content should be tokenized by th...
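A rough sketch of the first two requirements using a single regular expression with named groups. The date format (`YYYY-MM-DD`), the time format (`H:MM`), and the name `build_tagger` are assumptions made for illustration; the question does not say which formats occur. Tokens that match none of the special patterns are returned with a tag of `None`, so they could be handed to whatever tagger handles the remaining content.

```python
import re


def build_tagger(known_phrases):
    """Return a function mapping text to a list of (token, tag) pairs."""
    # Longest phrases first, so a longer phrase wins over its own prefix.
    phrases = sorted(known_phrases, key=len, reverse=True)
    # "(?!)" never matches; it keeps the pattern valid with no phrases.
    custom = "|".join(re.escape(p) for p in phrases) or r"(?!)"
    pattern = re.compile(
        r"(?P<DATE>\d{4}-\d{2}-\d{2})"          # assumed date format
        r"|(?P<TIME>\d{1,2}:\d{2})"             # assumed time format
        r"|(?P<CUSTOM>\b(?:" + custom + r")\b)"  # known phrases, whole words
        r"|(?P<OTHER>\w+|[^\w\s]+)",            # everything else
        re.UNICODE,
    )

    def tag(text):
        return [
            (m.group(), None if m.lastgroup == "OTHER" else m.lastgroup)
            for m in pattern.finditer(text)
        ]

    return tag
```

The alternatives are tried in order, so DATE and TIME take priority over CUSTOM, and CUSTOM over ordinary tokens.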

Nltk installation

Hi, I want to set up Python's NLTK library, including WordNet, in such a way that it can be easily copied from a development system to a production server without having to download WordNet separately. Any suggestions would be helpful...

An NLP project feedback

Hello, I am new to Natural Language Processing and I want to learn more by creating a simple project. NLTK was suggested as a popular NLP library, so I will use it in my project. Here is what I would like to do: I want to scan our company's intranet pages, approximately 3K pages. I would like to parse and categorize the content of these pa...