Hi,
The link you sent says this function is a feature extractor that simply checks whether each of these words is present in a given document.
Here is the whole code with numbers for each line:
1 all_words = nltk.FreqDist(w.lower() for w in movie_reviews.words())
2 word_features = all_words.keys()[:2000]
3 def document_features(document):
4     document_words = set(document)
5     features = {}
6     for word in word_features:
7         features['contains(%s)' % word] = (word in document_words)
8     return features
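Line 1 builds a frequency distribution (nltk.FreqDist) of all the lowercased words in the movie_reviews corpus, i.e. a count for each word rather than a plain list.
Line 2 takes the 2000 most frequent words. This relies on keys() returning the words sorted by decreasing frequency, which is how older NLTK versions behave.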
3 defines the function.
4 takes the document (I think it must be a list of words) and converts it to a set.
5 declares an empty dictionary.
6 iterates over the 2000 most frequent words.
7 adds an entry to the dictionary for each word: the key is 'contains(theword)' and the value is True if the word is present in the document, False otherwise.
8 returns the dictionary, which shows whether the document contains each of the 2000 most frequent words.
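If it helps, here is a small self-contained sketch of the same steps that you can run without downloading the corpus. The toy word list, the cutoff of 3 instead of 2000, and the use of collections.Counter in place of nltk.FreqDist are my own assumptions for illustration, not part of the code in the link. I use most_common() to pick the top words because it does not depend on the order that keys() returns.

```python
from collections import Counter

# Toy corpus standing in for movie_reviews.words() (made-up data for the demo).
corpus_words = ["The", "movie", "was", "good", "the",
                "plot", "was", "good", "the", "end"]

# Count the lowercased words; Counter plays the role of nltk.FreqDist here.
all_words = Counter(w.lower() for w in corpus_words)

# Keep the N most frequent words (the original uses 2000; 3 keeps the demo small).
word_features = [word for word, _count in all_words.most_common(3)]

def document_features(document):
    # A set makes each "word in document_words" check fast.
    document_words = set(document)
    features = {}
    for word in word_features:
        features['contains(%s)' % word] = (word in document_words)
    return features

# A "document" is just a list of words.
features = document_features(["the", "plot", "was", "thin"])
print(features)
```

The result has one entry per frequent word: 'contains(the)' and 'contains(was)' are True because both words appear in the document, and 'contains(good)' is False because it does not.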
Does this answer your question?