I have been trying to learn Python for a while now. By chance, I happened across chapter 6 of the official tutorial through a Google search link pointing here. When I learned from that page that functions are the heart of modules, and that modules can be run from the command line, I was all ears. Here's my first attempt at doing both, openbook.py:

import nltk, re, pprint
from __future__ import division

def openbook(book):
    file = open(book)
    raw = file.read()
    tokens = nltk.wordpunct_tokenize(raw)
    text = nltk.Text(tokens)
    words = [w.lower() for w in text]
    vocab = sorted(set(words))
    return vocab
if __name__ == "__main__":
    import sys
    openbook(file(sys.argv[1]))

What I want is for this function to be importable as the module openbook, as well as for openbook.py to take a file from the command line and do all of those things to it.

When I run openbook.py from the command line, this happens:

gemeni@a:~/Projects-FinnegansWake$ python openbook.py vicocyclometer
Traceback (most recent call last):
  File "openbook.py", line 23, in <module>
    openbook(file(sys.argv[1]))
  File "openbook.py", line 5, in openbook
    file = open(book)

When I try using it as a module, this happens:

>>> import openbook
>>> openbook('vicocyclometer')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'module' object is not callable

So, what can I do to fix this, and hopefully continue down the long winding path to enlightenment?

A: 

Try

from openbook import *

instead of

import openbook

OR:

import openbook

and then call it with

openbook.openbook("vicocyclometer")
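
The same thing can be seen with a standard-library module whose main function shares the module's name. This is only an illustrative sketch in Python 3: it uses pprint as a stand-in for openbook, on the assumption that openbook.py behaves like any other module in this respect.

```python
import pprint

# After a plain import, the name refers to the module itself, not the function.
module_is_callable = callable(pprint)

try:
    pprint({"a": 1})              # same mistake as openbook('vicocyclometer')
except TypeError as exc:
    error_message = str(exc)      # mentions that a module is not callable

# Fix 1: reach the function through the module.
pprint.pprint({"a": 1})

# Fix 2: import the function directly, which rebinds the name to the function.
from pprint import pprint
function_is_callable = callable(pprint)
pprint({"a": 1})
```

Either fix works; going through the module keeps it obvious where the function came from.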
Jared Updike
+6  A: 
John Kugelman
@John Kugelman OK, thanks for the info. It worked when I just took out the if statement at the bottom. What I was trying to do was make the Python file runnable as a command-line script with optional arguments, as in chapter 6 of the Python tutorial: http://docs.python.org/tutorial/modules.html#executing-modules-as-scripts
old Ixfoxleigh
A: 

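In your interactive session, you're getting that error because import openbook binds the name openbook to the module, not to the function inside it; use from openbook import openbook instead. I can't tell exactly what happened on the command line because the line with the error got snipped, but it's probably that you tried to open a file object: file(sys.argv[1]) already returns an open file, while open() expects a path. Try passing the string sys.argv[1] into the openbook function directly.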
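
Something along these lines might work as a starting point. It is a rough Python 3 sketch, not the asker's exact script: the regex tokenizer is a stand-in I'm assuming in place of nltk.wordpunct_tokenize so the example runs without NLTK, and main is a hypothetical helper name.

```python
import re
import sys


def openbook(path):
    """Return the sorted, lower-cased vocabulary of the text file at `path`."""
    with open(path) as handle:        # `path` is a plain string, not a file object
        raw = handle.read()
    # Stand-in tokenizer (assumption); the original calls nltk.wordpunct_tokenize.
    tokens = re.findall(r"\w+|[^\w\s]+", raw)
    return sorted(set(token.lower() for token in tokens))


def main(argv):
    # Hypothetical helper: argv[1] is the book's path, passed on as a string.
    if len(argv) != 2:
        print("usage: python openbook.py BOOK")
        return 1
    for word in openbook(argv[1]):
        print(word)
    return 0


# Runs only when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main(sys.argv)
```

With the guard in place, python openbook.py vicocyclometer prints the vocabulary, while from openbook import openbook gives the function without running anything.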

Nathon
+1  A: 

Here are some things you need to fix:

  1. nltk.word_tokenize will not tokenize a whole text correctly:
    • The function takes sentences as arguments. Make sure that you use nltk.sent_tokenize on the whole text first, so that things work correctly.
  2. Files aren't being handled properly:
    • Only open the file once.
    • You're not closing the file once you're done with it. I recommend using Python's with statement to extract the text, as it closes the file automatically: with open(book) as f: sentences = nltk.sent_tokenize(f.read()) ...
  3. Import the openbook function from the module, not just the module: from openbook import openbook.

Lastly, you could consider:

  1. Adding things to the set with a generator expression, which will probably reduce the memory load: set(w.lower() for w in text)
  2. Using nltk.FreqDist to generate a vocab & frequency distribution for you.
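
Pulling the file-handling and set suggestions together, a minimal Python 3 sketch could look like the following. Two assumptions to flag: the \w+ regex is only a stand-in for the NLTK tokenizers, and collections.Counter stands in for nltk.FreqDist so the example runs without NLTK. The names lowered_words and vocab_and_counts are made up for illustration.

```python
import re
from collections import Counter


def lowered_words(raw):
    # Generator: yields one lower-cased word at a time, no intermediate list.
    # The regex is a stand-in tokenizer (assumption), not NLTK's.
    return (match.group().lower() for match in re.finditer(r"\w+", raw))


def vocab_and_counts(path):
    with open(path) as handle:    # the file is closed when this block exits
        raw = handle.read()       # read once, then work on the string
    vocab = sorted(set(lowered_words(raw)))
    counts = Counter(lowered_words(raw))   # stand-in for nltk.FreqDist
    return vocab, counts
```

If only the vocabulary is needed, the Counter line can simply be dropped.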
Tim McNamara
Thanks for all of this input. Answers like yours are hard to come by on SE sometimes. As for your fifth point, there's not much point in using FreqDist on a book in which single words appear more than once.
old Ixfoxleigh
If you are just testing vocab, then no, the frequency distribution isn't required. It can be useful for further analysis though.
Tim McNamara