I have been trying to get NLTK (Natural Language Toolkit) to work on Google App Engine. The steps I followed are:

  1. Download the installer and run it (a .dmg file, as I am using a Mac).
  2. Copy the nltk folder out of the Python site-packages directory and place it as a sub-folder in my project folder.
  3. Create a Python module in the folder that contains the nltk sub-folder and add the line: from nltk.tokenize import *

Unfortunately, after launching it I get the error below. Note that it is raised deep within NLTK; when I run locally, it comes from my system installation of Python rather than from the copy in the sub-folder of the GAE project:

 <type 'exceptions.ImportError'>: No module named nltk
Traceback (most recent call last):
  File "/base/data/home/apps/xxxx/1.335654715894946084/main.py", line 13, in <module>
    from lingua import reducer
  File "/base/data/home/apps/xxxx/1.335654715894946084/lingua/reducer.py", line 11, in <module>
    from nltk.tokenizer import *
  File "/base/data/home/apps/xxxx/1.335654715894946084/lingua/nltk/__init__.py", line 73, in <module>
    from internals import config_java
  File "/base/data/home/apps/xxxx/1.335654715894946084/lingua/nltk/internals.py", line 19, in <module>
    from nltk import __file__

Note: this is how the error looks in the logs when uploaded to GAE. If I run it locally I get the same error (except it seems to originate inside my site-packages instance of NLTK ... so no difference there). And "xxxx" signifies the project name.
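To check which copy of a package the interpreter would actually pick up (the project sub-folder or site-packages), a small helper along these lines may be useful. This is only a sketch in modern Python 3, not the Python 2 runtime used on App Engine at the time; `module_origin` is a hypothetical name, and `json` merely stands in for `nltk` so that the example runs without NLTK installed.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if it is not importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# "json" stands in for "nltk" here; replace it with the package you are debugging.
print(module_origin("json"))
print(module_origin("surely_not_an_installed_module"))
```

If the reported path points into site-packages rather than the project folder, the bundled copy is not the one being imported.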

So in summary:

  • Is what I am trying to do even possible? Will NLTK even run on the App Engine?
  • Is there something I missed? That is: copying "nltk" to the GAE project isn't enough?

EDIT: fixed typo and removed unnecessary step

+3  A: 

NLTK, I believe, does try its best to fall back to pure Python (graceful degradation) when it can't have the C-coded accelerator extensions it would like. However, one always needs to move with great care when injecting such a rich package into an application (recursively zipping up all of the .py files and using zipimport might be less flaky).
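A rough sketch of the zip-and-import idea, written for modern Python 3 rather than the App Engine runtime of the time. The package `demo_pkg` and the helper `zip_package` are made up for illustration; a real package that locates data files through `__file__` may not behave the same way once it lives inside a zip.

```python
import os
import sys
import tempfile
import zipfile


def zip_package(package_dir, zip_path):
    """Write every .py file under package_dir into zip_path, keeping the package folder as the top-level entry."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w") as archive:
        for root, _dirs, files in os.walk(package_dir):
            for filename in files:
                if filename.endswith(".py"):
                    full_path = os.path.join(root, filename)
                    archive.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Build a throwaway package (hypothetical name) so the example is self-contained.
workdir = tempfile.mkdtemp()
pkg_dir = os.path.join(workdir, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write("GREETING = 'hello from the zip'\n")

archive_path = zip_package(pkg_dir, os.path.join(workdir, "demo_pkg.zip"))

# Putting the archive itself on sys.path lets the built-in zipimport machinery find the package.
sys.path.insert(0, archive_path)
import demo_pkg

print(demo_pkg.GREETING)
print(demo_pkg.__file__)
```

Only the zip is on `sys.path` here, so the import can only have come from the archive; the printed `__file__` shows a path inside `demo_pkg.zip`.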

My installed NLTK, 0.95 I believe, has no nltk.tokenizer -- it does have an nltk.tokenize, no trailing R. Obviously even the most minute such typo is 100% intolerable when you're trying to tell a computer exactly what you want, so I assume this is not a typo on your part but rather your use of a completely different and incompatible release of NLTK. So, WHAT release is it that has a subpackage named tokenizer rather than tokenize?

If you find a zero-tolerance policy for one-char typos hard to bear, computers and their programming are unlikely to be tolerable to you...;-)

Alex Martelli
Ah, OK, a mistake on my part. But this is a red herring (which I would likely have discovered if it weren't for not being able to import *ANY* of NLTK) :-) So, why is it that I need to use zipimport? I actually haven't had to do this with a Python library before. Thanks.
Ryan Delucchi
You don't NEED to use zipimport -- it's just a convenient way to make sure you have all the .py files from a package in a single .zip file with nothing left behind or overlooked, which also helps since you have limits on the number of files ;-). Do specify exact versions if you want help, though! ;-)
Alex Martelli
2.0b5. Once again, I am seeing the same error both on my local machine (running within the GAE dev environment) and on Google App Engine.
Ryan Delucchi
+4  A: 

oakmad has managed to successfully work through deploying SEVERAL NLTK modules to GAE. Hope this helps. But, to be honest, I still find it hard to believe even after reading the post.

sunqiang
Thanks for the link. This gave me some good hints (although I don't think it's the *complete* solution to the problem).
Ryan Delucchi
+1  A: 

The problem here is that nltk is attempting to do recursive imports: when nltk/__init__.py is imported, it imports nltk/internals.py, which then attempts to import 'nltk' again. Since nltk is in the middle of being imported itself, it fails with a (rather unhelpful) error. Whatever they're doing is pretty weird anyway - it's unsurprising that something like "from nltk import __file__" breaks.
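The general failure mode can be reproduced with a tiny stand-in package. This is a sketch in modern Python 3 with a hypothetical package `circ_demo`. It imports a name (`VERSION`) that has not been defined yet rather than `__file__`, so it illustrates how a half-imported package can break and is not a confirmed diagnosis of NLTK's own code.

```python
import os
import sys
import tempfile

# __init__.py pulls in internals.py before it has finished defining its own names.
INIT_SOURCE = (
    "from . import internals  # runs internals.py before VERSION exists\n"
    "VERSION = '1.0'\n"
)
# internals.py reaches back into the package that is still being imported.
INTERNALS_SOURCE = "from circ_demo import VERSION\n"

base = tempfile.mkdtemp()
pkg_dir = os.path.join(base, "circ_demo")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as handle:
    handle.write(INIT_SOURCE)
with open(os.path.join(pkg_dir, "internals.py"), "w") as handle:
    handle.write(INTERNALS_SOURCE)

sys.path.insert(0, base)

try:
    import circ_demo
    outcome = "imported"
except ImportError as error:
    outcome = "ImportError: {0}".format(error)

print(outcome)
```

Moving the `VERSION = '1.0'` line above the `from . import internals` line in `__init__.py` makes the same import succeed, which is why the order of statements in a package's `__init__.py` matters when submodules import from the package.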

This looks like a problem with nltk itself - does it work when imported directly from a Python console? If so, they must be doing some sort of trickery in the installed version. I'd suggest asking on the nltk groups what they're up to and how to work around it.

Nick Johnson
Yes! It all seems to come down to NLTK's wacky importing. And yes, it does work fine on the console. The solution must involve going through all the references to "nltk" and fixing them. This is, however, non-trivial because there also seem to be issues with references to other packages. So, preferably, it would be nice to have a general way to resolve all the annoying importing issues.
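One general approach that avoids editing every reference is to put the folder that contains the bundled library on sys.path, so the library's own absolute imports of its top-level name resolve. The sketch below (modern Python 3) uses hypothetical names, `app_pkg` and `vendored_lib`, standing in for a project package and a library nested inside it. It is an assumption that this carries over to NLTK on App Engine, not a tested fix.

```python
import importlib
import os
import sys
import tempfile


def make_vendored_layout(base_dir):
    """Create app_pkg/vendored_lib, where vendored_lib imports itself by its top-level name."""
    app_dir = os.path.join(base_dir, "app_pkg")
    lib_dir = os.path.join(app_dir, "vendored_lib")
    os.makedirs(lib_dir)
    with open(os.path.join(app_dir, "__init__.py"), "w") as handle:
        handle.write("")
    with open(os.path.join(lib_dir, "__init__.py"), "w") as handle:
        handle.write("from vendored_lib import helpers\n")
    with open(os.path.join(lib_dir, "helpers.py"), "w") as handle:
        handle.write("NAME = 'helpers'\n")
    return app_dir


base = tempfile.mkdtemp()
app_dir = make_vendored_layout(base)
sys.path.insert(0, base)

# Imported as a nested package, the library cannot find itself under its top-level name.
try:
    importlib.import_module("app_pkg.vendored_lib")
    nested_result = "ok"
except ImportError:
    nested_result = "ImportError"

# With the containing folder on sys.path, the library is importable as a top-level package.
sys.path.insert(0, app_dir)
importlib.invalidate_caches()
top_level = importlib.import_module("vendored_lib")

print(nested_result)
print(top_level.helpers.NAME)
```

The trade-off is that the application must then import the library as `vendored_lib`, not as `app_pkg.vendored_lib`; mixing both spellings would load two separate copies of the same code.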
Ryan Delucchi
Perhaps if you ask the NLTK people what their intent is with the weird recursive imports, we can figure out a way to make it work.
Nick Johnson