views:

494

answers:

1

Are there any good alternatives to SQLite + FTS3 for python?

I'm iterating over a series of text documents, and would like to categorize them according to some text queries. For example, I might want to know if a document mentions the words "rating" or "upgraded" within three words of "buy." The FTS3 syntax for this query is the following:

(rating OR upgraded) NEAR/3 buy

That's all well and good, but if I use FTS3, this operation seems rather expensive. The process goes something like this:

# create an SQLite3 db in memory
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE VIRTUAL TABLE fts USING FTS3(content TEXT)')
conn.commit()

Then, for each document, do something like this:

#insert the document text into the fts table, so I can run a query
c.execute('insert into fts(content) values (?)', content)
conn.commit()

# execute my FTS query here, look at the results, etc

# remove the document text from the fts table before working on the next document
c.execute('delete from fts')
conn.commit()

This seems rather expensive to me.

The other problem I have with SQLite FTS is that it doesn't appear to work with Python 2.5.4. The 'CREATE VIRTUAL TABLE' syntax is unrecognized. This means that I'd have to upgrade to Python 2.6, which means re-testing numerous existing scripts and programs to make sure they work under 2.6.

Is there a better way? Perhaps a different library? Something faster? Thank you.

+1  A: 

I suggest you install pysqlite2 module separately. You should make sure that you have sqlite3 installed in your system which should have the FTS3 module ;-)

Checkout http://code.google.com/p/pysqlite/downloads/list for the latest code (as of April 13, it's version 2.6.0). It's the usual setuptools build. It should also complain if you don't have sqlite3 installed -- or its libraries aren't installed in the usual location /usr/lib(64). In the latter case, update your LD_LIBRARY_PATH and perhaps tweak setup.cfg to specify the header/lib paths.

Goodluck, Paul

Paul Rigor