Are there any good alternatives to SQLite + FTS3 for python?
I'm iterating over a series of text documents, and would like to categorize them according to some text queries. For example, I might want to know if a document mentions the words "rating" or "upgraded" within three words of "buy." The FTS3 syntax for this query is the following:
(rating OR upgraded) NEAR/3 buy
That's all well and good, but if I use FTS3, this operation seems rather expensive. The process goes something like this:
# create an SQLite3 db in memory
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE VIRTUAL TABLE fts USING FTS3(content TEXT)')
conn.commit()
Then, for each document, do something like this:
#insert the document text into the fts table, so I can run a query
c.execute('insert into fts(content) values (?)', content)
conn.commit()
# execute my FTS query here, look at the results, etc
# remove the document text from the fts table before working on the next document
c.execute('delete from fts')
conn.commit()
This seems rather expensive to me.
The other problem I have with SQLite FTS is that it doesn't appear to work with Python 2.5.4. The 'CREATE VIRTUAL TABLE' syntax is unrecognized. This means that I'd have to upgrade to Python 2.6, which means re-testing numerous existing scripts and programs to make sure they work under 2.6.
Is there a better way? Perhaps a different library? Something faster? Thank you.