I have a database with a listing of documents and the words within them. Each row represents one occurrence of a term in a document. What I'm looking to do is count how many documents each word occurs in.
So, given the following:
+-------+--------+
| doc   | word   |
+-------+--------+
| a     | foo    |
| a     | foo    |
| a     | bar    |
| b     | bar    |
+-------+--------+
I'd like to get a result of:
+--------+---------+
| word   | count   |
+--------+---------+
| foo    | 1       |
| bar    | 2       |
+--------+---------+
That's because foo occurs in only one document (even though it appears twice within that document), while bar occurs in two documents.
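The counting rule above (a document counts at most once per word, no matter how often the word repeats in it) can be sketched outside the database with one set of documents per word. This is only an illustration of the intended semantics; `document_frequency` is a hypothetical helper and the rows are the sample data from the question.

```python
from collections import defaultdict

# Sample rows from the question: (doc, word), one row per term occurrence.
rows = [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")]

def document_frequency(rows):
    """Map each word to the number of distinct documents it appears in."""
    docs_per_word = defaultdict(set)
    for doc, word in rows:
        # A set ignores repeat occurrences of the same word in the same doc.
        docs_per_word[word].add(doc)
    return {word: len(docs) for word, docs in docs_per_word.items()}

print(document_frequency(rows))  # {'foo': 1, 'bar': 2}
```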
Essentially, what I think I should be doing is a COUNT, per word, over the rows that the following query returns:
SELECT DISTINCT word, doc FROM table
...but I can't quite figure out how to write it. Any hints?
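A minimal sketch of that idea, assuming SQLite driven from Python's built-in `sqlite3` module and a hypothetical table named `terms` (the real table name isn't given; `table` itself is a reserved word). It shows two equivalent forms: grouping the rows of the `SELECT DISTINCT word, doc` query, and using `COUNT(DISTINCT doc)` directly. The derived table gets an alias (`pairs`) because MySQL requires one, although SQLite does not.

```python
import sqlite3

# In-memory SQLite database; "terms" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (doc TEXT, word TEXT)")
conn.executemany(
    "INSERT INTO terms (doc, word) VALUES (?, ?)",
    [("a", "foo"), ("a", "foo"), ("a", "bar"), ("b", "bar")],
)

# Approach 1: count the rows of the DISTINCT (word, doc) query, per word.
via_subquery = conn.execute(
    """
    SELECT word, COUNT(*) AS doc_count
    FROM (SELECT DISTINCT word, doc FROM terms) AS pairs
    GROUP BY word
    ORDER BY word
    """
).fetchall()

# Approach 2: COUNT(DISTINCT doc) drops the duplicate (word, doc) pairs itself.
via_count_distinct = conn.execute(
    """
    SELECT word, COUNT(DISTINCT doc) AS doc_count
    FROM terms
    GROUP BY word
    ORDER BY word
    """
).fetchall()

print(via_subquery)        # [('bar', 2), ('foo', 1)]
print(via_count_distinct)  # [('bar', 2), ('foo', 1)]
```

The second form is usually the simpler one to read, since it states "count distinct documents per word" directly instead of going through a subquery.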