views:

386

answers:

2

If i wants count the matching words in a rows of two tables, with milions of rows, sample:

Table posts, sample:

+----+---------+-----------------------------+
| ID | ID_user | text                        | 
+----+---------+-----------------------------+
| 1  | bruno   | michael jackson is dead     |
| 2  | thomasi | michael j. moonwalk is dead |
| 3  | userts  | michael jackson lives       |
+----+---------+-----------------------------+

i want query the words most repeated on the table, limit top 10, the result may be this:

+-------+------------+
| count | word       |
+-------+------------+
| 3     | michael    |
| 2     | dead       |
| 2     | jackson    |
| 1     | j.         |
| 1     | lives      |
| 1     | moonwalk   |
+-------+------------+

but i want search only words that repeat more of 10 times, in this case noone word is appear, but if criteria for repetead words is 2, will display only 'michael' and 'dead', but ignore 'is' because i dont want words with less 2 chars of lenght, and the words that a phrase, then i need apear this:

+-------+-----------------+
| count | word            |
+-------+-----------------+
| 2     | michael jackson |
| 2     | dead            |
+-------+-----------------+

i need a code in mysql that replies the "trending topics" of twitter for posts of my site.

+2  A: 

What you're looking for is term extraction, which isn't provided natively within MySQL.

Some other platforms provide that function, but it's considered an enterprise feature, so you'll have to pay through the nose for it.

Alternatively, you can use something like Yahoo!'s Term Extraction API.

Here is a blog post that talks about using Yahoo!'s service from PHP5.

Jeremy Smyth
A: 

break the sentence up on insert, filter the words against a blacklist, store distinct words with a count (or probably with references). count using count() :)

this would generate a lot of data tough, and i don't know what the speed and storage implications are.

Schnalle