views:

432

answers:

4

Hi,

I have a table with a varchar(255) field. I want to get (via a query, function, or SP) the number of occurences of each word in a group of rows from this table.

If there are 2 rows with these fields:

"I like to eat bananas"
"I don't like to eat like a monkey"

I want to get

    word | count()
    ---------------
    like  3
    eat   2
    to    2
    i     2
    a     1

Any idea? I am using MySQL 5.2.

+1  A: 

I would recommend not to do this in SQL at all. You're loading DB with something that it isn't best at. Selecting a group of rows and doing frequency calculation on the application side will be easier to implement, will work faster and will be maintained with less issues/headaches.

Alex N.
Really, isn't it smarter to run it as a stored procedure? or a function?.i always thought that offloading such calculations to the DB is better.
Elad Meidar
Agreed, there is no way of doing this using regular SQL. You could write a stored procedure to do it using a temporary table and a modified version of the one of the techniques in this article: http://www.sommarskog.se/arrays-in-sql-2000.html (this is for SQL Server 2000 but could be adapted to MySQL's dialect fairly easily) However, the issue still is that you're going outside SQL's "comfort zone" in doing so
Ken Keenan
You certainly can do that in SP, using multiple passes over data set and extracting each word with regexp, then putting results in temp table and iterating over it and so on. But it is quite complicated, it will be both difficult to write and maintain, and I would said maintenance is even more important then writing it in the first place. DBs are good for what they designed for - storing/reading/writing relational data. Making them compute things are just suboptimal design and will eventually return as a lot of PITA.
Alex N.
A: 

I think you are trying to do too much with SQL if all the words are in one field of each row. I recommend to do any text processing/counting with your application after you grab the text fields from the db.

Troggy
Really, isn't it smarter to run it as a stored procedure? or a function?.i always thought that offloading such calculations to the DB is better.
Elad Meidar
A: 

I think you should do it like indexing, with additional table. Whenever u create, update, or delete a row in your original table, you should update your indexing table. That indexing table should have the columns: word, and the number of occurrences.

Yeah, but the problem is i need in in a scope of a user (FK on the table with the string i need to parse) using a join table between a word, a phrase and a user seems a bit of over kill no?
Elad Meidar
A: 

You can try this perverted-a-little way:

SELECT 
(LENGTH(field) - LENGTH(REPLACE(field, 'word', ''))) / LENGTH('word') AS `count`
ORDER BY `count` DESC

This query can be very slow. Also, it looks pretty ugly.

Slava Popov