views:

35

answers:

2

Thanks for reading this. Hope you can help me.

When I have a Mysql table with these row values

id| search
========
1| butterflies
2| america
3| birds of america
4| america butterflies

how can I tell which rows have all the words in column 'search' occuring in the string "butterflies of america", regardless of the number or order of the search words.

(I would like to retrieve 1,2 and 4 in this example) I now use a coded loop to solve this problem, would be nice to fix it more efficient with mysql. I tried full text search and regular expressions but are complety stuck. Tx.

A: 

SELECT * FROM table_name WHERE search LIKE '%butterflies%' AND search LIKE '%of%' AND search LIKE '%america%';

or

SELECT * FROM table_name WHERE search REGEXP 'butterflies|of|america'; // not working

If i am not missing something :)

Edit: I was missing something :(

Yasen Zhelev
Wow, quick response, thank you! I tried your suggestions. Your first SELECT only returns records 1 and 2, but I would like to retrieve 4 as well (both 'america' and 'butterflies' occur in the string). The second SELECT returns all records, but I do not want record 3 ('birds' does not occur in the string). Any thoughts on how to tweak the regexp?
wikkie
Now i realized what you want. Will come with solution soon hopefully :)
Yasen Zhelev
Ok, tried it again and now the query does not return any records; it asks for records that contain all the words (butterflies + of + america), and none of the records contain all the words. Changing the AND to OR does no good (returns all records again).
wikkie
I do not think that this could be done with one query. A stored procedure may do the work or just use the script to look for the right records.
Yasen Zhelev
ah, our comments crossed, okay I wait :-)
wikkie
right, a script is handling this at the moment but taking more time every day. Thanks for your help, if anyone sees a solution, please let me know!
wikkie
A: 

Here is one method that I experimented with (although not very efficient):

select search, replace(filtered, 'butterflies', '') as filtered from (
   select search, replace(filtered, 'of', '') as filtered from (
      select search, replace(search, 'america', '') as filtered from table_name a
   ) b
) c;

This query will give you something like the following:

+---------------------+----------+
| search              | filtered |
+---------------------+----------+
| butterflies         |          |
| america             |          |
| birds of america    | birds    |
| america butterflies |          |
+---------------------+----------+

The last piece to make this work was giving me some trouble, though... you need a where clause which will return all rows that are "empty" (i.e. contain only whitespace characters).

That will filter out the third row and return the result set you desire. However, I wasn't able to get this to work using trim() and I don't know why.

For example, I tried:

where length(trim(c.filtered)) = 0;

This did not give me the result set I wanted. I don't have anymore time to look into this right now, but I wanted to mention this approach in case someone else wants to chime in and finish solving the puzzle. If not, I will try to look into this a little more later today or tomorrow.

Matt Caldwell
Hi Matt, I realy like your clever approach. But don't you think this method would produce a slow query? The nesting also will be dependent on code to produce a proper query. I still hope to find a solution independent of code. :-)
wikkie
I agree, it may not be the most performant query, but the performance would likely be acceptable. A persistence framework such as ibatis/mybatis (http://www.mybatis.org/java.html) could be used to generate dynamic queries such as this outside of code.
Matt Caldwell