views:

134

answers:

4

So I have a database table in MySQL that has a column containing a string. Given a target string, I want to find all the rows that have a substring contained in the target, ie all the rows for which the target string is a superstring for the column. At the moment I'm using a query along the lines of:

SELECT * FROM table WHERE 'my superstring' LIKE CONCAT('%', column, '%')

My worry is that this won't scale. I'm currently doing some tests to see if this is a problem but I'm wondering if anyone has any suggestions for an alternative approach. I've had a brief look at MySQL's full-text indexing but that also appears to be geared toward finding a substring in the data, rather than finding out if the data exists in a given string.

+1  A: 

You could create a temporary table with a full text index and insert 'my superstring' into it. Then you could use MySQL's full text match syntax in a join query with your permanent table. You'll still be doing a full table scan on your permanent table because you'll be checking for a match against every single row (what you want, right?). But at least 'my superstring' will be indexed so it will likely perform better than what you've got now.

Alternatively, you could consider simply selecting column from table and performing the match in a high level language. Depending on how many rows are in table, this approach might make more sense. Offloading heavy tasks to a client server (web server) can often be a win because it reduces load on the database server.

Asaph
Yeah, I was going to look into that one. At the moment I'm setting up a test rig to get some performance numbers and see what's needed.
Benno
A: 

If your superstrings are URLs, and you want to find substrings in them, it would be useful to know if your substrings can be anchored on the dots.

For instance, you have superstrings :

www.mafia.gov.ru www.mymafia.gov.ru www.lobbies.whitehouse.gov

If your rules contain "mafia' and you want the first 2 to match, then what I'll say doesn't apply.

Else, you can parse your URLs into things like : [ 'www', 'mafia', 'gov', 'ru' ] Then, it will be much easier to look up each element in your table.

peufeu
I can do that in the hostname case but it's less useful in the whole-of-URL case. Good idea though.
Benno
A: 

Well it appears the answer is that you don't. This type of indexing is generally not available and if you want it within your MySQL database you'll need to create your own extensions to MySQL. The alternative I'm pursuing is to do the indexing in my application.

Thanks to everyone that responded!

Benno
A: 

I created a search solution using views that needed to be robust enought to grow with the customers needs. For Example:


CREATE TABLE tblMyData
(
MyId bigint identity(1,1),
Col01 varchar(50),
Col02 varchar(50),
Col03 varchar(50)
)

CREATE VIEW viewMySearchData 
as
SELECT 
MyId,
ISNULL(Col01,'') + ' ' +
ISNULL(Col02,'') + ' ' +
ISNULL(Col03,'') + ' ' AS SearchData
FROM tblMyData

SELECT 
t1.MyId,
t1.Col01,
t1.Col02,
t1.Col03
FROM tblMyData t1
INNER JOIN viewMySearchData t2
ON t1.MyId = t2.MyId
WHERE t2.SearchData like '%search string%'


If they then decide to add columns to tblMyData and they want those columns to be searched then modify viewMysearchData by adding the new colums to "AS SearchData" section.

If they decide that there are two many columns in the search then just modify the viewMySearchData by removing the unwanted columns from the "AS SearchData" section.

Cape Cod Gunny