tags:

views:

330

answers:

4

I have 2 tables, say table A and table B and I want to perform a join, but the matching condition has to be where a column from A 'is like' a column from B meaning that anything can come before or after the column in B:

for example: if the column in A is 'foo'. Then the join would match if column in B is either: 'fooblah', 'somethingfooblah', or just 'foo'. I know how to use the wildcards in a standard like statement, but am confused when doing a join. Does this make sense? Thanks.

+1  A: 

In MySQL you could try:

SELECT * FROM A INNER JOIN B ON B.MYCOL LIKE CONCAT('%', A.MYCOL, '%');

Of course this would be a massively inefficient query because it would do a full table scan.

Update: Here's a proof


create table A (MYCOL varchar(255));
create table B (MYCOL varchar(255));
insert into A (MYCOL) values ('foo'), ('bar'), ('baz');
insert into B (MYCOL) values ('fooblah'), ('somethingfooblah'), ('foo');
insert into B (MYCOL) values ('barblah'), ('somethingbarblah'), ('bar');
SELECT * FROM A INNER JOIN B ON B.MYCOL LIKE CONCAT('%', A.MYCOL, '%');
+-------+------------------+
| MYCOL | MYCOL            |
+-------+------------------+
| foo   | fooblah          |
| foo   | somethingfooblah |
| foo   | foo              |
| bar   | barblah          |
| bar   | somethingbarblah |
| bar   | bar              |
+-------+------------------+
6 rows in set (0.38 sec)
Asaph
Thanks..how would I achieve the same functionality but make it more efficient?
This is how you would do it. If you need it to be more efficient, you could index the MYCOL field on table B.
David Andres
If you're using the MyISAM table type, you could try a full text index and see if that helps. Generally though, full text search is not a strength of MySQL. If full text search is a core part of your application, consider something like Apache Lucene - http://lucene.apache.org/java/docs/
Asaph
Update: BTW: full text index would require a different query. See the MySQL docs here for details: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Asaph
+2  A: 

Using INSTR:

SELECT *
  FROM TABLE a
  JOIN TABLE b ON INSTR(b.column, a.column) > 0

Using LIKE:

SELECT *
  FROM TABLE a
  JOIN TABLE b ON b.column LIKE '%'+ a.column +'%'

Using LIKE, with CONCAT:

SELECT *
  FROM TABLE a
  JOIN TABLE b ON b.column LIKE CONCAT('%', a.column ,'%')

Mind that in all options, you'll probably want to drive the column values to uppercase BEFORE comparing to ensure you are getting matches without concern for case sensitivity:

SELECT *
  FROM (SELECT UPPER(a.column) 'ua'
         TABLE a) a
  JOIN (SELECT UPPER(b.column) 'ub'
         TABLE b) b ON INSTR(b.ub, a.ua) > 0

The most efficient will depend ultimately on the EXPLAIN plan output.

JOIN clauses are identical to writing WHERE clauses. The JOIN syntax is also referred to as ANSI JOINs because they were standardized. Non-ANSI JOINs look like:

SELECT *
  FROM TABLE a,
       TABLE b
 WHERE INSTR(b.column, a.column) > 0

I'm not going to bother with a Non-ANSI LEFT JOIN example. The benefit of the ANSI JOIN syntax is that it separates what is joining tables together from what is actually happening in the WHERE clause.

OMG Ponies
+1  A: 

If this is something you'll need to do often...then you may want to denormalize the relationship between tables A and B.

For example, on insert to table B, you could write zero or more entries to a juncion table mapping B to A based on partial mapping. Similarly, changes to either table could update this association.

This all depends on how frequently tables A and B are modified. If they are fairly static, then taking a hit on INSERT is less painful then repeated hits on SELECT.

David Andres
That's a fine solution, but it's not accurate to call it denormalization.
Bill Karwin
Fair enough. Call it a junction table then
David Andres