tags:

views:

96

answers:

5

I want to join two tables on a char column. The simplest way I use is:

SELECT * FROM a JOIN b ON (a.text = b.text)

This method is quite fast but it the comparison of field is case insensitive. I have tried STRCMP(.., ..) and .. LIKE .. instead .. = .. but both are far to slow. What is the best solution to join two tables on char column with respect to the letter case?

+2  A: 

don’t join on strings, use surrogate keys instead

knittl
Every entry has its own surrogate key. However, couple entries can have the same text and I need to join using that text. I already have the database and I cannot redesign it.
czuk
+2  A: 

I've no possibility to run a benchmark, but have you tried:

SELECT * FROM a JOIN b ON (BINARY a.text = BINARY b.text)

EDIT

Just as a sidenote: When using the BINARY operator both columns a.text and b.text must use the same character set as comparison is done on the byte-level.

Stefan Gehrig
That's a good idea!
Mitch Wheat
It works, but it took 8 min 0.73 sec, while the simple comparison only 2 sec. It is strange for me. Is there a solution that is as fast as insensitive comparison?
czuk
The problem is that even if you've indexed both columns correctly, these indexes won't be used as you apply an operator to the column content (same as applying a function). In my opinion you're stuck here if you don't have a chance to modify the database schema at all.
Stefan Gehrig
A: 

You could create your tables using certain character sets and collations (f.e. latin1_bin) that are case sensitive. Just look at the create table syntax. in this case the = operator should be as fast as on standard settings (like f.e. latin1 and latin1_swedish_ci).

Ralph Rickenbach
+2  A: 

If that text columns are always compared case-sensitively, give them a case sensitive collation in the schema (VARCHAR BINARY is one way of doing that) and your original query will work.

Now give them each an index to make the joins fast, if they're not already keys.

bobince
case sensitive collations is a good idea!
knittl
A: 

First of all, check indexes on these columns. This join takes fractions of a second on thousands of rows in both tables, provided that the fields are indexed properly.

Second, make sure you use same collations on both table. If you don't, specify the collation for the table column you want to be leading in the join.

Note that collation conversion makes condition not sargable and the indexes not usable. If you want case sensitive comparison to be fast, make sure your collation is set to binary (like UTF8_BIN) in both fields.

Quassnoi