views:

161

answers:

1

So i have a UNIQUE CHAR(255) column, i want to find a particular row,

Would it make sense to create CHAR(10) INDEX to make search more efficient? I know a unique is also an index

the engine will scan throught the index to where the 1st letter is J, then JO, then JOH but an index of 255 bytes x 1 million records, is a lot of memory "space" to scan, instead of just 10 bytes x 1 million

mail_sub = LEFT(mail,10)
mail_sub = substr(mail,10)

`CREATE TABLE pwd(  
  id       INT,   
  mail_sub CHAR(10) NOT NULL,   
  mail     CHAR(255) NOT NULL,  
  pw_hash  CHAR(32) NOT NULL, 
 PRIMARY KEY (id),  
 UNIQUE KEY  ind_email (mail),  
 INDEX       rv_sub (mail_sub,id)  
) ENGINE = INNODB CHARACTER SET latin1;`

(id is NOT auto_increment, its defined before insert)

The table is read in as many
SELECT * FROM pwd WHERE email='[email protected]';
as
SELECT * FROM pwd WHERE id=12345;

So "id" or "mail" could be the primary key, i dont really see any difference;

My question is, something like

`SELECT * FROM pwd WHERE mail_sub='abcde12345' AND 
email='[email protected]';`  

would make search more efficient ?

optimizer insists in using "ind_email"
FORCE / IGNORE indexes is no use, according to docs mysql index hints are silently ignored for string types (http://dev.mysql.com/doc/refman/5.1/en/index-hints.html)

i thought to use JOIN to make a search on mail_sub first, but to no success

`EXPLAIN EXTENDED   
SELECT a.pw_hash FROM pwd as a   
JOIN pwd as b ON b.id=a.id  
WHERE a.mail_sub='abcde12345' AND b.mail='abc...john.com';`

what are your thoughts on this?

thanks!

A: 

Add a column that is a hash of the email address. Then use as a predicate where a.hash = hash('[email protected]') and a.email = '[email protected]'

As it happens, mysql includes a hashing function, called password() that produces 16 byte hashes.

tpdi
thanks for the suggestion ----- yes, I though of using MD5() wich is 16 byte (32 hex), password() is actually 20 byte (40 hex char) ----- but could i use ONLY a hash in the predicate? --- you wrote `and a.email = '[email protected]'` so I assume that we can NOT trust the hash to be unique, although mysql would ensure uniqueness during INSERTion, i could end up telling the user an email address is already registered, when its not, or am i too paranoid ?
Luxvero
also when i tryed it earlier using `where a.hash = hash('[email protected]') and a.email = '[email protected]'` mysql keeps using the `email` index ---- or are you suggesting using the hash column Not as unique, just and index, hence the "mail=xxx" to filter the rows matching the index? and just forget about enforcing mail to be unique via sql constraints?
Luxvero