views:

28

answers:

1

Hi - I'm having problems with case-sensitivity in MySQL FULLTEXT searches.

I've just followed the FULLTEXT example in the MySQL doco at http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html . I'll post it here for ease of reference ...

CREATE TABLE articles (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
title VARCHAR(200),
body TEXT,
FULLTEXT (title,body)
);

INSERT INTO articles (title,body) VALUES
('MySQL Tutorial','DBMS stands for DataBase ...'),
('How To Use MySQL Well','After you went through a ...'),
('Optimizing MySQL','In this tutorial we will show ...'),
('1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'),
('MySQL vs. YourSQL','In the following database comparison ...'),
('MySQL Security','When configured properly, MySQL ...');

SELECT * FROM articles
WHERE MATCH (title,body)
AGAINST ('database' IN NATURAL LANGUAGE MODE);

... my problem is that the example shows that SELECT returning the first and fifth rows ('..DataBase..' and '..database..') but I only get one row ('database') !

The example doesn't demonstrate what collation the table in the example had but I have ended up with latin1_general_cs on the title and body columns of my example table.

My version of MySQL is 5.1.39-log and the connection collation is utf8_unicode_ci .

I'd be really grateful is someone could suggest why my experience differs from the example in the manual !

Be grateful for any advice.

A: 

I guess that your default collation is case sensitive somewhere - seeing as you ended up with latin1_general_cs in your table. Perhaps in the start up?

You can check using

show variables like 'collation%'

Which for me gives:

+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci | 
| collation_database   | latin1_swedish_ci | 
| collation_server     | latin1_swedish_ci | 
+----------------------+-------------------+
3 rows in set (0.00 sec)

So the example works as advertised on my server.

The collation of columns in your table will default to the database, server, or table collation as appropriate. In other words, collation specified at the column level overrides any at the table level, which overrides database level, etc.

Column collation is specified using this syntax:

col_name {CHAR | VARCHAR | TEXT} (col_length)
    [CHARACTER SET charset_name]
    [COLLATE collation_name]

See §9.1 of the MySQL documentation for the gory details.

martin clayton
Great - thank you very much for explaining this. I've now recreated the table specifically setting the collation/character set as :CREATE TABLE articles (id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,title VARCHAR(200),body TEXT,FULLTEXT (title,body))DEFAULT CHARACTER SET latin1 COLLATE latin1_general_ci;... and the query now works as the examle suggests it should . Thanks very much for your help.
southof40