I have a lot of data being entered into records with the HTML entity &amp;. A full-text search for the word "amp" returns every record containing &amp;, which is highly undesirable.

Presumably this is because MySQL ignores the '&' and the ';'. Does anyone know of a way within MySQL to force it to treat special characters as part of the word, so that my search for "amp" doesn't include every result with &amp; in it, ideally without some form of subquery or extra WHERE clause?
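To illustrate that presumption, here is a rough Python sketch of what happens if the indexer treats only letters, digits and underscores as word characters. This is an assumption made for illustration, not MySQL's actual parser, and the function name approximate_words is hypothetical.

```python
import re

# Rough approximation only (an assumption, not MySQL's real parser):
# runs of letters, digits and underscores count as words; everything
# else, including '&' and ';', acts as a separator.
WORD_RE = re.compile(r"[0-9A-Za-z_]+")


def approximate_words(text):
    """Return the lowercase word tokens found in text."""
    return [word.lower() for word in WORD_RE.findall(text)]


# The entity's '&' and ';' are dropped, leaving a bare "amp" token.
print(approximate_words("Fish &amp; Chips"))
```

Under that assumption the encoded text yields the token "amp", while the decoded text "Fish & Chips" would not, which matches the behaviour described above.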

My solution so far (not yet implemented) is to decode the entities on INSERT and re-encode them when displaying on the web. This would be OK, but it adds some overhead to everything that I'd like to avoid if possible. Also, it works well for new entries, but I would need to apply it retroactively to nearly 7 million records, which I'd rather not do if I can help it.
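A minimal sketch of that decode-on-insert, encode-on-display idea, assuming the application layer is Python (the thread does not say which language is in use). The function names decode_for_storage and encode_for_display are hypothetical; html.unescape and html.escape are from the standard library.

```python
import html


def decode_for_storage(raw):
    """Turn HTML entities such as &amp; back into plain characters
    before the text is written to the database."""
    return html.unescape(raw)


def encode_for_display(stored):
    """Escape &, <, > and quotes again when rendering the stored
    text into a web page."""
    return html.escape(stored, quote=True)


stored = decode_for_storage("Fish &amp; Chips")
print(stored)                      # Fish & Chips
print(encode_for_display(stored))  # Fish &amp; Chips
```

Note that the round trip is not exact for every entity: html.escape only re-encodes &, <, > and quotes, so something like a decoded non-breaking space would be emitted as the raw character rather than as its original entity.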

--

I updated my my.cnf file with the following:

ft_stopword_file = /etc/mysql/custom-stopwords

Does this file need any special permissions?

A: 

Perhaps you need to specifically ignore these. Try including -&amp; in your full-text query. Another option (I am unsure whether it requires a MySQL source code change) is to add amp and &amp; to MySQL's stopword list.

Ashley
scrumpyjack
A: 

You added it to the stopwords file and it's not working? Sounds like either a bug in MySQL or your stopwords list isn't being used. Have you reviewed this? Quote:

False hits or misses may occur for stopword lookups if the stopword file or columns used for full-text indexing or searches have a character set or collation different from character_set_server or collation_server.

Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case insensitive if the collation is latin1_swedish_ci, whereas lookups are case sensitive if the collation is latin1_general_cs or latin1_bin.

Could any of those possibilities be why your stopword entry for &amp; is not being read?

Joshua Beall
Here is also the description of the my.cnf setting that points to the stopword file. It would be good to review it to make sure you haven't missed anything in setting up your stopword list: http://dev.mysql.com/doc/refman/5.1/en/server-system-variables.html#sysvar_ft_stopword_file
Joshua Beall
Thanks, Joshua. I followed most of this to the letter after Ashley's suggestion and I haven't been successful. I've updated my question.
scrumpyjack