Hi!
I have to hack a content management system to support fulltext search for a language that contains special characters. These are stored in the database as HTML entities. Out of the box, the CMS doesn't support this. The bug was reported a long time ago, but apparently it has no priority. I'm stuck with this CMS and the customer is awaiting my solution, so I have to hack it. Damn...
OK... the CMS stores its content by translating special characters into HTML entities (this is actually done by the bundled editor). So the German word "möchten" becomes "m&ouml;chten" in the DB. The CMS creates a query string like
SELECT * FROM `SiteTree` WHERE MATCH( Content ) AGAINST (<SEARCH_STRING> IN BOOLEAN MODE);
The table is of type MyISAM, the field has a FULLTEXT index.
If you use "m&ouml;chten" as the search string, MySQL will match every page, as & is an operator that will do crazy things if it's present in the search string. The search will not work.
The next idea is to replace the special character with an * as a placeholder. But this will also match several words, as soon as you have anything starting with "m" and another following word ending with "chten". I don't know why, but replacing only the ampersand with an asterisk (so searching for "m*ouml;chten") also leads to similar results.
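A rough sketch of what may be going on with both search strings, written in Python only to illustrate the idea. It assumes the fulltext parser splits on every character that is not a letter, digit or underscore, and that the server uses MyISAM's default ft_min_word_len of 4. The helper name approximate_tokens is made up for this post and this is not MySQL's real parser:

```python
import re

# Assumption: MyISAM's default ft_min_word_len is 4; the actual server
# may be configured differently.
FT_MIN_WORD_LEN = 4


def approximate_tokens(search_string, min_len=FT_MIN_WORD_LEN):
    """Hypothetical helper: roughly mimic how a search string is cut into words.

    Splits on anything that is not an ASCII letter, digit or underscore,
    then drops words shorter than min_len.
    """
    words = re.split(r"[^0-9A-Za-z_]+", search_string)
    return [w for w in words if len(w) >= min_len]


# Both variants of the search string from above.
print(approximate_tokens("m&ouml;chten"))
print(approximate_tokens("m*ouml;chten"))
```

If that assumption holds, both strings boil down to the two separate words "ouml" and "chten", and since words without an operator are optional in boolean mode, any page containing an "ö" anywhere would match. That would fit the results described above, but I have not verified it against the MySQL source.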
The same problem was described here.
Ok, folks, I need your help! Any ideas?
Edit: Converting the content to UTF-8 is not an option.
Thanks!
craesh