views:

98

answers:

4
+1  Q: 

search question

I'm working on a search module that searches in text columns that contains html code. The queries are constructed like: WHERE htmlcolumn LIKE '% searchterm %';

Default the modules searches with spaces at both end of the searchterms, with wildcards at the beginning and/or the end of the searchterms these spaces are removed (*searchterm -> LIKE '%searchterm %'; Also i've added the possibility to exclude results with certain words (-searchterm -> NOT LIKE '% searchterm %'). So far so good.

The problem is that words that that are preceded by an html-tag are not found (<br/>searchterm is not found when searching on LIKE '% searchterm.., also words that come after a comma or end with a period etc.). What i would like to do is search for words that are not preceded or followed by the characters A-Z and a-z. Every other characters are ok.

Any ideas how i should achieve this? Thanks!

+1  A: 

You could use a regular expression: http://dev.mysql.com/doc/refman/5.0/en/regexp.html

Davide Gualano
You know that when a reg ex is the solution to your problem you got 2 problems instead of 1 ;)
Overbeeke
But i have found solutions for both problems:WHERE column REGEXP '[^A-Za-z]key[^A-Za-z] works great!Thanks
Overbeeke
+1  A: 

I don't think SQL's "LIKE" operator alone is the right tool for the job you are trying to do. Consider using Lucene, or something like it. I was able to integrate Lucene.NET into my application in a couple days. You'll spend more time than that trying to salvage your current approach.

If you have no choice but to make your current approach work, then consider storing the text in two columns in your database. The first column is for the pure text, with punctuation etc. The second column is the text that has been pre-preprocessed, just words, no punctuation, normalized so as to be easier for your "LIKE" approach.

Corey Trager
+2  A: 

Look into MySQLs fulltextsearch, it might be able to use non-letter characters as delimiters. It will alsow be much much faster than a %term% search since that requires a full table-scan.

jishi
If the built-in fulltext search doesn't do quite what you need, it's not that difficult to write one of your own, either, and then you can customize it however you like.
Dave Sherohman
+1  A: 

Generally speaking, it is better to use full text search facilities, but if you really want a small SQL, here it is:

SELECT * FROM `t` WHERE `htmlcolumn` REGEXP '[[:<:]]term[[:>:]]'

It returns all records that contain word 'term' whether it is surrounded with spaces, punctuation, special characters etc

Incidently