
This page has a great example of using REGEXP for pattern matching. The problem is that REGEXP won't match any of the following strings against the stored name "John Doe":

  • "Mr John"
  • "Dr. John"
  • "Mr. John Doe" (or even this one)

How can I get positive matches for any of these examples?

Here is some sample code:


DROP TABLE IF EXISTS Names;

CREATE TABLE Names (
    first_name VARCHAR(20),
    last_name  VARCHAR(20)
);

INSERT INTO Names VALUES ('John', 'Doe');
INSERT INTO Names VALUES ('Sue', 'Yin');
INSERT INTO Names VALUES ('Diego James', 'Franco');

SELECT * FROM Names;

/* Find names containing a string */
/* I want this to match John Doe */
SELECT * FROM Names WHERE first_name REGEXP 'Mr John';
/* This has John misspelled; I want it to match John Doe */
SELECT * FROM Names WHERE first_name REGEXP 'Hohn' AND last_name REGEXP 'Doe';
/* And this should match Diego James Franco */
SELECT * FROM Names WHERE first_name REGEXP 'Dr Diego' AND last_name REGEXP 'Franco';

-Thank you

UPDATE: Thank you for the answers. My question is not how to use regular expressions to do the matching I want, but rather how I can do it at all, with or without REGEXP. I used REGEXP only as an example of pattern matching. I do appreciate the clarification on regular expressions.

A: 

The only string that should match John Doe is the last one. Can you please post the exact SQL and the data it's trying to match?

OK, so you have the string and the pattern mixed up. You supply a pattern to REGEXP intending it to match a string. For example, the pattern "Dr. John" will never match the string "John", because the pattern requires "Dr" and that part fails. However, the pattern "John" will match the string "Dr John", since the pattern finds "John" inside "Dr John". My suggestion is to read a regular expressions primer.
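To make the "mixed up" point concrete, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL. This is an assumption of the sketch: SQLite has no built-in REGEXP implementation, so it registers one with create_function, and SQLite evaluates "X REGEXP Y" as regexp(Y, X). The helper names make_names_db and find_names are hypothetical. It runs the lookup both ways round: with the search text as the pattern (as in the question) and with each stored first_name as the pattern.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so `pattern` is the right-hand operand.
    # re.IGNORECASE is an assumption meant to roughly mimic MySQL under a case-insensitive collation.
    if pattern is None or value is None:
        return False
    return re.search(pattern, value, re.IGNORECASE) is not None


def make_names_db():
    """Build an in-memory copy of the Names table from the question (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.create_function("regexp", 2, regexp)
    con.execute("CREATE TABLE Names (first_name TEXT, last_name TEXT)")
    con.executemany(
        "INSERT INTO Names VALUES (?, ?)",
        [("John", "Doe"), ("Sue", "Yin"), ("Diego James", "Franco")],
    )
    return con


def find_names(con, text, text_is_pattern):
    """Return (first_name, last_name) rows, using `text` as the pattern or as the subject."""
    if text_is_pattern:
        # As in the question: the search text is the pattern, the column value is searched.
        sql = "SELECT first_name, last_name FROM Names WHERE first_name REGEXP ?"
    else:
        # Reversed: each stored first_name is the pattern, the search text is searched.
        sql = "SELECT first_name, last_name FROM Names WHERE ? REGEXP first_name"
    return con.execute(sql, (text,)).fetchall()


con = make_names_db()
for text in ["Mr John", "Dr. John", "Mr. John Doe", "Dr Diego"]:
    print(text, find_names(con, text, True), find_names(con, text, False))
```

Reversing the operands finds John Doe for "Mr John", "Dr. John" and "Mr. John Doe", but it still misses "Dr Diego" (the stored "Diego James" is not contained in it) and does nothing for a misspelling such as "Hohn". In MySQL the reversed form would presumably be written as WHERE 'Mr John' REGEXP first_name; note that stored names containing regex metacharacters would then be interpreted as patterns.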

ennuikiller
I added the code and some sample queries that return no rows but that I would like to return positive matches. Thank you.
Onema
I do understand that this is the purpose of REGEXP. My question is: how do I get positive matches for these examples, using REGEXP or any other comparison?
Onema
A:

Regular expressions are not meant to match inexact strings (for example, a spelling error), which seems to be what you are trying to do. A regular expression could be used, for example, to match any Social Security number (three digits, a hyphen, two digits, another hyphen, then four digits). But you couldn't use a regular expression to match misspellings of John. Misspellings are handled with fuzzier techniques, commonly n-gram matching (see: http://en.wikipedia.org/wiki/N-gram). If you are also using Ruby on Rails, there is a great gem (called Chrononaut-no_fuzz) that handles this for you, but with plain MySQL you may have to hand-code this feature.
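As a rough illustration of n-gram matching, here is a minimal Python sketch. It is not the Chrononaut-no_fuzz gem's implementation, and it assumes choices the answer doesn't specify: character bigrams, one space of padding on each side of the lower-cased text, the Dice coefficient as the similarity score, and an arbitrary cutoff of 0.4. The function names char_ngrams, dice_similarity and best_match are hypothetical.

```python
def char_ngrams(text, n=2):
    """Set of character n-grams of `text`, lower-cased and padded with one space per side."""
    padded = f" {text.lower().strip()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice coefficient of the two n-gram sets: 1.0 = identical sets, 0.0 = nothing shared."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(query, candidates, n=2, threshold=0.4):
    """Candidate most similar to `query`, or None if no score reaches `threshold` (assumed cutoff)."""
    if not candidates:
        return None
    score, candidate = max((dice_similarity(query, c, n), c) for c in candidates)
    return candidate if score >= threshold else None


names = ["John Doe", "Sue Yin", "Diego James Franco"]
for query in ["Mr John", "Dr. John", "Mr. John Doe", "Hohn Doe", "Dr Diego"]:
    print(query, "->", best_match(query, names))
```

Because most bigrams of "Hohn Doe", "Mr John" or "Dr Diego" also occur in the intended stored name, those queries score highest against it even though the strings differ. The cutoff and n are tuning knobs, and scoring every row in application code like this is only practical for small tables.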

sneakerfish
I am using PHP 5 and MySQL.
Onema
After reading a little about n-grams, I am positive that is the way to go. I found this site, http://boxoffice.ch/pseudo/code_expl/code.php, which has an amazing example of n-grams implemented in PHP.
Onema