views:

23

answers:

2

Hi, I'm having the following problem with 2 MySQL tables that have a relation: I can easily query table 1 (address) when I want a full list or filter the result by name or email or such. But now I need to query table 1 and filter it based on the relational content of table 2 (interests). So, I need to find a row (usually many rows) in table 1 only if a (or more) conditions are met in table 2.

Here are the tables:

CREATE TABLE IF NOT EXISTS `address` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `email` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
  `countryCode` char(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `languageCode` char(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `emailUnique` (`email`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

INSERT INTO `address` (`id`, `name`, `email`, `countryCode`, `languageCode`, `timestamp`) VALUES
(1, '', '[email protected]', 'BE', 'nl', '2010-07-16 14:07:00'),
(2, '', '[email protected]', 'BE', 'fr', '2010-07-16 14:10:25');

CREATE TABLE IF NOT EXISTS `interests` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `address_id` int(11) unsigned NOT NULL,
  `cat` char(2) COLLATE utf8_unicode_ci NOT NULL,
  `subcat` char(2) COLLATE utf8_unicode_ci NOT NULL,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  KEY `address_id` (`address_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

INSERT INTO `interests` (`id`, `address_id`, `cat`, `subcat`, `timestamp`) VALUES
(1, 1, 'aa', 'xx', '2010-07-16 14:07:00'),
(2, 1, 'aa', 'yy', '2010-07-16 14:07:00'),
(3, 2, 'aa', 'xx', '2010-07-16 14:07:00'),
(4, 2, 'bb', 'zz', '2010-07-16 14:07:00')
(5, 2, 'aa', 'yy', '2010-07-16 14:07:00');

ALTER TABLE `interests`
  ADD CONSTRAINT `interests_ibfk_1` FOREIGN KEY (`address_id`) REFERENCES `address` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION;

For example, I need to find the address(es) that has (have) as interest cat=aa and subcat=xx. Or, another example, I need the address(es) with as interest both cat=aa and subcat=xx AND cat=aa and subcat=yy. Specially the latter is important and one has to keep in mind that both the address and the interest tables will be long lists and that the amount of cat/subcat combinations will vary. I'm working with reference queries through Zend_Db_Table (findDependentRowset) at the moment but that solution is way to slow for address lists numbering 100s and even 1000s of hits.

Thank you for your help.

+2  A: 
SELECT a.name FROM address a
INNER JOIN interests i ON (a.id = i.address_id)
WHERE i.cat = "aa" AND i.subcat IN ('xx', 'yy')
Zane Edward Dockery
Neat and fast, thanks! If I'm seeing this right, this will result in all the addresses that have either the aa/xx OR the aa/yy combo. But what if I want the ones with aa/xx AND aa/yy?How to go about throwing another cat in the mix? Would this be the proper way:SELECT a.name FROM address aINNER JOIN interests i ON (a.id = i.address_id)WHERE ( i.cat = "aa" AND i.subcat IN ('xx', 'yy') )OR ( i.cat = "bb" AND i.subcat IN ('zz','ww') )
Peter
+1  A: 

I added another row in your interests table, to demonstrate a different result set between the two examples:

INSERT INTO interests VALUES (6, 2, 'aa', 'vv', '2010-07-16 14:07:00');

Then you may want to try using correlated subqueries as follows:

SELECT * 
FROM   address a 
WHERE  EXISTS (SELECT id 
               FROM   interests 
               WHERE  address_id = a.id AND 
                      (cat = 'aa' and subcat = 'xx'));

Result:

+----+------+--------------------+-------------+--------------+---------------------+
| id | name | email              | countryCode | languageCode | timestamp           |
+----+------+--------------------+-------------+--------------+---------------------+
|  1 |      | [email protected]     | BE          | nl           | 2010-07-16 14:07:00 |
|  2 |      | [email protected] | BE          | fr           | 2010-07-16 14:10:25 |
+----+------+--------------------+-------------+--------------+---------------------+
2 rows in set (0.00 sec)

For the second example, we're testing for the new row added previously in order not to have the same result as above:

SELECT * 
FROM   address a 
WHERE  EXISTS (SELECT id 
               FROM   interests 
               WHERE  address_id = a.id AND 
                      (cat = 'aa' and subcat = 'xx')) AND
       EXISTS (SELECT id 
               FROM   interests 
               WHERE  address_id = a.id AND 
                      (cat = 'aa' and subcat = 'vv'));

Result:

+----+------+--------------------+-------------+--------------+---------------------+
| id | name | email              | countryCode | languageCode | timestamp           |
+----+------+--------------------+-------------+--------------+---------------------+
|  2 |      | [email protected] | BE          | fr           | 2010-07-16 14:10:25 |
+----+------+--------------------+-------------+--------------+---------------------+
1 row in set (0.00 sec)

Using correlated subqueries is easy and straightforward. However keep in mind that it might not be the best in terms of performance, because the correlated subqueries will be executed once for each address in the outer query.

Daniel Vassallo
Cheers for your elaborate answer. In a way, this is what I've been doing through Zend. For each address check if the condition is met through the findDependentRowset. But the tables are too large for this, or better said, the total of the queries takes to long.
Peter
Correction, this is remarkably faster than the findDependentRowset. I've run a few tests, both with AND and OR constructions and they show a quite workable performance, at least on my current development DB. I'm off to a meeting now but will test this further later on. This might very well be the solution for me.
Peter