views:

32

answers:

1

Hi there

I am using this query

SELECT COUNT(DISTINCT to_number) AS errors FROM sent_txts
WHERE msg_link_id = 0 AND 
msg_sent_datetime BETWEEN '2010-08-09 07:00:00' AND '2010-09-07 11:59:59'
HAVING to_number IN(SELECT mobile FROM action_6_members WHERE mobile = to_number)

However I am getting

Unknown column 'tada_prod.sent_txts.to_number' in 'where clause'

If I comment out the having the query runs fine.

Here is the sent_txts table

CREATE TABLE `sent_txts` (
  `id` int(64) NOT NULL AUTO_INCREMENT,
  `msg_link_id` int(64) DEFAULT NULL,
  `msg_class` varchar(256) DEFAULT NULL,
  `msg_ref` varchar(256) DEFAULT NULL,
  `to_number` varchar(256) DEFAULT NULL,
  `msg_body` text,
  `waiting_for_reply` int(64) DEFAULT NULL,
  `status` varchar(256) DEFAULT NULL,
  `tada_error` int(64) DEFAULT NULL,
  `msg_sent_datetime` datetime DEFAULT NULL,
  `reply_type` int(11) DEFAULT '0',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=256379 DEFAULT CHARSET=utf8

So obviously the column is there.

What is the issue?


Here is the explain of the query posted by Jonathan

id  select_type  table  type  possible_keys  key  key_len  ref  rows    Extra                           
1   SIMPLE       a      ALL                                     10895                                   
1   SIMPLE       s      ALL                                     256050  Using where; Using join buffer  

Here is the show create for action_6_members:

CREATE TABLE `action_6_members` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `draw_id` int(11) NOT NULL,
  `mobile` varchar(255) NOT NULL,
  `fly_buys` varchar(255) NOT NULL,
  `signup_date` datetime NOT NULL,
  `club` int(11) NOT NULL DEFAULT '0' COMMENT '1 = yes, 2 = no',
  `code` varchar(7) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=10896 DEFAULT CHARSET=latin1
+1  A: 

Normally, a HAVING clause relates an aggregate to a value or another aggregate. What is in the HAVING clause of the question should be in the main WHERE clause:

SELECT COUNT(DISTINCT to_number) AS errors
  FROM sent_txts AS s
 WHERE msg_link_id = 0
   AND msg_sent_datetime BETWEEN '2010-08-09 07:00:00'
                             AND '2010-09-07 11:59:59'
   AND to_number IN (SELECT mobile FROM action_6_members AS a
                      WHERE a.mobile = s.to_number)

But you should probably make that into a regular join:

SELECT COUNT(DISTINCT s.to_number) AS errors
  FROM sent_txts AS s
  JOIN action_6_members AS a ON a.mobile = s.to_number
 WHERE s.msg_link_id = 0
   AND s.msg_sent_datetime BETWEEN '2010-08-09 07:00:00'
                               AND '2010-09-07 11:59:59';
Jonathan Leffler
+1: You beat me; I was working on an example using EXISTS. The JOIN to `action_6_members` risks inflating records if there's more than one child associated to a parent record.
OMG Ponies
@OMG: but the DISTINCT undoes any of that damage, does it not? Also, I think the 'Action 6' people are on a special (mobile phone) rate plan - and this is counting the number of calls sent between the two dates to people on the Action 6 plan - so there will only be one number (at most) in the Action 6 table for any given 'to_number'.
Jonathan Leffler
hmm, dont really like joins but ok :) Just incase, this query should basically select all unique mobile(to_number) from the sent_txts table(based on the where) but only count it if the number exists in the action_6_members, can you see a better way?
Hailwood
hmm, trying the query above locks up my sql client. (could it be that it is a 30k row table joining a 10k row table?)
Hailwood
@Hailwood: There's a very decent chance that the join will be more efficient than the IN sub-query (unless the optimizer is smart enough to convert the IN into a join of its own accord), unless the number of people in the Action 6 table is very small (say, under 100). And joins are usually effective - they are, after all, the bread and butter of an optimizer. If your DBMS can't manage joins efficiently, get a better DBMS (meaning one that *can* manage joins efficiently).
Jonathan Leffler
@jonothan not quite, a mobile may be in action_6 multiple times.
Hailwood
@Hailwood: what indexes do you have on sent_txts and action_6_members? What does the query plan say?
Jonathan Leffler
@Jonothan yeah, I kinda learnt to hate joins when I was working with joining four tables with some having 3Million+ rows
Hailwood
@Hailwood: please note that schema and cardinality information is helpful. Both the gross counts and the relationships such as 'a single mobile number may appear several times in action_6_members'. Since you were asking how to do it, is it safe to assume you didn't have a working query and you don't know how long any alternative would take?
Jonathan Leffler
@Hailwood: your VARCHAR(256) for the 'to_number' column is a bit alarming. What is the longest to_number actually? I'd be surprised if it was more than about 20 characters (perhaps with some punctuation). You're forcing the system to allow for something 25 times that size; which makes for a lot of work. Do you have an index on sent_txts.to_number? What about on action_6_members.mobile? If you have neither, then it is no surprise that your system locks up. You need at least one of those two indexes.
Jonathan Leffler
I have added what I know to the question, If you require anything else let me know.
Hailwood
The table was not actually created with this query in mind, this query is also a once off (once it is done it will never be done again).
Hailwood
What index would you suggest placing on those columns seeing as how both columns can have multiple entries that are identical.
Hailwood
With regards to the 256 I agree, however I did not create the table :/
Hailwood
Well, in the DBMS I'm most familiar with, it would be `CREATE INDEX x_mobile ON action_6_members(mobile);` to create a B-Tree index (allowing duplicates) on it. What that translates to in MySQL is more debatable. The DBMS I'm most familiar with would most probably auto-index one of the tables - the smaller, presumably action_6_members - for the purposes of answering the query, assuming that the statistics were sufficiently up to date (which, these days, they would be because there's an auotmatic update statistics process).
Jonathan Leffler
It looks like you can do almost exactly the same in MySQL; you might want to take advantage of the option to index part of a long column, though: `CREATE INDEX x_mobile USING BTREE ON action_6_members (mobile(25));` to index only the first 25 characters of the column. You might be able to trim that some more. If you also have an index on sent_texts.to_number and if the optimizer supports merge joins, then it could use the two indexes to do the merge join, reading them in sequence. But this is the sort of thing that depends on the optimizer, and I'm not an expert on the MySQL optimizer.
Jonathan Leffler
OMG Ponies
@Hailwood: you might also benefit from a BTREE index on '`msg_sent_datetime`'; that could allow the optimizer to pick up the records in the date range faster if there are a lot of records outside the date range. You'd use this instead of an index on `to_number`.
Jonathan Leffler
Thankyou, I added an index on all three columns and it seems to be working.
Hailwood