ansaurus

Question

Answer 1

+1 A:

Normally, a HAVING clause relates an aggregate to a value or another aggregate. What is in the HAVING clause of the question should be in the main WHERE clause:

SELECT COUNT(DISTINCT to_number) AS errors
  FROM sent_txts AS s
 WHERE msg_link_id = 0
   AND msg_sent_datetime BETWEEN '2010-08-09 07:00:00'
                             AND '2010-09-07 11:59:59'
   AND to_number IN (SELECT mobile FROM action_6_members AS a
                      WHERE a.mobile = s.to_number)

But you should probably make that into a regular join:

SELECT COUNT(DISTINCT s.to_number) AS errors
  FROM sent_txts AS s
  JOIN action_6_members AS a ON a.mobile = s.to_number
 WHERE s.msg_link_id = 0
   AND s.msg_sent_datetime BETWEEN '2010-08-09 07:00:00'
                               AND '2010-09-07 11:59:59';

Jonathan Leffler 2010-10-13 00:56:25

+1: You beat me; I was working on an example using EXISTS. The JOIN to `action_6_members` risks inflating records if there's more than one child associated to a parent record.

OMG Ponies 2010-10-13 00:59:19

@OMG: but the DISTINCT undoes any of that damage, does it not? Also, I think the 'Action 6' people are on a special (mobile phone) rate plan - and this is counting the number of calls sent between the two dates to people on the Action 6 plan - so there will only be one number (at most) in the Action 6 table for any given 'to_number'.

Jonathan Leffler 2010-10-13 01:00:08

hmm, dont really like joins but ok :) Just incase, this query should basically select all unique mobile(to_number) from the sent_txts table(based on the where) but only count it if the number exists in the action_6_members, can you see a better way?

Hailwood 2010-10-13 01:01:23

hmm, trying the query above locks up my sql client. (could it be that it is a 30k row table joining a 10k row table?)

Hailwood 2010-10-13 01:04:22

@Hailwood: There's a very decent chance that the join will be more efficient than the IN sub-query (unless the optimizer is smart enough to convert the IN into a join of its own accord), unless the number of people in the Action 6 table is very small (say, under 100). And joins are usually effective - they are, after all, the bread and butter of an optimizer. If your DBMS can't manage joins efficiently, get a better DBMS (meaning one that *can* manage joins efficiently).

Jonathan Leffler 2010-10-13 01:05:21

@jonothan not quite, a mobile may be in action_6 multiple times.

Hailwood 2010-10-13 01:06:10

@Hailwood: what indexes do you have on sent_txts and action_6_members? What does the query plan say?

Jonathan Leffler 2010-10-13 01:06:25

@Jonothan yeah, I kinda learnt to hate joins when I was working with joining four tables with some having 3Million+ rows

Hailwood 2010-10-13 01:08:11

@Hailwood: please note that schema and cardinality information is helpful. Both the gross counts and the relationships such as 'a single mobile number may appear several times in action_6_members'. Since you were asking how to do it, is it safe to assume you didn't have a working query and you don't know how long any alternative would take?

Jonathan Leffler 2010-10-13 01:08:46

@Hailwood: your VARCHAR(256) for the 'to_number' column is a bit alarming. What is the longest to_number actually? I'd be surprised if it was more than about 20 characters (perhaps with some punctuation). You're forcing the system to allow for something 25 times that size; which makes for a lot of work. Do you have an index on sent_txts.to_number? What about on action_6_members.mobile? If you have neither, then it is no surprise that your system locks up. You need at least one of those two indexes.

Jonathan Leffler 2010-10-13 01:11:49

I have added what I know to the question, If you require anything else let me know.

Hailwood 2010-10-13 01:12:13

The table was not actually created with this query in mind, this query is also a once off (once it is done it will never be done again).

Hailwood 2010-10-13 01:13:15

What index would you suggest placing on those columns seeing as how both columns can have multiple entries that are identical.

Hailwood 2010-10-13 01:13:43

With regards to the 256 I agree, however I did not create the table :/

Hailwood 2010-10-13 01:14:44

Well, in the DBMS I'm most familiar with, it would be `CREATE INDEX x_mobile ON action_6_members(mobile);` to create a B-Tree index (allowing duplicates) on it. What that translates to in MySQL is more debatable. The DBMS I'm most familiar with would most probably auto-index one of the tables - the smaller, presumably action_6_members - for the purposes of answering the query, assuming that the statistics were sufficiently up to date (which, these days, they would be because there's an auotmatic update statistics process).

Jonathan Leffler 2010-10-13 01:16:51

It looks like you can do almost exactly the same in MySQL; you might want to take advantage of the option to index part of a long column, though: `CREATE INDEX x_mobile USING BTREE ON action_6_members (mobile(25));` to index only the first 25 characters of the column. You might be able to trim that some more. If you also have an index on sent_texts.to_number and if the optimizer supports merge joins, then it could use the two indexes to do the merge join, reading them in sequence. But this is the sort of thing that depends on the optimizer, and I'm not an expert on the MySQL optimizer.

Jonathan Leffler 2010-10-13 01:21:27

OMG Ponies 2010-10-13 01:22:15

@Hailwood: you might also benefit from a BTREE index on '`msg_sent_datetime`'; that could allow the optimizer to pick up the records in the date range faster if there are a lot of records outside the date range. You'd use this instead of an index on `to_number`.

Jonathan Leffler 2010-10-13 01:26:23

Thankyou, I added an index on all three columns and it seems to be working.

Hailwood 2010-10-13 01:45:58

ansaurus

tags:

views:

answers:

mysql unknown column...

related questions