tags:

views:

188

answers:

3

I hope someone can help me out. I have a table that logs our import jobs. I need a query that will produce a matrix with the names of tables on the vertical axis, the import dates on the horizontal axis, and the total number of records imported for that table on that date in the matrix cell. I don't care if we have to create a temporary table, but the whole thing must be done in MySQL.

Below is a simplified sample of our event log table. Not only does it have many more fieds, but we import many more tables. Therefore, the solution should account for querying the table names. You will notice that data can be imported into a table more than once per day, as in records 5 and 6.

id  table_name  import_date          num_recs 
----+-----------+--------------------+------- 
0   customer    2010-06-20 00:00:00  10        
1   order       2010-06-20 00:00:00  15        
2   customer    2010-06-21 00:00:00  5         
3   order       2010-06-21 00:00:00  6         
4   customer    2010-06-22 00:00:00  1         
5   order       2010-06-22 00:00:00  6         
6   order       2010-06-22 00:00:00  1         

We are looking for a result something like this. It does not have to be exact

table_name  06-20 06-21 06-22
------------+-----+-----+------
customer    |  10 |   5 |   1
order       |  15 |   6 |   7
+3  A: 

What about output of the form:

table_name   date    imports
------------+-------+--------
customer    | 06-20 |   10
customer    | 06-21 |   5
order       | 06-20 |   15
order       | 06-21 |   6

This way you can do it with a simple GROUP BY:

SELECT table_name, DATE(import_date) AS date, SUM(*) AS imports
FROM yourTable
GROUP BY table_name, date;

Otherwise, your query is going go be really nasty.

Ben S
Well, we do imports every night. My manager wants a MySQL query report that can be run any time to display said import history for the last 30 days.
And, yes, he wants dynamic dates.
Its quite possible to do clever stuff to turn rows into columns (I'll avoid doing a critique of the answers trying to attempt this so far) but it's the WRONG WAY TO SOLVE PROBLEM. Go with Ben S's answer. If your job depends on you providing it in the original layout and your boss kbows **nothing** about relational databases, then run the query from MSExcel or OpenOffice and generate a pivot table. And start looking for another job, 'cos pretty soon he's going to ask you to organise the first manned mission to Mars using lots of rubber bands.
symcbean
Other databases (e.g. PostgreSQL, MS-SQL) have built-in functionality to do pivot tables/cross tabs/whatever you want to call them. Asking users to do pivot tables (or any thing that involves more effort on their part) is sometimes not the way to do it, and we as developers should be able to work around mysql's limitations. Using SQL to generate SQL is a standard tool, not "lots of rubber bands".
runrig
@symcbean: what's there's to critique about a standard pivot query? Only SQL Server 2005+ and Oracle 11g+ have PIVOT/UNPIVOT syntax. I don't see how using this query as a first stage for alteration in Excel/Access is better than doing it entirely in a SQL query...
OMG Ponies
donbriggs, Help me out here, i'm confused. How is this answer not correct? It shows the exact data you wanted. It can easily be filtered for other dates using WHERE date BETWEEN x AND y. It's not in the matrix form that you were looking for, but it shows the same info. Why all the extra pain to make the data fit into a different format?
Hamy
@Hamy: What part of the data running by date in columns, not rows, do you not comprehend?
OMG Ponies
@OMG Please stop being a jerk to everyone here, you're not the only smart one on the nets. It's clear from my question that I understand what the OP is asking for, and I am attempting to get him/her to reconsider. IMO an elegant solution has been offered that performs the same functionally. Only the OP knows if it is worth their time to try to achieve the exact solution they asked for, I am merely recommending that it may not be.
Hamy
@Hamy: How is that you believe this is correct when the OP clearly provides an example of expected output that this answer does not provide. So I say again - What part of the data running by date in columns, not rows, do you not comprehend?
OMG Ponies
@OMG I understand that clearly. I am suggesting that the OP does not in fact want what s/he is asking for, and perhaps they should accept this solution as good enough
Hamy
@Hamy: Riiight... You, through the power of telepathy and the internet know better for the user who provided clear examples... Nevermind that your account shows **no** votes for anything remotely SQL related.
OMG Ponies
ok, ok - you can "win" this internet battle. I should have known better anyways. enjoy.
Hamy
This answer gives the data you need. If you need to display it in another form, just do that in your presentation code. Turning a row into columns is not that big a deal. Or is there something else that absolutly require the resultset from mysql to be in a particular form ?
nos
Guys, thanks for all the help. But when I to the person who tasked me with this how much more sense it made to do the processing in the code, they understood.
+2  A: 

MySQL can not do pivot queries, but you can do it in two queries, using the result of the first query as the SQL for the next query:

SELECT 'SELECT table_name'
UNION
SELECT CONCAT(', SUM(IF(import_date = "',import_date,'", num_recs,0)) AS "',DATE_FORMAT(import_date, "%m-%d"),'"')
FROM event_log
GROUP BY import_date
UNION
SELECT 'FROM event_log GROUP BY table_name'

Then execute the output of that query to get your final results, e.g. for your example you would get:

SELECT table_name                                                           
, SUM(IF(import_date = "2010-06-20", num_recs,0)) AS "06-20"
, SUM(IF(import_date = "2010-06-21", num_recs,0)) AS "06-21"
, SUM(IF(import_date = "2010-06-22", num_recs,0)) AS "06-22"
FROM event_log GROUP BY table_name

You can either write a stored procedure to concatenate, prepare, and then execute the results of the first query, OR, if this is all run from a shell script, you can capture the results of the first query, then feed the results back into mysql.

runrig
See my query for how to do it in one statement.
OMG Ponies
And that's not dynamic SQL, only the queries necessary.
OMG Ponies
I didn't say it was dynamic SQL. Just a dynamic solution :-) Since the first query returns more than one row, you can't use mysql's prepared statements directly, but you could write a stored procedure to concatenate, prepare and then execute the output of the first query. Or you could just capture the output of the first query from a shell script, and then feed it back into mysql.
runrig
We're loosing out to an answer that only provides half of the actual answer. But yes, realistically the dynamic SQL would have to be inside a stored procedure.
OMG Ponies
+1: To keep you above Hamy's fundamentally flawed wiki answer.
OMG Ponies
A: 

I think Ben S is on the right track. I wanted to offer what I could here in case it helps anyone, who knows. Original source

Here is a method to take two arbitrary dates and split them apart into blocks of time, and then performs some aggregation function on other data in each block. In your case, that block should probably be a single day, the start date would likely be 30 days prior to the current day, and the end date would likely be the current day. Each block can be returned with some aggregate metric of interest. In your case, this will likely be the SUM('imports')

SELECT t1.table_name AS table_name, t1.imports AS imports FROM (SELECT SUM(`imports`) AS imports, CEIL( (UNIX_TIMESTAMP('<now>') - UNIX_TIMESTAMP(`import_date`))/ (<one day in ?seconds, i think?>) ) AS RANGE FROM `<your table>` WHERE `import_date` BETWEEN '<now minus 30 days>' AND '<now>' GROUP BY RANGE ORDER BY RANGE DESC) AS t1;

This might not help at all, but if it does then goody. It's easily modified to return the starting day for each range as a date column. To be clear, this does the exact same thing that Ben S's solution offers, but it will work if all of your dates are not 00:00:00 whereas that would cause his GROUP BY on the date column to fail

To see what the return would look like, see Ben S's answer and mentally remove the date column. As I said however, that column could easily be added back into this query. FWIW, I have used this method on tables with upwards of 4 million rows and it still runs in < 1 second, which was good enough for my purposes.

Hamy

Hamy
The point of answers like runrig's and mine is that you don't have to do that work outside of SQL, "mentally removing the date column".
OMG Ponies
You don't think I don't know you downvoted me? Can you get any more infantile?
OMG Ponies
The solution, as some of you mentioned, is multiple queries, and some processing in PHP.