I was about to ask the MySQL list this, and then remembered SO.

Running MySQL 5.0.85, I need a few queries to be as efficient as possible. If I could get a little review, I would appreciate it.

I collect rows in the millions, and need the top 50 grouped by one field, with the percentage of the total that each one accounts for.

Here is what I have come up with. Two questions:

1) I have a feeling this could be more efficient, perhaps with a join.
2) How can I get the percentage with precision in the hundredths, i.e. multiplied by 100 so that .07 becomes 7.00? I get SQL errors if I use (percentage * 100).

SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount, 
    COUNT( * ) / ( SELECT COUNT( * ) FROM agents ) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;

Second issue: once a day I need to archive the result of the above. Any suggestions on how best to do that? I can schedule it with cron or, in my case, launchd, unless someone has a better suggestion.

Do you think a simple 'SELECT (the above) INTO foo' would suffice?

A: 

I don't fully understand your question, so I'll first answer just the part about getting the percentage, using your present query.

 SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount, 
    ((COUNT( * ) / ( SELECT COUNT( * ) FROM agents)) * 100 ) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;
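
For the hundredths precision asked about in the question, one option is to wrap the expression in ROUND with two decimal places. This is only a sketch against the same agents table. Multiplying by 100 before dividing is my own choice for readability, and user_agent_original is left out on the assumption that it is not needed, since it is not part of the GROUP BY and MySQL would return an arbitrary value from each group.

```sql
-- Share of all rows per parsed user agent, as a percentage rounded to two decimals.
SELECT user_agent_parsed,
       COUNT(*) AS thecount,
       ROUND(COUNT(*) * 100 / (SELECT COUNT(*) FROM agents), 2) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC
LIMIT 50;
```

With this form a share of .07 should come back as 7.00.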

To help you further, I think I need you to elaborate a bit more ;-)

junmats
Misplaced paren, thanks! The second issue is that I want to take the result of the above query and save its state at that point in time. I am storing hits in a user agent log, so I can find that Safari has 100 uses a day, IE has 65 uses a day, etc. (simplified). This of course changes from day to day, and I want to chart the growth/decline over a year, so I need to store the result of the above query for long-term stats. I am considering selecting the result into a new table, unless that is a bad idea and there is a more elegant one.
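
One way to keep a daily snapshot, sketched here under assumptions: in MySQL, SELECT ... INTO writes to variables or files rather than to a table, so the usual route is INSERT ... SELECT into an archive table. The table name agent_stats_daily, its column types, and the assumption that user_agent_parsed is never NULL are all hypothetical and would need adjusting to the real schema.

```sql
-- Hypothetical archive table; name and column types are assumptions.
CREATE TABLE IF NOT EXISTS agent_stats_daily (
    stat_date         DATE         NOT NULL,
    user_agent_parsed VARCHAR(255) NOT NULL,
    thecount          INT UNSIGNED NOT NULL,
    percentage        DECIMAL(5,2) NOT NULL,
    PRIMARY KEY (stat_date, user_agent_parsed)
);

-- Run once a day (for example from cron or launchd) to snapshot the top 50.
INSERT INTO agent_stats_daily (stat_date, user_agent_parsed, thecount, percentage)
SELECT CURDATE(),
       user_agent_parsed,
       COUNT(*) AS thecount,
       ROUND(COUNT(*) * 100 / (SELECT COUNT(*) FROM agents), 2)
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC
LIMIT 50;
```

Keying on the date plus the parsed agent means a second run on the same day fails with a duplicate-key error instead of silently doubling the stats, and charting a year of growth or decline becomes a plain SELECT filtered on stat_date.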
A: 

First Issue:

select count(*) from agents into @AgentCount;

SELECT user_agent_parsed, user_agent_original, COUNT( user_agent_parsed ) AS thecount, 
    COUNT( * ) / ( @AgentCount) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC LIMIT 50;
lexu
How does that perform better? It is still two queries, and you may even slow it down since you are now literally storing a variable. Milliseconds, sure, but can you elaborate?
Your nested query is potentially run once per grouped element; mine runs once. Granted, the optimizer might catch this.
lexu
Ah, thanks. I'll run EXPLAIN and see.
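
A sketch of that check, for anyone following along: prefix the original query with EXPLAIN and look at the select_type column for the nested SELECT. The exact output depends on the storage engine and server version, so treat the comments as a reading guide rather than guaranteed results.

```sql
EXPLAIN
SELECT user_agent_parsed,
       COUNT(*) AS thecount,
       COUNT(*) / (SELECT COUNT(*) FROM agents) AS percentage
FROM agents
GROUP BY user_agent_parsed
ORDER BY thecount DESC
LIMIT 50;
-- select_type = SUBQUERY: the nested SELECT does not depend on the outer query.
-- select_type = DEPENDENT SUBQUERY: the nested SELECT depends on outer values
-- and may be re-evaluated for each distinct set of them.
```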
No need for the dual selects; the MySQL optimizer, at least in 5.x, takes care of it.
Good to know. I guess I was being too defensive here!
lexu