tags:

views:

694

answers:

6

Hi I am working on a query that is using the below schema to find out how often users are doing searches by company (the app basically allowed you to do a search by company)

SEARCH_LOG
----------
date_of_search (DATETIME)
company_id (INT)


COMPANIES
---------
id (INT)
company_name (VARCHAR)

(there are more columns but these are the relevant ones)

So I am running the following query:

SELECT company_name,COUNT(*) FROM companies LEFT OUTER JOIN search_log ON search_log.company_id=companies.id GROUP BY companies.id

This is great as it returns each company and the number of search performed, however I want to express these numbers as a percentage. My knee jerk reaction was to just run the following query seperately:

SELECT COUNT(*) FROM search_log

Grab that result and just do the division on the application side, however this seems really inefficient and I'd like to do it all in one query if possible (preferably without the use of a subquery) but have no idea how to get that information.

Any help or guidance would be appreciated.

EDIT: Maybe I wasn't totally clear about what I'm after. Instead of getting results like:

COMPANY_NAME | COUNT(*)
-----------------------
CompanyA     | 1
CompanyB     | 3

I'd rather see:

COMPANY_NAME | COUNT(*)
-----------------------
CompanyA     | 25%
CompanyB     | 75%

Obviously the formatting isn't super important as 25,25%,0.25 are all usable.

+2  A: 

I've done things like this with views and custom functions (I don't know if functions are available in mysql though). To be honest with you, your best bet is to create a small data warehouse. Reporting things like that will be much faster this way. You'll also be able to find new ways of reporting your data.

The downside is that usually, you can't get a "live" report. You generally aggregate all of your data at night. An upside, you can look at trends over time this way.

Basically, during a low traffic hour, you'll take a snapshot of your data. Then, you'll mutate them and stick them in a dimensional model. After that, all of your reporting is very easy! :)

Other than that, you're best bet is to use a scalar function or subqueries.

D. Patrick
+1  A: 

This question has been asked before in a slightly different form, and I don't see any solutions that would avoid a second query - if you want to do it all in the database.

If you do it in your app, you don't actually need to do a second query. Just iterate over all the results from the first query and add up the totals for each group. That should give you the "grand total" without having to hit the db with a second query.

It would probably force you to iterate the results twice though - once to get the total, a second time to calculate the percentages. But still likely faster than doing a second query.

Eric Petroelje
+1  A: 

How about this instead of

SELECT company_name,COUNT(*) FROM companies LEFT OUTER JOIN search_log ON search_log.company_id=companies.id GROUP BY companies.id

try

SELECT company_name,(COUNT(*)/(select count(*) from search_log) * 100) as percent FROM companies LEFT OUTER JOIN search_log ON search_log.company_id=companies.id GROUP BY companies.id
Lee Irving
Yes that works but like I said I don't want to use a subquery if possible
Andrew G. Johnson
missed that part of your question
Lee Irving
No problem, still a decent answer and what I may end up having to do
Andrew G. Johnson
Will the subquery not be cached by mysql?
Lee Irving
A: 

I've always done what your knee-jerk reaction was.

First, I'm more comfortable with writing math on the application side. And second, I don't know how much I trust MySQL's math.

I don't think doing a count(*) is inefficient at all.

T Pops
It's not that I think COUNT(*) is inefficient it's that I'll be doing it once per row on top of my query if I use a subquery
Andrew G. Johnson
+3  A: 

Although not a single query, following solution will do it in SQL using variable

select @total:=count(*) from search_log;
select company_id,count( * ) , count( * )/@total as percentage from search_log group by company_id;

Tahir Akhtar
Upvoted for a valid idea but because I'm using PHP I don't think it's possible to implement this idea.
Andrew G. Johnson
I think it will run on PHP as well. It's just plain sql. Just run both queries on the same connection/statement.
Tahir Akhtar
A: 

This will run the subquery just once and give you the proportions you're looking for.

SELECT company_name, COUNT(*)/t.total FROM companies 
LEFT OUTER JOIN search_log ON search_log.company_id=companies.id
JOIN (SELECT count(*) as total FROM companies) AS t
GROUP BY companies.id
Randy