Hello,

I looked through various questions on this issue but couldn't find an answer to my problem.

This is my query:

SELECT SUM( lead_value ) AS lead_value_sum, count( DISTINCT phone ) AS SUM, referer
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
WHERE lead_date
BETWEEN 20100716000000
AND 20100716235959
AND t1.site_id =8
GROUP BY t1.referer

I am trying to sum lead_value only for unique phone numbers. The COUNT(DISTINCT phone) works and gives me the number of unique phones for each referer, but I can't work out how to SUM the lead_value over the unique phone numbers for each referer.

Would appreciate any help you can give me, Eden

Edit: Table Structures

CREATE TABLE user_to_leads
 (
user_id INT(10) NOT NULL,
lead_id INT(10) NOT NULL,
site_id INT(10) NOT NULL,
lead_value INT(10) NOT NULL
 )

CREATE TABLE leads
 (
lead_id INT(100) NOT NULL auto_increment ,
site_id INT(10) NOT NULL ,
lead_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ,
vaild_date TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',
referer VARCHAR(255) NOT NULL,
KEYWORD VARCHAR(255) NOT NULL,
upsale INT(11) NOT NULL DEFAULT '0' ,
vaild INT(2) NOT NULL,
PRIMARY KEY (lead_id),
KEY lead_date (lead_date)
 )


CREATE TABLE leads_people_details
 (
lead_id INT(100) NOT NULL auto_increment ,
fullname VARCHAR(255) NOT NULL,
phone VARCHAR(12) NOT NULL ,
email VARCHAR(255) NOT NULL,
home VARCHAR(255) NOT NULL,
browser VARCHAR(255) NOT NULL,
browser_version VARCHAR(100) NOT NULL,
resolution VARCHAR(255) NOT NULL,
IP VARCHAR(255) NOT NULL,
status VARCHAR(255) NOT NULL DEFAULT '0',
COMMENT text NOT NULL,
PRIMARY KEY (lead_id)
 )
+1  A: 

You say

For a particular referer,phone, the lead_value will always be the same

Based on the limited information you have given I think this should return the right answer. If you update your question with the requested information it will probably be possible to improve upon it though.

SELECT SUM(lead_value ) AS lead_value_sum, count(phone ) AS phone_count, referer
FROM
(
SELECT DISTINCT lead_value, phone, referer
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
WHERE lead_date
BETWEEN 20100716000000
AND 20100716235959
AND t1.site_id =8
) derived
GROUP BY referer
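
To see what the derived table changes, here is a small sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, trims the tables to the columns the query touches, leaves out the lead_date filter, and uses made-up sample rows in which two leads share one phone number. None of that comes from the question; it only illustrates the effect.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and the tables are trimmed to the
# columns the query touches. All rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, site_id INTEGER, referer TEXT);
CREATE TABLE leads_people_details (lead_id INTEGER PRIMARY KEY, phone TEXT);
CREATE TABLE user_to_leads (user_id INTEGER, lead_id INTEGER,
                            site_id INTEGER, lead_value INTEGER);

-- Leads 1 and 2 came in from the same phone number.
INSERT INTO leads VALUES (1, 8, 'google'), (2, 8, 'google'), (3, 8, 'google');
INSERT INTO leads_people_details VALUES
    (1, '555-0001'), (2, '555-0001'), (3, '555-0002');
INSERT INTO user_to_leads VALUES (10, 1, 8, 50), (10, 2, 8, 50), (10, 3, 8, 30);
""")

# The join and filter shared by both queries (lead_date filter omitted).
JOINED = """
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
WHERE t1.site_id = 8
"""

# Original shape: every joined row contributes to the sum.
naive = conn.execute(
    "SELECT SUM(lead_value), COUNT(DISTINCT phone), referer"
    + JOINED + "GROUP BY t1.referer"
).fetchall()

# Derived-table shape: duplicates are removed before aggregating.
deduped = conn.execute(
    "SELECT SUM(lead_value), COUNT(phone), referer FROM ("
    "SELECT DISTINCT lead_value, phone, referer" + JOINED
    + ") derived GROUP BY referer"
).fetchall()

print(naive)    # [(130, 2, 'google')]  the shared phone is summed twice
print(deduped)  # [(80, 2, 'google')]   each phone contributes once
```

This only holds while a given referer, phone pair always carries the same lead_value; if it doesn't, DISTINCT keeps both rows.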

Updated after table structure posted

I don't really understand why both leads_people_details and leads have a primary key, auto_increment column lead_id that you are joining on. That would imply a 1-1 relationship between leads and leads_people_details. If so, one of them probably shouldn't be auto_increment, to avoid the ids getting out of sync without you realising.

Also, there is no primary key on the user_to_leads table. Should there be one on user_id, lead_id, site_id? Additionally, you are not currently filtering by site_id on that table. Is that intentional? If not, does adding that filter stop the duplicate records from coming back? If it doesn't, can you describe the significance of user_id in that table? You said earlier that "For a particular referer,phone, the lead_value will always be the same". Can it differ by user_id? If so, which value should be used? If not, why is user_id in that table?
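
As a rough sketch of what a primary key on user_id, lead_id, site_id would buy you: the snippet below assumes SQLite via Python's sqlite3 rather than MySQL, and the rows are invented for illustration. Whether this is the right key for your data is exactly the open question above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the composite key is the one
# floated above, not something confirmed by the question.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_to_leads (
    user_id    INTEGER NOT NULL,
    lead_id    INTEGER NOT NULL,
    site_id    INTEGER NOT NULL,
    lead_value INTEGER NOT NULL,
    PRIMARY KEY (user_id, lead_id, site_id)
)
""")

# Hypothetical rows: the same lead attached to two different users is fine.
conn.execute("INSERT INTO user_to_leads VALUES (10, 1, 8, 50)")
conn.execute("INSERT INTO user_to_leads VALUES (11, 1, 8, 70)")

# Re-inserting an existing (user_id, lead_id, site_id) combination is refused.
try:
    conn.execute("INSERT INTO user_to_leads VALUES (10, 1, 8, 50)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM user_to_leads").fetchone()[0]
print(duplicate_rejected, row_count)  # True 2
```

With such a key, exact duplicate rows can no longer reach the join, although several users per lead still multiply the joined rows.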

A provisional query that might be closer is below, but the questions above are still unresolved.

SELECT SUM(lead_value ) AS lead_value_sum, count(phone ) AS phone_count, referer
FROM leads t1
INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id  
               and t1.site_id = t3.site_id
WHERE lead_date
BETWEEN 20100716000000
AND 20100716235959
AND t1.site_id =8
GROUP BY referer
Martin Smith
Hey Martin Smith, it seems your solution worked well, at least for these specific stats; I will check it against more, of course. Can you please explain what exactly you have done? Is the "FROM" selecting from a table you are creating right inside the query? And what is "derived" all about? :) Thanks so much for your help, Eden
Eden
@Eden - Really I just need to know what columns go in each of the tables. At the moment it brings back duplicates then gets rid of them with distinct. I'm hoping that I should be able to get rid of this step and make things more efficient. The derived bit refers to a derived table. It is like an inline view.
Martin Smith
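
To make the "inline view" comparison concrete, here is a small sketch. It assumes SQLite via Python's sqlite3 instead of MySQL, and lead_rows and distinct_leads are hypothetical names: a flat table standing in for the three-way join and a named view over it. The same aggregate is run once against the view and once against a derived table written inside the query.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. lead_rows is a hypothetical flat
# table standing in for the joined rows; the data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lead_rows (referer TEXT, phone TEXT, lead_value INTEGER);
INSERT INTO lead_rows VALUES
    ('google', '555-0001', 50),
    ('google', '555-0001', 50),
    ('bing',   '555-0002', 30);

-- A named view: the de-duplicating SELECT is stored under a name.
CREATE VIEW distinct_leads AS
    SELECT DISTINCT lead_value, phone, referer FROM lead_rows;
""")

# Aggregating over the named view ...
via_view = conn.execute("""
SELECT SUM(lead_value), COUNT(phone), referer
FROM distinct_leads
GROUP BY referer
ORDER BY referer
""").fetchall()

# ... and over a derived table: the same SELECT written inline in FROM,
# given the alias "derived" so the outer query can refer to it.
via_derived = conn.execute("""
SELECT SUM(lead_value), COUNT(phone), referer
FROM (
    SELECT DISTINCT lead_value, phone, referer FROM lead_rows
) derived
GROUP BY referer
ORDER BY referer
""").fetchall()

print(via_view)     # [(30, 1, 'bing'), (50, 1, 'google')]
print(via_derived)  # [(30, 1, 'bing'), (50, 1, 'google')]
```

The derived table exists only for the duration of that one statement, which is why nothing has to be created beforehand.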
You are right, leads_people_details doesn't have to be auto_increment: every time I insert the people details I use the lead_id that was just added to leads, so they are always the same. As for user_to_leads, do I have to have a primary key? Which one should I use then? Each lead_id is attached to a user_id (sometimes more than one), telling me which user the lead is "attached" to. Each user_id has a different lead_value. So, to answer your next questions: the same referer and phone will always have the same lead_value, but it WILL differ across user_ids, each of which gives a different lead_value.
Eden
Continuing my message: I don't understand how your new query handles two rows for the same site (for example, 8) that have the same phone in the leads_people_details table. Where does it filter out the duplicate phone if it's on the same site?
Eden
@Eden - But you're not joining `user_id` onto anything, which means that `referer,phone` *will* have different `lead_value`s in the query in the case you have just described. Additionally, can the value of `site_id` in `user_to_leads` ever differ from the corresponding `site_id` in `leads`?
Martin Smith
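
A small sketch of that case, assuming SQLite via Python's sqlite3 in place of MySQL, trimmed tables, no lead_date filter, and invented rows: one lead with one phone is attached to two users whose lead_value differs. The derived-table query from the answer is then run unchanged apart from the missing date filter.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; tables are trimmed to the columns
# the query touches and all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (lead_id INTEGER PRIMARY KEY, site_id INTEGER, referer TEXT);
CREATE TABLE leads_people_details (lead_id INTEGER PRIMARY KEY, phone TEXT);
CREATE TABLE user_to_leads (user_id INTEGER, lead_id INTEGER,
                            site_id INTEGER, lead_value INTEGER);

-- One lead, one phone, two users with different lead_value.
INSERT INTO leads VALUES (1, 8, 'google');
INSERT INTO leads_people_details VALUES (1, '555-0001');
INSERT INTO user_to_leads VALUES (10, 1, 8, 50), (11, 1, 8, 70);
""")

result = conn.execute("""
SELECT SUM(lead_value) AS lead_value_sum, COUNT(phone) AS phone_count, referer
FROM (
    SELECT DISTINCT lead_value, phone, referer
    FROM leads t1
    INNER JOIN leads_people_details t2 ON t1.lead_id = t2.lead_id
    INNER JOIN user_to_leads t3 ON t1.lead_id = t3.lead_id
    WHERE t1.site_id = 8
) derived
GROUP BY referer
""").fetchall()

# DISTINCT compares whole rows, so (50, phone, referer) and
# (70, phone, referer) both survive and the single phone is counted twice.
print(result)  # [(120, 2, 'google')]
```

So the query only gives one value per phone when the rows reaching it agree on lead_value, for example because the data or an extra filter pins down a single user_id.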
Hey Martin, the site_id attached to each row in leads is always the same site_id stored in the user_to_leads row linked to it via lead_id. Can you please explain how the last query you wrote solves the problem of rows having the same phone, where I need to count the phone only once and sum only one lead_value rather than all of them?
Eden
@Eden - It probably doesn't. The key phrase was "A provisional query that might be closer". I'm not really clear about your data model, so I guess just use the first one if it works for you!
Martin Smith
Hey Martin, isn't the table structure I attached earlier enough to understand the data model? The first solution seems to work, but I think it slows the query down too much because it uses a subquery. I was hoping to find a more elegant way to solve this one.
Eden