
Question

Answer 1

+3 A:

This query is about as good as it gets if you have an index on Customer.Email and another on CommunicationInfo.Email.

SELECT
    c.Email, COUNT(*)
FROM Customer c
    LEFT JOIN CommunicationInfo ci ON c.Email1 = ci.Email
    LEFT JOIN CommunicationInfo ci2 ON c.Email2 = ci2.Email
GROUP BY c.Email

KM 2010-04-15 17:18:16

The key is indexing. Make sure the email columns are indexed properly.

Raja 2010-04-15 17:26:00

If `ci.Email` is a `PRIMARY KEY`, this query is equivalent to `SELECT email, COUNT(*) FROM customer GROUP BY email`. If it's not, this query is a messy Cartesian join.

Quassnoi 2010-04-15 17:33:20
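The following minimal sketch makes the "Cartesian join" point concrete. It uses Python's built-in sqlite3 module as a stand-in for SQL Server. The schema, the sample data and the `Id` column are hypothetical. It groups by `Id` because this sample Customer table has no single Email column. When an address appears twice in CommunicationInfo, the two independent LEFT JOINs pair every Email1 match with every Email2 match, so COUNT(*) multiplies instead of adding.

```python
import sqlite3

# Hypothetical schema and data, for illustration only (SQLite stands in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Email1 TEXT, Email2 TEXT);
    CREATE TABLE CommunicationInfo (Email TEXT);  -- Email is NOT a primary key here
    INSERT INTO Customer (Email1, Email2) VALUES ('a@x.com', 'b@x.com');
    -- each address appears twice in CommunicationInfo
    INSERT INTO CommunicationInfo VALUES ('a@x.com'), ('a@x.com'), ('b@x.com'), ('b@x.com');
""")

# Same shape as the answer's query: two independent LEFT JOINs, then COUNT(*).
rows = conn.execute("""
    SELECT c.Id, COUNT(*)
    FROM Customer c
    LEFT JOIN CommunicationInfo ci  ON c.Email1 = ci.Email
    LEFT JOIN CommunicationInfo ci2 ON c.Email2 = ci2.Email
    GROUP BY c.Id
""").fetchall()

print(rows)  # [(1, 4)]: 2 Email1 matches x 2 Email2 matches, not 2 + 2
```

With a unique CommunicationInfo.Email, each join adds at most one row per customer, and the count collapses to one per customer, as the comment says.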

Answer 2

+1 A:

Using the OR condition robs the optimizer of the opportunity to use HASH JOIN or MERGE JOIN.

Use this:

SELECT  q2.Email, SUM(q2.cnt)
FROM    (
        -- matches on Email1; COUNT of a joined column gives 0 for unmatched addresses
        SELECT  ci.Email, COUNT(c.Email1) AS cnt
        FROM    CommunicationInfo ci
        LEFT JOIN
                Customer c
        ON      c.Email1 = ci.Email
        GROUP BY
                ci.Email
        UNION ALL
        -- matches on Email2
        SELECT  ci.Email, COUNT(c.Email2) AS cnt
        FROM    CommunicationInfo ci
        LEFT JOIN
                Customer c
        ON      c.Email2 = ci.Email
        GROUP BY
                ci.Email
        ) q2
GROUP BY
        q2.Email

or this:

SELECT  ci.Email, COUNT(q.Email)  -- counts matches only, so unmatched addresses get 0
FROM    CommunicationInfo ci
LEFT JOIN
        (
        -- stack both e-mail columns into one
        SELECT  Email1 AS Email
        FROM    Customer
        UNION ALL
        SELECT  Email2
        FROM    Customer
        ) q
ON      q.Email = ci.Email
GROUP BY
        ci.Email

Make sure that you have indexes on Customer(Email1) and Customer(Email2).

The first query will be more efficient if most of your email fields are empty; the second, if most of them are filled.
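This is a minimal sketch for checking that both rewrites return the same counts as a single OR join on sample data. It uses Python's sqlite3 module as a stand-in for SQL Server, so it says nothing about performance, only about results. The table contents and the `Id` column are hypothetical.

```python
import sqlite3

# Hypothetical sample data (SQLite stands in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Email1 TEXT, Email2 TEXT);
    CREATE TABLE CommunicationInfo (Email TEXT PRIMARY KEY);
    INSERT INTO Customer (Email1, Email2) VALUES
        ('a@x.com', 'b@x.com'), ('a@x.com', NULL), (NULL, 'a@x.com'), ('c@x.com', NULL);
    INSERT INTO CommunicationInfo VALUES ('a@x.com'), ('b@x.com'), ('d@x.com');
""")

# Baseline: one LEFT JOIN with an OR condition.
or_join = """
    SELECT ci.Email, COUNT(c.Id)
    FROM CommunicationInfo ci
    LEFT JOIN Customer c ON c.Email1 = ci.Email OR c.Email2 = ci.Email
    GROUP BY ci.Email
"""

# Rewrite 1: count matches per column, then add the partial counts.
union_of_counts = """
    SELECT q2.Email, SUM(q2.cnt)
    FROM (
        SELECT ci.Email AS Email, COUNT(c.Email1) AS cnt
        FROM CommunicationInfo ci
        LEFT JOIN Customer c ON c.Email1 = ci.Email
        GROUP BY ci.Email
        UNION ALL
        SELECT ci.Email, COUNT(c.Email2)
        FROM CommunicationInfo ci
        LEFT JOIN Customer c ON c.Email2 = ci.Email
        GROUP BY ci.Email
    ) q2
    GROUP BY q2.Email
"""

# Rewrite 2: stack both columns, then join and count once.
join_on_union = """
    SELECT ci.Email, COUNT(q.Email)
    FROM CommunicationInfo ci
    LEFT JOIN (
        SELECT Email1 AS Email FROM Customer
        UNION ALL
        SELECT Email2 FROM Customer
    ) q ON q.Email = ci.Email
    GROUP BY ci.Email
"""

results = [sorted(conn.execute(sql).fetchall())
           for sql in (or_join, union_of_counts, join_on_union)]
print(results[0])  # [('a@x.com', 3), ('b@x.com', 1), ('d@x.com', 0)]
```

One difference the sample data deliberately avoids: a customer whose Email1 and Email2 hold the same address is counted once by the OR join but twice by both UNION ALL rewrites.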

Quassnoi 2010-04-15 17:22:41

I have a feeling that, since there are millions of records in the Customer table, hitting it twice for IO wouldn't necessarily make it run faster, but the thought crossed my mind as well.

Jeremy 2010-04-15 17:38:57

@Jeremy: as long as `Email1` and `Email2` are indexed, this will just read all keys from the two indexes sequentially. If all or almost all fields are filled and have a match, the OR query would have to do the same work, only in a less efficient nested-loops (random-read) fashion.

Quassnoi 2010-04-15 17:47:42

Ah yes, you mention that indexes need to be on the columns. My bad.

Jeremy 2010-04-15 18:12:58
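This sketch checks that an index on Customer(Email1) is actually picked up for the per-column join. It uses SQLite's EXPLAIN QUERY PLAN as a rough stand-in for SQL Server's execution plan. SQLite only performs nested-loop joins, so it shows an index seek rather than the merge or hash join discussed above. The index names are hypothetical.

```python
import sqlite3

# Hypothetical schema; index names are made up for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Email1 TEXT, Email2 TEXT);
    CREATE TABLE CommunicationInfo (Email TEXT);
    CREATE INDEX idx_customer_email1 ON Customer (Email1);
    CREATE INDEX idx_customer_email2 ON Customer (Email2);
""")

query = """
    SELECT ci.Email, COUNT(c.Email1)
    FROM CommunicationInfo ci
    LEFT JOIN Customer c ON c.Email1 = ci.Email
    GROUP BY ci.Email
"""

# Each EXPLAIN QUERY PLAN row has four columns; the last one is the readable detail.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

# The inner side of the join should be searched through the Email1 index.
uses_index = any("idx_customer_email1" in step for step in plan)
```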

Answer 3

+1 A:

You mention:

And what I want here is how many times each email address in the CommunicationInfo table repeats in the Customer table. What would be the most performant query?

To me, that sounds like you could easily use an INNER JOIN. This would most likely be a lot faster, since it limits the search scope to customers who actually have a matching e-mail: anyone who doesn't (and thus would have a count(*) of 0) will not even be looked at. That might make a big difference, if only in the number of rows SQL Server has to count and group. Note that e-mail addresses with no matching customer then drop out of the result entirely instead of showing a count of 0.

So try this:

SELECT 
   ci.Email, COUNT(*) 
FROM 
   dbo.Customer c 
INNER JOIN dbo.CommunicationInfo ci 
   ON c.Email1 = ci.Email OR c.Email2 = ci.Email  
GROUP BY
   ci.Email
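The following minimal sketch shows the trade-off between the INNER JOIN and a LEFT JOIN on the same OR condition. It uses Python's sqlite3 as a stand-in for SQL Server, with hypothetical data. The INNER JOIN silently omits `d@x.com`, which no customer uses, while the LEFT JOIN with `COUNT(c.Id)` reports it as 0.

```python
import sqlite3

# Hypothetical data: 'd@x.com' is in CommunicationInfo but no customer uses it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Email1 TEXT, Email2 TEXT);
    CREATE TABLE CommunicationInfo (Email TEXT PRIMARY KEY);
    INSERT INTO Customer (Email1, Email2) VALUES ('a@x.com', NULL), (NULL, 'a@x.com');
    INSERT INTO CommunicationInfo VALUES ('a@x.com'), ('d@x.com');
""")

# INNER JOIN: only addresses with at least one matching customer survive.
inner = conn.execute("""
    SELECT ci.Email, COUNT(*)
    FROM Customer c
    INNER JOIN CommunicationInfo ci
        ON c.Email1 = ci.Email OR c.Email2 = ci.Email
    GROUP BY ci.Email
""").fetchall()

# LEFT JOIN from CommunicationInfo: every address appears; COUNT(c.Id) ignores NULLs.
left = conn.execute("""
    SELECT ci.Email, COUNT(c.Id)
    FROM CommunicationInfo ci
    LEFT JOIN Customer c
        ON c.Email1 = ci.Email OR c.Email2 = ci.Email
    GROUP BY ci.Email
    ORDER BY ci.Email
""").fetchall()

print(inner)  # [('a@x.com', 2)]
print(left)   # [('a@x.com', 2), ('d@x.com', 0)]
```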

How does that perform in your case?

marc_s 2010-04-15 17:28:36

It really takes a very long time to execute.

yapiskan 2010-04-15 17:35:41

Answer 4

A:

Depending on your environment there may not be much you can do to optimize this.

A couple of questions:

How many records are in CommunicationInfo?
How often do you really need to run this query? Is it a one-time analysis, or are multiple people going to be running it every 10 minutes?
Are the fields indexed? I'd guess that neither the Email1 nor the Email2 field is indexed. However, I wouldn't suggest adding an index without taking the balance of the whole system into consideration.
Why are you using a left join? Do you really need EVERYTHING from the Customer table? You're counting, so there's no harm in doing an INNER JOIN.

Suggestions:

Run the query through the Database Engine Tuning Advisor to see whether SQL Server recommends anything.
An extreme suggestion would be to dump the Email1 and Email2 columns into a temp table and join to that. I've seen queries run slowly because of heavy load on a particular table, so sometimes copying the records into a temp table is faster. However, this technique depends heavily on how much memory there is, how fast the IO is, and how much load the table is under.
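Here is a minimal sketch of the temp-table idea. It uses SQLite's `CREATE TEMP TABLE` via Python's sqlite3 as a stand-in for a SQL Server `#temp` table. The `CustomerEmail` table, its index name and the data are hypothetical. Both e-mail columns are copied into one indexed column, and the count becomes a single equality join. Whether this is actually faster depends on the factors listed above.

```python
import sqlite3

# Hypothetical schema and data (SQLite stands in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (Id INTEGER PRIMARY KEY, Email1 TEXT, Email2 TEXT);
    CREATE TABLE CommunicationInfo (Email TEXT PRIMARY KEY);
    INSERT INTO Customer (Email1, Email2) VALUES
        ('a@x.com', 'b@x.com'), ('a@x.com', NULL), (NULL, NULL);
    INSERT INTO CommunicationInfo VALUES ('a@x.com'), ('b@x.com'), ('d@x.com');

    -- Copy only the non-empty e-mail values, stacked into one column.
    CREATE TEMP TABLE CustomerEmail AS
        SELECT Email1 AS Email FROM Customer WHERE Email1 IS NOT NULL
        UNION ALL
        SELECT Email2 FROM Customer WHERE Email2 IS NOT NULL;
    CREATE INDEX temp.idx_customeremail_email ON CustomerEmail (Email);
""")

# A plain equality join against the temp table, no OR condition needed.
rows = conn.execute("""
    SELECT ci.Email, COUNT(ce.Email)
    FROM CommunicationInfo ci
    LEFT JOIN CustomerEmail ce ON ce.Email = ci.Email
    GROUP BY ci.Email
    ORDER BY ci.Email
""").fetchall()

print(rows)  # [('a@x.com', 2), ('b@x.com', 1), ('d@x.com', 0)]
```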

Jeremy 2010-04-15 17:37:27

Performance problem on a query.