views:

56

answers:

3

I have 2 tables, that look like:

CustomerInfo(CustomterID, CustomerName) CustomerReviews(ReviewID, CustomerID, Review, Score)

I want to search reviews for a string and return CustomerInfo.CustomerID and CustomerInfo.CustomerName. However, I only want to show distinct CustomerID and CustomerName along with just one of their CustomerReviews.Reviews and CustomerReviews.Score. I also want to order by the CustomerReviews.Score.

I can't figure out how to do this, since a customer can leave multiple reviews, but I only want a list of customers with their highest scored review.

Any ideas?

A: 

I think this should do it

select ci.CustomterID, ci.CustomerName, cr.Review, cr.Score
from CustomerInfo ci inner join 
(select top 1*
from CustomerReviews
where Review like '%search%'
order by Score desc) cr on ci.CustomterID = cr.CustomterID 
order by cr.Score
alejandrobog
Is there a faster query? This seems like it would be very slow with a lot of records since I'll be performing an individual search for each customer record.Thanks!
Erin Taylor
I dont know any other way, have check the query's execution plan?
alejandrobog
+2  A: 

This is the greatest-n-per-group problem that has come up dozens of times on Stack Overflow.

Here's a solution that works with a window function:

WITH CustomerCTE (
  SELECT i.*, r.*, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY Score DESC) AS RN
  FROM CustomerInfo i
  INNER JOIN CustomerReviews r ON i.CustomerID = r.CustomerID 
  WHERE CONTAINS(r.Review, '"search"')
)
SELECT * FROM CustomerCTE WHERE RN = 1
ORDER BY Score;

And here's a solution that works more broadly with RDBMS brands that don't support window functions:

SELECT i.*, r1.*
FROM CustomerInfo i
INNER JOIN CustomerReviews r1 ON i.CustomerID = r1.CustomerID 
  AND CONTAINS(r1.Review, '"search"')
LEFT OUTER JOIN CustomerReviews r2 ON i.CustomerID = r2.CustomerID 
  AND CONTAINS(r1.Review, '"search"')
  AND (r1.Score < r2.Score OR r1.Score = r2.Score AND r1.ReviewID < r2.ReviewID)
WHERE r2.CustomerID IS NULL
ORDER BY Score;

I'm showing the CONTAINS() function because you should be using the fulltext search facility in SQL Server, not using LIKE with wildcards.

Bill Karwin
Thanks. I do have full text search enabled.
Erin Taylor
Thank you so much! This makes a lot more sense, and seems like it works well performance wise. I do have full text enabled, and the "score" I was referring to is actually the "rank" from the full text search. I wasn't sure if many ppl are familiar with full text. I think I can replace "score" with "rank" and join the CONTAINSTABLE.
Erin Taylor
Awesome! Glad to help.
Bill Karwin
A: 

I voted for Bill Karwin's answer, but I thought I'd throw out another option.

It uses a correlated subquery, which can often incur performance problems with large data sets, so use with caution. I think the only upside is that the query is easier to immediately understand.

select *
from [CustomerReviews] r
where [ReviewID] =
(
    select top 1 [ReviewID]
    from [CustomerReviews] rInner
    where rInner.CustomerID = r.CustomerID
    order by Score desc
)
order by Score desc

I didn't add the string search filter, but that can be easily added.

Andy White
Yeah but I try to avoid correlated subqueries if there's another solution that's not correlated. The other solution is *usually* more efficient (there's an exception to every rule, though).
Bill Karwin