views:

177

answers:

2

This should be and easy one for the LINQ gurus out there.

I'm doing a complex Query using UNIONS and CONTAINSTABLE in my database to return ranked results to my application.

I'm getting duplicates in my returned data. This is expected. I'm using CONTAINSTABLE and CONTAINS to get all the results I need. CONTAINSTABLE is ranked by SQL and CONTAINS (which is run only on the Keywords field ) is hard-code-ranked by me. ( Sorry if that doesn't make sense )

Anyway, because the tuples aren't identical ( their rank is different ) a duplicate is returned.

I figure the best way to deal with this is use LINQ.

I know I'll be using the Distinct() extension method, but do I have to implement the IEqualityComparer interface? I'm a little fuzzy on how to do this.

For argument's sake, say my resultset is structured like this class:

class Content {
  ContentID int //KEY
  Rank int 
  Description String 
}

If I have a List<Content> how would I write the Distinct() method to exclude Rank? Ideally I'd like to keep the Content's highest Rank. SO, if one Content's RAnk is 112 and the other is 76. I'd like to keep the 112 rank.

Hopefully I've given enough information.

EDIT

Here's a sample of the SQL for anyone who has an Idea on how to do this in SQL:

UNION ALL  
SELECT
    p.ProductID     AS ContentID
  , p.ProductName   AS Title
  , K.RANK          AS Rank
  , 'Product'       AS ContentType
  , p.AddedDateTime

FROM Products AS p
 INNER JOIN CONTAINSTABLE(NaturalFactorsPIM.dbo.Products,*, @SearchPred, @TopN) AS K ON K.[KEY] = P.ProductID
WHERE @IncludeProducts = 1 
  AND p.ProductStatus = 1 

UNION ALL  
SELECT
   p.ProductID    AS ContentID
 , p.ProductName  AS Title
 , 80             AS Rank
 , 'Product'      AS ContentType
 , p.AddedDateTime

FROM Products AS p
WHERE FREETEXT( p.ProductKeywords, @SearchPred)
  AND @IncludeProducts = 1 
  AND p.ProductStatus = 1 

ORDER BY Rank DESC
A: 

Group By ContentID, then OrderBy Rank and select the First item from each resulting group:

var result = myContent.GroupBy( c => c.ContentID, 
                                g => g.OrderBy( x => x.Rank ).First() );

Using query comprehension syntax:

var result = from c in myContent
             group c by c.ContentID into g
             select g.First() order by g.Rank;
LBushkin
Thanks! I actually tried the first solution but the second parameter of the GroupBy doesn't like the `OrderBy( /*...*/ )` I think you need parentheses? var q = (from c in results group c by c.ContentID into g select g.First()).OrderBy(x => x.Rank);
Atømix
+1  A: 

This should do what you want:

Var List<Content> AllTheData = GetTheDataForThisExample()
Var DistinctDataWithMaxRank = From C in AllTheData
    Group C by C.ContentID, C.Description
         Into G
    Select New { ContentID = g.ContentID
                 Rank = (From rmax in g select rmax.rank).Max()
                 Description = g.Description
               };

But, if I understand you correctly, this would be very easy in SQL:

Select ContentID, Max(Rank) As Rank, Description
From (Select * From ComplicatedFullTextSearchQuery()) As Q
Group By ContentID, Description

I arranged this sql like your main query was a UDF; whatever your original query is, put it in the () on the second line. If performance matters, have a procedure/function put your original query into a temporary table, an then just do:

Select ContentID, Max(Rank) As Rank, Description
From TempTable    Group By ContentID, Description
Patrick Karcher
Great! You're right... the SQL Solution is simpler! I tried it as a Subquery before but was doing something wrong with the syntax ( I think I forgot the `AS Q`.) It seems to work! Eureka!
Atømix
BTW, right now it is in a procedure, so I may optimize it with a TempTable. Will this provide a big boost in performance?
Atømix
Whether the performance boost is big depends on your data, especially how big the result set is. And any optimization depends on how loaded up the db is, how often is this run, etc. If this is going to be run quite a bit, like whenever users hit a particular common web page, then I'd fill a temp table if the result set is a few thousand records. In cases like this, testing to see whether you need to optimize will take more time than just optimizing. Since you already have a procedure getting your results, do the temp-table intermediate.
Patrick Karcher
Thanks! Your SQL Solutions worked beautifully.
Atømix