views:

169

answers:

2

Using SQL Server. Ok, I'm making an in webpage cache system. I wanted to make a simple page rank system along with output. The problem is, I want to display the recordset with the highest relevance score per unique domain. One domain may have multiple records but with different titles, descriptions, etc. he probem is,instead of getting 1 recordset containing a unique domain, it groups all the recordsets of that unique domain and outputs them all. I just want the recordset with the highest relevance score per unique domain per group before it outputs the next (and different domain with the highest relevance for that group)

SELECT title, html, sum(relevance) FROM
  (
  SELECT title, html, 10 AS relevance FROM page WHERE title like ‘%about%’ UNION
  SELECT title, html, 7 AS relevance FROM page WHERE html like ‘%about%’ UNION
  SELECT title, html, 5 AS relevance FROM page WHERE keywords like ‘%about%’ UNION
  SELECT title, html, 2 AS relevance FROM page WHERE description like ‘%about%’
  ) results
GROUP BY title, html
ORDER BY relevance desc;

I'm getting:

domain1 title html
domain1 title html
domain1 title html
domain2 title html
domain2 title html
domain2 title html

What I want is

domain1 title html
domain2 title html
domain3 title html
domain4 title html
domain5 title html

Thanks in advance

+1  A: 

I'm not sure why your code even works, since I think you should have

ORDER BY Sum(relevance) DESC

instead of

ORDER BY relevance DESC

Maybe that's the problem?

Beyond that, what about this. It is ugly, but it will work. It would be better if SQL Server understood how to refer to aliases later in the query. But alas.

SELECT title, html,
Case When title LIKE '%about%' Then 10 Else 0 End +
Case When html LIKE '%about%' Then 7 Else 0 End +
Case When keywords LIKE '%about%' Then 5 Else 0 End +
Case When description LIKE '%about%' Then 2 Else 0 End AS relevance
FROM page
WHERE Case When title LIKE '%about%' Then 10 Else 0 End +
Case When html LIKE '%about%' Then 7 Else 0 End +
Case When keywords LIKE '%about%' Then 5 Else 0 End +
Case When description LIKE '%about%' Then 2 Else 0 End > 0
ORDER BY Case When title LIKE '%about%' Then 10 Else 0 End +
Case When html LIKE '%about%' Then 7 Else 0 End +
Case When keywords LIKE '%about%' Then 5 Else 0 End +
Case When description LIKE '%about%' Then 2 Else 0 End DESC;

Or maybe just a slight rearrangement:

SELECT title, html, relevance
FROM (SELECT title, html,
Case When title LIKE '%about%' Then 10 Else 0 End +
Case When html LIKE '%about%' Then 7 Else 0 End +
Case When keywords LIKE '%about%' Then 5 Else 0 End +
Case When description LIKE '%about%' Then 2 Else 0 End AS relevance
FROM page)
WHERE relevance > 0
ORDER BY relevance DESC;
Erick Robertson
No matter which way i do this I keep getting invalid column 'relevance'
@kinjite: It's because `relevance` is a column alias, referenced in the WHERE clause. SQL Server supports referencing column aliases in the GROUP BY, but not the WHERE
OMG Ponies
@OMG Ponies: or the ORDER BY, if I recall correctly. I edited the answer to fix this. It's very ugly, but it will work. I've used this hack before.
Erick Robertson
A: 

ORDER BY relevance is causing your query to behave as though relevance (non-aggregated) is in the SELECT clause. Erick is right -- ORDER BY sum(relevance) should fix your mistake.

dave