views:

318

answers:

6

As the title says, I'm using SQL Server 2008. Apologies if this question is very basic. I've only been using SQL for a few days. Right now I have the following query:

SELECT TOP 10 p.id, pl.nm, pl.val, pl.txt_val

from dm.labs pl
join mas_data.patients p    
  on pl.id = p.id
  where pl.nm like '%LDL%'
  and val is not null

What I want to do is use select top n together with distinct values in the id column. Searching through some forums says to use

SELECT DISTINCT TOP 10 ...

but when I replace the first line with

SELECT DISTINCT TOP 10 p.id, pl.nm, pl.val, pl.txt_val

I get the same results as without the word distinct. What should I be doing to only get to filter out duplicate id entries?

Thank you.

A: 
select top 10 * from
(
    select distinct p.id, ....
)

will work.

cdonner
he wants distinct values for ID's as far as i can see, so this won't work
Paul Creasey
you are correct.
cdonner
A: 

DISTINCT removes rows if all selected values are equal. Apparently, you have entries with the same p.id but with different pl.nm (or pl.val or pl.txt_val). The answer to your question depends on which one of these values you want to show in the one row with your p.id (the first? the smallest? any?).

Heinzi
A: 

Few ideas:

  1. You have quite a few fields in your select statement. Any value being different from another will make that row distinct.
  2. TOP clauses are usually paired with WHERE clauses. Otherwise TOP doesn't mean much. Top of what? The way you specify "top of what" is to sort by using WHERE
  3. It's entirely possible to get the same results even though you use TOP and DISTINCT and WHERE. Check to make sure that the data you're querying is indeed capable of being filtered and ordered in the manner you expect.

Try something like this:

SELECT DISTINCT TOP 10 p.id, pl.nm -- , pl.val, pl.txt_val
FROM dm.labs pl
JOIN mas_data.patients p    
on pl.id = p.id
where pl.nm like '%LDL%'
and val is not null
ORDER BY pl.nm

Note that i commented out some of the SELECT to limit your result set and DISTINCT logic.

Paul Sasik
A: 

I think the problem is that you want one result for each p.id?

But you are getting "duplicate" results for some p.id's, is that right?

The DISTINCT keyword applies to the entire result set, so applies to pl.nm, pl.val, pl.txt_val, not just p.id.

You need something like

SELECT TOP 10 p.id, max( p1.nm ), max (p1.val), ...
FROM ...
GROUP BY p.id

Won't need the distinct keyword then.

MikeW
-1 Using MAX in this way will make the result set arbitrary.
Paul Sasik
yes, but if you want one result per p.id, you'll have to do something arbitrary or leave off those columns entirely
MikeW
Not necessarily, if you want the row that is associated with a specific row for each p.id, for example the one with the max(nm) or max(), you want the whole row, not random max values from potentially different rows. The solution Paul Creasey gave has the potential to do this simply by changing the order by inside the over() clause. Not sure what coward down-voted his answer without stating his/her reason(s).
Aaron Bertrand
I can't edit my comment... obviously I means max(nm) or max(val)...
Aaron Bertrand
Agreed, that's probably more sensible.
MikeW
A: 

The easy option is to use group by and select min/max for all other fields

SELECT TOP 10 
    p.id, 
    max(pl.nm),
    max(pl.val),
    max(pl.txt_val)
from 
    dm.labs pl
join 
    mas_data.patients p    
on 
    pl.id = p.id
  where 
    pl.nm like '%LDL%'
and 
    val is not null
group by 
    p.id

This can get quite tedious for wide table so the other option is to use rank over and partiion

SELECT TOP 10 
    p.id, 
     pl.nm, 
     pl.val, 
   pl.txt_val, 
    rank() over(partition by p.id order by p.id) as Rank
from 
    dm.labs pl
join 
    mas_data.patients p    
on 
    pl.id = p.id
  where 
    pl.nm like '%LDL%'
and 
    val is not null
and
    Rank = 1
Paul Creasey
A: 

You could use a Common Table Expression to get the top 10 distinct ID's and then join those to the rest of your data:

;WITH TopTenIDs AS
( 
   SELECT DISTINCT TOP 10 id
   FROM dm.labs 
   ORDER BY ......
)
SELECT 
    tti.id, pl.nm, pl.val, pl.txt_val
FROM
    TopTenIDs tti
INNER JOIN
    dm.labs pl ON pl.id = tti.id
INNER JOIN 
    mas_data.patients p ON pl.id = p.id
WHERE
    pl.nm like '%LDL%'
    AND val IS NOT NULL

That should work. Mind you: if you have a "TOP x" clause, you typically also need an ORDER BY clause - if you want the TOP 10, you need to tell the system in what order that "TOP" is.

PS: why do you even join the "patients" table, if you never select any fields from it??

marc_s