ansaurus

Question

Selecting only authors who have articles?

Answer 1

A:

You want an inner join:

select
  *
from
  Authors
inner join
  Articles on
  Articles.ContributorID = Authors.ContributorID

This will return only authors who have an entry in the Articles table, matched by ContributorID.
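The rows an inner join returns can be checked with a small sketch. It uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. Only Authors, Articles and ContributorID come from the question; the Name, Biography, ArticleID and Title columns and all sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the question's SQL Server tables.
# Columns other than ContributorID, and all rows, are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT, Biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER, Title TEXT);
    INSERT INTO Authors VALUES (1, 'Ann', 'Writes about SQL'), (2, 'Bob', 'Short bio'), (3, 'Cy', 'No articles yet');
    INSERT INTO Articles VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Only');
""")

# The inner join from the answer, narrowed to a few columns for readability.
rows = conn.execute("""
    SELECT Authors.ContributorID, Authors.Name, Articles.Title
    FROM Authors
    INNER JOIN Articles ON Articles.ContributorID = Authors.ContributorID
    ORDER BY Articles.ArticleID
""").fetchall()

for row in rows:
    print(row)
# Cy (no articles) is filtered out, but Ann appears once per article.
```

Authors without articles are excluded, but an author with two articles comes back twice, which is the duplicate-row problem the comments raise.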

Noon Silk 2009-09-13 12:37:42

...and duplicates if an author has multiple articles, plus all the Articles columns too

gbn 2009-09-13 12:44:12

What about the duplicate values when an author has more than one article?

Galilyou 2009-09-13 12:54:52

Yes but that could easily be fixed with a DISTINCT.

RichardOD 2009-09-13 12:58:42

@RichardOD: DISTINCT with ntext?

gbn 2009-09-13 13:04:11

@gbn- didn't see that bit!

RichardOD 2009-09-13 13:58:01

gbn: Fair enough on the multiple rows. It will be a cold day in hell before I'm able to post a response to someone's SQL questions without making a little mistake :P

Noon Silk 2009-09-13 22:46:52

Answer 2

+5 A:

An EXISTS test avoids the duplicate rows you would otherwise get when an author has multiple articles:

Select * from Authors
Where EXISTS (SELECT *
    FROM Articles
    WHERE Articles.ContributorId = Authors.ContributorId)
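A minimal sketch of the EXISTS query, again using an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server. The Name, Biography, ArticleID and Title columns and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: Ann has two articles, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT, Biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER, Title TEXT);
    INSERT INTO Authors VALUES (1, 'Ann', 'Writes about SQL'), (2, 'Bob', 'Short bio'), (3, 'Cy', 'No articles yet');
    INSERT INTO Articles VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Only');
""")

# EXISTS only tests whether a matching article is present, so each
# author row is returned at most once, however many articles match.
rows = conn.execute("""
    SELECT * FROM Authors
    WHERE EXISTS (SELECT *
                  FROM Articles
                  WHERE Articles.ContributorID = Authors.ContributorID)
    ORDER BY ContributorID
""").fetchall()

for row in rows:
    print(row)
```

Ann is returned once despite having two articles, and only Authors columns appear in the output.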

Edit: To clarify, you cannot use DISTINCT on ntext columns. So you cannot have a plain JOIN solution unless you use a derived table on Articles in the JOIN and avoid using Articles directly, or you convert the ntext column to nvarchar(max).

Otherwise, EXISTS or IN is your only option.
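The IN alternative can be compared against EXISTS in a small sketch. It runs both queries on an in-memory SQLite database via Python's sqlite3 module (a stand-in for SQL Server) with hypothetical columns and rows, and checks that they return the same authors.

```python
import sqlite3

# Hypothetical sample data: Ann has two articles, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT, Biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER, Title TEXT);
    INSERT INTO Authors VALUES (1, 'Ann', 'Writes about SQL'), (2, 'Bob', 'Short bio'), (3, 'Cy', 'No articles yet');
    INSERT INTO Articles VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Only');
""")

# Correlated EXISTS test.
exists_rows = conn.execute("""
    SELECT * FROM Authors
    WHERE EXISTS (SELECT * FROM Articles
                  WHERE Articles.ContributorID = Authors.ContributorID)
    ORDER BY ContributorID
""").fetchall()

# Uncorrelated IN list of contributor IDs that have articles.
in_rows = conn.execute("""
    SELECT * FROM Authors
    WHERE ContributorID IN (SELECT ContributorID FROM Articles)
    ORDER BY ContributorID
""").fetchall()

print(exists_rows == in_rows)
```

Both forms are semi-joins: an author row either qualifies or it does not, so neither multiplies rows the way a JOIN does.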

Edit 2:

...unless you really want to use a JOIN and you have SQL Server 2005 or higher, in which case you can CAST and then DISTINCT (an aggregate) to avoid multiple rows in the output...

select DISTINCT
  Authors.ContributorID,
  Authors.AnotherColumn,
  CAST(Authors.biography AS nvarchar(max)) AS biography,
  Authors.YetAnotherColumn,
  ...
from
  Authors
inner join
  Articles on
  Articles.ContributorID = Authors.ContributorID
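A minimal sketch of the JOIN plus DISTINCT approach, using an in-memory SQLite database through Python's sqlite3 module as a stand-in. SQLite has no ntext type and allows DISTINCT on TEXT columns, so the CAST to nvarchar(max) that SQL Server needs is omitted here; the columns and rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: Ann has two articles, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT, Biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER, Title TEXT);
    INSERT INTO Authors VALUES (1, 'Ann', 'Writes about SQL'), (2, 'Bob', 'Short bio'), (3, 'Cy', 'No articles yet');
    INSERT INTO Articles VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Only');
""")

# Plain join: one row per (author, article) pair.
joined = conn.execute("""
    SELECT Authors.ContributorID, Authors.Name, Authors.Biography
    FROM Authors
    INNER JOIN Articles ON Articles.ContributorID = Authors.ContributorID
""").fetchall()

# DISTINCT over the selected Authors columns collapses the repeats.
# (In SQL Server, an ntext Biography would need the CAST shown above.)
deduped = conn.execute("""
    SELECT DISTINCT Authors.ContributorID, Authors.Name, Authors.Biography
    FROM Authors
    INNER JOIN Articles ON Articles.ContributorID = Authors.ContributorID
    ORDER BY Authors.ContributorID
""").fetchall()

print(len(joined), len(deduped))
```

The join first produces three rows and DISTINCT then removes the extra one, which is the extra work gbn later cites as the reason to prefer EXISTS.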

gbn 2009-09-13 12:42:21

The subselect should be written as `select 1 from articles ...`. You don't want to pull anything back from the backend, you just want to know if something is there.

dland 2009-09-13 13:15:22

@dland: This is a myth. NULL, 1, or *: it makes no difference to the plan.

gbn 2009-09-13 13:16:18
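The select list inside EXISTS is never returned; only the presence of a row matters. A small sketch using Python's sqlite3 module with hypothetical tables and rows shows that `SELECT *`, `SELECT 1` and `SELECT NULL` give identical results. It checks results only, in SQLite; it does not reproduce the SQL Server execution plans the comment is about.

```python
import sqlite3

# Hypothetical sample data: Ann has two articles, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER);
    INSERT INTO Authors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Articles VALUES (10, 1), (11, 1), (12, 2);
""")

results = {}
for select_list in ("SELECT *", "SELECT 1", "SELECT NULL"):
    # A subquery row whose only value is NULL still counts as "a row exists".
    sql = f"""
        SELECT ContributorID FROM Authors
        WHERE EXISTS ({select_list} FROM Articles
                      WHERE Articles.ContributorID = Authors.ContributorID)
        ORDER BY ContributorID
    """
    results[select_list] = conn.execute(sql).fetchall()

for select_list, rows in results.items():
    print(select_list, rows)
```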

@gbn- +1, especially about the myth of SELECT 1.

RichardOD 2009-09-13 14:00:50

@gbn: Thanks for the extensive explanation. It seems like the only way to make the join work is by casting. With almost a million records in the table, which of these options would you recommend?

Galilyou 2009-09-14 06:40:19

@7alwagy: I would use EXISTS to avoid the CAST/DISTINCT overhead, which was my first answer.

gbn 2009-09-14 06:43:55

Apparently EXISTS would do best here, and so does your answer, gbn! Accepted. Thanks again :)

Galilyou 2009-09-14 06:56:04

Answer 3

A:

Select the distinct contributorIDs from the Articles table to get the individual authors who have written an article, and join the Authors table to that query, so something like:

select distinct Articles.ContributorID, Authors.*
from Articles
join Authors on Articles.ContributorID = Authors.ContributorID
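The prose of this answer (take the distinct ContributorIDs from Articles, then join Authors to that) describes a derived-table join, which is also the workaround gbn's edit mentions; the query as written instead applies DISTINCT to the whole join. A minimal sketch of the derived-table form, using an in-memory SQLite database via Python's sqlite3 module as a stand-in, with hypothetical columns, rows and a hypothetical alias `WithArticles`:

```python
import sqlite3

# Hypothetical sample data: Ann has two articles, Bob one, Cy none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT, Biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER, Title TEXT);
    INSERT INTO Authors VALUES (1, 'Ann', 'Writes about SQL'), (2, 'Bob', 'Short bio'), (3, 'Cy', 'No articles yet');
    INSERT INTO Articles VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Only');
""")

# DISTINCT runs only on Articles.ContributorID inside the derived table,
# so the wide Authors columns (e.g. an ntext biography) never pass
# through DISTINCT, and each author matches at most one derived row.
rows = conn.execute("""
    SELECT Authors.*
    FROM Authors
    INNER JOIN (SELECT DISTINCT ContributorID FROM Articles) AS WithArticles
        ON WithArticles.ContributorID = Authors.ContributorID
    ORDER BY Authors.ContributorID
""").fetchall()

for row in rows:
    print(row)
```

This also returns only the Authors columns, avoiding the duplicated ContributorID column the comments ask about.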

blowdart 2009-09-13 13:00:45

Will not work: cannot DISTINCT on ntext. And why 2 ContributorId columns in output?

gbn 2009-09-13 13:22:16

The two columns were just to make it clearer what it does. I missed the ntext, apologies, but that is a weird schema; normally I'd expect to see the author names in a field of their own, not in a text field.

blowdart 2009-09-13 13:35:52

@Blowdart: it's a "biography" column, I think

gbn 2009-09-13 14:01:48

Ouch, painful. Shame really; authors is begging for a table of its own exactly for things like this.

blowdart 2009-09-13 14:06:57

according to http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/c836eda3-f969-4ec2-a231-b2930e288ad5 you can cast ntext to nvarchar(MAX) and then distinct will work.

blowdart 2009-09-13 14:08:10

@blowdart: correct, but now it's becoming complicated. You suggest using a CAST followed by an aggregate (which is what DISTINCT is) to remove duplicate rows that should not be there in the first place. They are generated because JOIN is not the correct construct to test for the EXISTence of child rows.

gbn 2009-09-13 14:19:32

But the distinct is a lot less expensive than subqueries (generally)

blowdart 2009-09-13 14:31:00

@Blowdart: can you prove that, given an indexed join column?

gbn 2009-09-13 14:55:42

Not without his database (and you'll note I said generally). That's what profiler is there for.

blowdart 2009-09-13 15:45:10

@blowdart: http://explainextended.com/2009/06/16/in-vs-join-vs-exists/ from our very own Quassnoi http://stackoverflow.com/users/55159/quassnoi. However, he doesn't really look at the DISTINCT, especially not with the CAST of a LOB column too...

gbn 2009-09-13 16:17:37

@Blowdart: what's so wrong with biography column being ntext!? And yes the authors is a table of its own ( I don't get what you mean by "authors is begging for a table of its own exactly for things like this").

Galilyou 2009-09-14 06:37:32

Well if the biography is a paragraph then sure, that's fine, but personally I'd extract the author names into a table of their own exactly for situations like this, searching authors and drilling down.

blowdart 2009-09-14 08:07:39
