I've got two SQL Server tables, authors and articles, where the authors primary key (AuthorID) is a foreign key in the articles table, representing a simple one-to-many relationship between authors and articles. Here's the problem: I need to run a full-text search on the authors table over the first name, last name, and biography columns. The full-text search works great, ranking and all. Now I need to add one more criterion: authors who haven't contributed any articles should be excluded from the search. To achieve that, I chose to create a view of all the contributors that have articles and search against that view. So I created the view this way:

    Create View vw_Contributors_With_Articles
    AS
    Select * from Authors
    Where Authors.ContributorID
    IN (Select Distinct (Articles.ContributorId) From Articles)

It's working, but I really don't like the subquery. A join returns redundant rows for every author with more than one article. I tried DISTINCT, but it doesn't work with the biography column because its type is ntext. GROUP BY wouldn't work for me either, because I need all the columns, not aggregates of them.

What do you think guys? How can I improve this?

A: 

You want an inner join

select
  *
from
  Authors
inner join
  Articles on
  Articles.ContributorID = Authors.ContributorID

This will return only authors who have an entry in the Articles table, matched by ContributorID.

Noon Silk
...and duplicates if multiple articles and all articles columns too
gbn
What about the duplicate values when an author has more than one article?
Galilyou
Yes but that could easily be fixed with a DISTINCT.
RichardOD
@RichardOD: DISTINCT with ntext?
gbn
@gbn- didn't see that bit!
RichardOD
gbn: Fair enough on the multiple rows. It will be a cold day in hell before I'm able to post a response to someone's SQL question without making a little mistake :P
Noon Silk
+5  A: 

An EXISTS avoids the duplicate rows a JOIN would return when there are multiple articles per author:

Select * from Authors
Where EXISTS (SELECT *
    FROM Articles
    WHERE Articles.ContributorId = Authors.ContributorId)

Edit: To clarify, you cannot use DISTINCT on ntext columns. So you cannot use a JOIN solution unless you join to a derived table on articles instead of using articles directly, or you convert the ntext to nvarchar(max).

EXISTS or IN is your only option.
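
To make the difference concrete, here is a minimal sketch of the three variants discussed above (plain JOIN, EXISTS, and a JOIN against a derived table). It runs the queries through Python's sqlite3 module purely as a stand-in for SQL Server, so the ntext/DISTINCT restriction itself is not reproduced. The table and column names follow the question; the FirstName and Title columns and all sample rows are made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here (assumption); the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, FirstName TEXT, Biography TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER, Title TEXT);
    INSERT INTO Authors VALUES (1, 'Ann', 'bio A'), (2, 'Bob', 'bio B'), (3, 'Cy', 'bio C');
    INSERT INTO Articles VALUES (10, 1, 'First'), (11, 1, 'Second'), (12, 2, 'Third');
""")

def author_ids(sql):
    """Run a query and return the ContributorID of every row, in order."""
    return [row[0] for row in conn.execute(sql)]

# Plain JOIN: one output row per matching article, so author 1 appears twice.
join_ids = author_ids("""
    SELECT Authors.ContributorID FROM Authors
    INNER JOIN Articles ON Articles.ContributorID = Authors.ContributorID
    ORDER BY Authors.ContributorID
""")

# EXISTS: one output row per author that has at least one article.
exists_ids = author_ids("""
    SELECT Authors.ContributorID FROM Authors
    WHERE EXISTS (SELECT * FROM Articles
                  WHERE Articles.ContributorID = Authors.ContributorID)
    ORDER BY Authors.ContributorID
""")

# JOIN against a derived table: DISTINCT is applied to the key column only,
# never to the Authors columns.
derived_ids = author_ids("""
    SELECT Authors.ContributorID FROM Authors
    INNER JOIN (SELECT DISTINCT ContributorID FROM Articles) AS A
      ON A.ContributorID = Authors.ContributorID
    ORDER BY Authors.ContributorID
""")

print(join_ids)     # [1, 1, 2]
print(exists_ids)   # [1, 2]
print(derived_ids)  # [1, 2]
```

The author with no articles (ID 3) is dropped by all three queries, but only the plain JOIN repeats the author who has two articles.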

Edit 2:

...unless you really want to use a JOIN and you have SQL Server 2005 or higher, you can CAST and DISTINCT (aggregate) to avoid multiple rows in the output...

select DISTINCT
  Authors.ContributorID,
  Authors.AnotherColumn,
  CAST(Authors.biography AS nvarchar(max)) AS biography,
  Authors.YetAnotherColumn,
  ...
from
  Authors
inner join
  Articles on
  Articles.ContributorID = Authors.ContributorID
gbn
The subselect should be written as `select 1 from articles ...`. You don't want to pull anything back from the backend, you just want to know if something is there.
dland
@dland: This is myth. NULL, 1, or *: no difference to the plan.
gbn
@gbn- +1, especially about the myth of SELECT 1.
RichardOD
@gbn: Thanks for the extensive explanation. It seems like the only way to make the join is by casting .. For almost a million records in the table, what choice of those would you recommend?
Galilyou
@7alwagy: I would use EXISTS to avoid the CAST/DISTINCT overhead, which is my first answer
gbn
Apparently EXISTS would do best now, and so your answer gbn! Accepted. Thanks again :)
Galilyou
A: 

Select the distinct contributorIDs from the Articles table to get the individual authors who have written an article, and join the Authors table to that query - so something like

select distinct Articles.ContributorID, Authors.*
from Articles
join Authors on Articles.ContributorID = Authors.ContributorID
blowdart
Will not work: cannot DISTINCT on ntext. And why 2 ContributorId columns in output?
gbn
The two columns were just to make it clearer what it does. I missed the ntext, apologies - but that is a weird schema, normally I'd expect to see the author names in a field of their own, not in a text field.
blowdart
@Blowdart: it's a "biography" column i think
gbn
Ouch painful - shame really, authors is begging for a table of its own exactly for things like this
blowdart
according to http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/c836eda3-f969-4ec2-a231-b2930e288ad5 you can cast ntext to nvarchar(MAX) and then distinct will work.
blowdart
@blowdart: correct, but now it's becoming complicated. You suggest using a CAST followed by an aggregate (which is what DISTINCT is) to remove duplicate rows that should not be there in the first place. They are generated because JOIN is not the correct construct to test EXISTence of child rows.
gbn
But the distinct is a lot less expensive than subqueries (generally)
blowdart
@Blowdart: can you prove that, given an indexed join column?
gbn
Not without his database (and you'll note I said generally). That's what profiler is there for.
blowdart
@blowdart: http://explainextended.com/2009/06/16/in-vs-join-vs-exists/ from our very own Quassnoi http://stackoverflow.com/users/55159/quassnoi. However, he doesn't really look at the DISTINCT, especially not with the CAST of a LOB column too...
gbn
@Blowdart: what's so wrong with biography column being ntext!? And yes the authors is a table of its own ( I don't get what you mean by "authors is begging for a table of its own exactly for things like this").
Galilyou
Well if the biography is a paragraph then sure, that's fine, but personally I'd extract the author names into a table of their own exactly for situations like this, searching authors and drilling down.
blowdart