tags:

views:

39

answers:

5

I've come across a strange query embedded in an application (yeah, great!):

SELECT PersonId
    , Salutation
    , Email
    , Postcode
FROM    vw_NewsletterSubscriptions
WHERE    PersonId IN (SELECT PersonId FROM vw_NewsletterSubscriptions) 
    AND NewsletterTypeID=1 
    AND UnSubscribeDate Is NULL
GROUP BY PersonId
    , Salutation
    , Email
    , Postcode

It's the SELECT in the WHERE clause that's got me. It seemed to be saying "select some data from this view where the data is in the view", which is a bit unnecessary. So I commented out that line in the WHERE clause:

SELECT PersonId
    , Salutation
    , Email
    , Postcode
FROM    vw_NewsletterSubscriptions
WHERE    NewsletterTypeID=1 
    AND UnSubscribeDate Is NULL
GROUP BY PersonId
    , Salutation
    , Email
    , Postcode

For completeness I ran both versions to check they are identical and they are not. The new version returns more rows - rows that were not in the first version. I've outer joined the two sets to see the difference but there's nothing obviously different about the extra rows.

I'm clearly missing something here. Can you explain what's going on, please?

+2  A: 

Are there any rows with NULL in PersonID - there would be filtered out by the IN clause in the first query but not by anything in the second, so would explain the second query returning more rows.

It might help if you could post some sample data (with names and other identifying data munged to protect the presumed innocent) - samples of rows that only appear in one and rows that appear in both.

(also, this question might be better suited to StackOverflow)

David Spillett
Hi David,There are no null PersonIDs in the view. There are 14 PersonIDs that occur twice in the view but there are about 600 extra rows when I take out the WHERE clause. I'd really rather not get into the data and query of the view itself as it's a bit of a monster. It's probably worth noting that the contents of the view are pretty stable. It's based on people who subscribe or unsubscribe to our newsletter which only happens evry now and then.
David Wimbush
A: 

If NULLs aren't the issue (as David Spillett suggested), then the best strategy is to start looking in detail at which rows are included in one and not the other, and see if you can figure out whats different about them.

If that doesn't help any, you'll have to start digging into the definition of the underlying view, and see if that gives you any clues. Views can contain complex joins, filters, UNIONS, triggers, and other crazy stuff that get in the way here.

If the view is simple enough, try to construct this query referencing the underlying tables instead... see if you get the same quirky behavior.

BradC
A: 

i'm assuming this is ms sql server (i see the sql-server tag), but if its not, please clarify. i tested this with 750 fake rows i inserted into a database. my data was all good (no nulls where there shouldn't be) and i was unable to replicate your issue. i have to assume it is a data problem of some sort. i don't see how its possible to fix the problem without seeing the data. one thing you might want to try is dumping the data from the view into a regular table (just the referenced fields). it would be interesting to see if the queries failed against a regular table. also, if you limit to just the 5 or 6 referenced fields something might jump out at you.

-don

Don Dickinson
A: 
PersonId IN (SELECT PersonId FROM vw_NewsletterSubscriptions)

implies

PersonId IS NOT NULL

You can add this condition to your query

AlexKuznetsov
A: 

Thanks for all the ideas, guys. I compared the results of the query with and without the seemingly unnecessary clause and I'm damned if I could see a pattern. The view was an overcomplicated mess so I refactored it and now I'm getting the same number of rows either way. Plus our number of newsletter subscribers just went up too!

David Wimbush