views:

691

answers:

2

I would like this SQL to be converted to LINQ. (it shouldl select rows from input which do not exist in table production based on 3 columns. If a column in both tables contains NULL, it should be considered as having the same value)

SELECT i.* FROM INPUT AS i
WHERE NOT EXISTS
(SELECT p.Agent FROM Production AS p
WHERE ISNULL(i.CustID,'') <> ISNULL(p.CustID,'')
AND ISNULL(i.CustName,'') <> ISNULL(p.CustName,'')
AND ISNULL(i.household,'') <> ISNULL(p.Household,''))
A: 

There is a product that can do this kind of thing for you. It is called Linqer. I have used it extensively and have yet to run across a query it cannot convert. It's pretty inexpensive and comes with a 30 day trial. I have no interest or relation to this product.

Randy

Randy Minder
Linq is not designed to modify data. It is designed to query data. So it doesn't surprise me that Linqer wouldn't translate an Update statement. However, you could have it generate the query that produces the result set to delete. Just think outside the box a little.
Randy Minder
+2  A: 

First of all - this is not a good SQL query. Every column is wrapped in a non-sargable function which means that the engine won't be able to take advantage of any indexes on any of those columns (assuming you have any).

Let's start by rewriting this as a semi-decent SQL query:

SELECT i.*
FROM Input i
LEFT JOIN Production p
    ON (p.CustID = i.CustID OR (p.CustID IS NULL AND i.CustID IS NULL))
    AND (p.CustName = i.CustName OR (p.CustName IS NULL AND i.CustName IS NULL))
    AND (p.Household = i.Household OR
        (p.Household IS NULL AND i.Household IS NULL))
WHERE p.CustID IS NULL

Now having said this, LEFT JOIN / IS NULL is not great for efficiency either, but we don't have much choice here because we're comparing on multiple columns. Based on your column names, I'm starting to wonder if the schema is properly normalized. A CustID should most likely be associated with one and only one CustName - the fact that you have to compare both of these seems a bit odd. And Household - I'm not sure what that is, but if it's a varchar(x)/nvarchar(x) column then I wonder if it might also have a 1:1 relationship with the customer.

If I'm speculating too much here then feel free to dismiss this paragraph; but just in case, I want to say that if this data isn't properly normalized, normalizing it would make it much easier and faster to query on:

SELECT *
FROM Input
WHERE CustID NOT IN (SELECT CustID FROM Production)

Anyway, going back to the first query, since that's what we have to work with for now. Unfortunately it's impossible to create a join on those specific conditions in Linq, so we need to rewrite the SQL query as something slightly worse (because we now have to read from Input twice):

SELECT *
FROM Input
WHERE <Primary Key> NOT IN
(
    SELECT i.<Primary Key>
    FROM Input i
    INNER JOIN Production p
    ON (p.CustID = i.CustID OR (p.CustID IS NULL AND i.CustID IS NULL))
    AND (p.CustName = i.CustName OR (p.CustName IS NULL AND i.CustName IS NULL))
    AND (p.Household = i.Household OR
        (p.Household IS NULL AND i.Household IS NULL))
)

Now we have something we can finally translate to Linq syntax. We still can't do the join explicitly, which would be best, but we go old-school, start from the cartesian join and toss the join conditions into the WHERE segment, and the server will still be able to sort it out:

var excluded =
    from i in input
    from p in production
    where
        ((p.CustID == i.CustID) || ((p.CustID == null) && (i.CustID == null))) &&
        ((p.CustName == i.CustName) || 
            ((p.CustName == null) && (i.CustName == null))) &&
        ((p.Household == i.Household) ||
            ((p.Household == null) && (i.Household == null)));
    select i.PrimaryKey;

var results =
    from i in input
    where !excluded.Contains(i.PrimaryKey)
    select i;

I'm assuming here that you have some sort of primary key on the table. If you don't, you've got other problems, but you can get around this particular problem using EXCEPT:

var excluded =
    from i in input
    from p in production
    where
        ((p.CustID == i.CustID) || ((p.CustID == null) && (i.CustID == null))) &&
        ((p.CustName == i.CustName) || 
            ((p.CustName == null) && (i.CustName == null))) &&
        ((p.Household == i.Household) ||
            ((p.Household == null) && (i.Household == null)));
    select i;

var results = input.Except(excluded);
Aaronaught
A SQL join doesn't work here because it will return each row from the second table where the 3 columns match. I just need one row. The column names are for illustration only. There are no primary keys and the ID column are not related. ID's are unique to each table so it can't be used. Your LINQ gave me some pointers. Thanks.
Tony_Henrich
@Tony_Henrich: If there are multiple matching rows in the right-hand table, then you are right, the first query will return duplicate rows. However, the subsequent queries I wrote will not have that problem, and the Linq queries will do what you want (including the last one if there's no PK... but I would really recommend against not having a PK, this example is a good illustration of why they are important).
Aaronaught
To make long story short, the tables are actually Excel spreadsheets. I have copied the data to SQL Server to verify the operations. Primary keys won't help because there are no candidates for it. The unique ID column in each table is unique to that table. The values do not match between the tables.
Tony_Henrich
Yuck. Trying to use Excel as a database is asking for trouble. I'd just throw in a couple of `VLOOKUP` functions and be done with it.
Aaronaught
I am not using Excel as a database engine. The data is in Excel spreadsheets. The data is imported into generic lists where LINQ is used to query and update the data and then written back to Excel. The process avoids using a a database engine. I transfer the data to SQL Server because I understand SQL better than LINQ, specially when it comes to complicated joins.
Tony_Henrich
@Tony: I understand what you're trying to do and hopefully the last couple of queries in my answer will do what you want (they should). Long-term, I think you'd be better off not using Excel at all and having this data entered into a database to start with; of course, if it's a hard requirement for this data to be received in Excel then that is what you do. :-) Good luck!
Aaronaught