Hello,

I am new to ADO.NET and have this problem:

Let's assume I have these two tables in a SQL Server 2005 database, with these columns:

[Orders]

  • OrderID
  • OrderDate
  • ShopID
  • TotalAmount
  • TotalTaxAmount
  • etc...

[OrdersDetails]

  • OrderID
  • ShopID
  • ItemID
  • Quantity
  • Amount
  • TaxAmount
  • etc

I have created a WinForms application to get started. In its form, the user can select one or more shops and a date range to see all orders from those shops.

I added a data source in Visual Studio, selected both the Orders and OrdersDetails tables, and dragged the Orders table and its related OrdersDetails table onto the form as DataGridViews.

When I select a row in the Orders DataGridView, I do see the corresponding order details in the second DataGridView, as I wanted. The database already had relationships defined, and ADO.NET picked them up and reflected them in the dataset.

I then added a method to my typed dataset to get data by the OrderDate and ShopID columns. As the OrdersDetails table does not have an OrderDate column, I could only filter it by ShopID.

The issue is that loading the OrdersDetails records is slow, because far more rows than needed are retrieved into the OrdersDetails DataTable: I can only filter that table by ShopID, which returns way too many records from the database.

Obviously, ADO.NET is able to filter them appropriately on the client side by using the OrderID relationship, but it would make much more sense to retrieve only the OrdersDetails rows that I actually need.

I modified the query for the second table to bring in OrderDate through a join, so that I can filter by date when I retrieve the data from the database. However, this column from another table causes problems when I try to save my changes.

I believe there must be an easy way around this, isn't there?

Thanks a lot in advance.

+2  A: 

You want to do something like this:

SELECT *
FROM OrdersDetails
WHERE
    -- placeholder: expand to one parameter per selected shop
    ShopID IN ( @listOfShopIds )
    AND
    OrderID IN (
        SELECT OrderID
        FROM Orders
        WHERE
            OrderDate BETWEEN @dateFrom AND @dateTo
    )
David Hedlund
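
One caveat with `@listOfShopIds`: a single scalar parameter inside `IN ( ... )` is compared as one value, it is not expanded into a list. A common workaround is to generate one placeholder per selected shop. Below is a minimal sketch of that idea, written in Python against an in-memory SQLite database purely so that it runs anywhere. The helper name `fetch_details`, the trimmed-down columns, and the sample rows are all made up for illustration. With ADO.NET and SQL Server the same approach would presumably add one named parameter per shop (hypothetical names such as `@shop0`, `@shop1`) to the command.

```python
import sqlite3


def fetch_details(conn, shop_ids, date_from, date_to):
    """Return OrdersDetails rows for the given shops whose order date is in range.

    Hypothetical helper: the name and signature are assumptions for this sketch.
    """
    if not shop_ids:
        return []  # "IN ()" is not valid SQL, so short-circuit on an empty list
    # One "?" per shop id. Only placeholders are interpolated, never the values.
    placeholders = ", ".join("?" for _ in shop_ids)
    sql = (
        "SELECT * FROM OrdersDetails "
        f"WHERE ShopID IN ({placeholders}) "
        "AND OrderID IN ("
        "SELECT OrderID FROM Orders WHERE OrderDate BETWEEN ? AND ?"
        ") ORDER BY OrderID, ItemID"
    )
    return conn.execute(sql, [*shop_ids, date_from, date_to]).fetchall()


# Invented sample data with a reduced set of columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT, ShopID INTEGER);
CREATE TABLE OrdersDetails (OrderID INTEGER, ShopID INTEGER, ItemID INTEGER, Quantity INTEGER);
INSERT INTO Orders VALUES (1, '2009-01-10', 1), (2, '2009-02-15', 2), (3, '2009-06-01', 1);
INSERT INTO OrdersDetails VALUES (1, 1, 100, 2), (2, 2, 101, 1), (3, 1, 102, 5);
""")

# Shops 1 and 2, first quarter only: order 3 falls outside the date range.
rows = fetch_details(conn, [1, 2], "2009-01-01", "2009-03-31")
print(rows)
```

The values themselves still travel as parameters, so the query stays safe from injection even though the SQL text is built dynamically.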
`WHERE OrderID IN ( SELECT ... )` can easily be converted to a join, which would be much more efficient.
Winston Smith
@Winston Smith: it could easily be written as an inner join, yes. in fact, that is how SQL Server will merge the results of a select like this one. it will use the same statistics, come up with the same execution plan, and perform exactly as well (since it's doing the exact same thing) in the two solutions.
David Hedlund
Thanks a lot. Actually, I did use a join before, as I stated in the question, but my problem was that I added the joined column to the returned table, which caused the problems with the updates. Your answer made me realize my mistake. Thanks.
Kharlos Dominguez
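
To illustrate the point made in these comments: the join is not the problem by itself, selecting columns from `Orders` is. If the join version selects only `d.*`, it should return the same rows and the same columns as the subquery version, so the result still maps onto the OrdersDetails table. A small sketch, assuming a trimmed-down schema and invented sample rows, and using SQLite from Python instead of SQL Server so that it is self-contained:

```python
import sqlite3

# Invented sample data with a reduced set of columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT, ShopID INTEGER);
CREATE TABLE OrdersDetails (OrderID INTEGER, ShopID INTEGER, ItemID INTEGER, Quantity INTEGER);
INSERT INTO Orders VALUES (1, '2009-01-10', 1), (2, '2009-02-15', 2), (3, '2009-06-01', 1);
INSERT INTO OrdersDetails VALUES (1, 1, 100, 2), (1, 1, 103, 1), (2, 2, 101, 1), (3, 1, 102, 5);
""")

# Subquery form, as in the answer (single shop for brevity).
SUBQUERY_SQL = """
SELECT * FROM OrdersDetails
WHERE ShopID = ?
  AND OrderID IN (SELECT OrderID FROM Orders WHERE OrderDate BETWEEN ? AND ?)
ORDER BY OrderID, ItemID
"""

# Join form: Orders is used only for filtering; d.* keeps Orders columns
# (such as OrderDate) out of the result set.
JOIN_SQL = """
SELECT d.*
FROM OrdersDetails AS d
INNER JOIN Orders AS o ON o.OrderID = d.OrderID
WHERE d.ShopID = ?
  AND o.OrderDate BETWEEN ? AND ?
ORDER BY d.OrderID, d.ItemID
"""

params = (1, "2009-01-01", "2009-03-31")
sub_rows = conn.execute(SUBQUERY_SQL, params).fetchall()
join_cursor = conn.execute(JOIN_SQL, params)
join_rows = join_cursor.fetchall()
join_columns = [col[0] for col in join_cursor.description]

print(sub_rows == join_rows)  # both forms select the same detail rows
print(join_columns)           # only OrdersDetails columns are returned
```

The two forms agree here because `OrderID` is unique in `Orders`, so the join cannot duplicate detail rows. This sketch says nothing about execution plans; those would have to be compared on the actual SQL Server instance.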
A: 

@David it's probably better to write code which expresses your intent and reflects the more performant algorithm, rather than relying on implementation details of the engine to perform the optimization.

Winston Smith
@Winston Smith: uh, didn't notice this one until now. however you write your sql query, you're relying on the engine to execute it in a performant manner, utilizing indexes and statistics at hand. whether you phrase it as a join or a subquery, you're dealing with two separate recordsets that will be magically merged by the engine based on a given criterion. the means by which the two sets are merged in these cases is called *nested loops*. nested loops is what is going on both in the join and the subquery. its cost will in most scenarios be negligible in comparison with filtering out the ...
David Hedlund
two subsets that are subject to the merge in the first place, but that is peripheral to the point - the key point being that whatever its cost is, it is the *same* in both ways of writing the code, because the two ways *are* the same (in this context). so there is no one way that is more performant, and there is no one of the two ways of writing the query that is closer than the other to saying "Hey SQL server, do an index seek on this table and another one on this table, using index such and such, and then merge the two sets using nested loops into a single result". we simply *have* to ...
David Hedlund
rely on the engine to figure that out. at the end it's all a question of preference. and i honestly do think that "where order id is the id of one of the orders within this span" expresses the intended result *at least* as well as any join would.
David Hedlund