views:

874

answers:

4

I'm trying to determine how to count the matching rows on a table using the EntityFramework.

The problem is that each row might have many megabytes of data (in a Binary field). Of course the SQL would be something like this:

SELECT COUNT(*) FROM [MyTable] WHERE [fkID] = '1';

I could load all of the rows and then find the Count with:

var owner = context.MyContainer.Where(t => t.ID == '1');
owner.MyTable.Load();
var count = owner.MyTable.Count();

But that is grossly inefficient. Is there a simpler way?


EDIT: Thanks, all. I've moved the DB from a private attached so I can run profiling; this helps but causes confusions I didn't expect.

And my real data is a bit deeper, I'll use Trucks carrying Pallets of Cases of Items -- and I don't want the Truck to leave unless there is at least one Item in it.

My attempts are shown below. The part I don't get is that CASE_2 never access the DB server (MSSQL).

var truck = context.Truck.FirstOrDefault(t => (t.ID == truckID));
if (truck == null)
    return "Invalid Truck ID: " + truckID;
var dlist = from t in ve.Truck
    where t.ID == truckID
    select t.Driver;
if (dlist.Count() == 0)
    return "No Driver for this Truck";

var plist = from t in ve.Truck where t.ID == truckID
    from r in t.Pallet select r;
if (plist.Count() == 0)
    return "No Pallets are in this Truck";
#if CASE_1
/// This works fine (using 'plist'):
var list1 = from r in plist
    from c in r.Case
    from i in c.Item
    select i;
if (list1.Count() == 0)
    return "No Items are in the Truck";
#endif

#if CASE_2
/// This never executes any SQL on the server.
var list2 = from r in truck.Pallet
        from c in r.Case
        from i in c.Item
        select i;
bool ok = (list.Count() > 0);
if (!ok)
    return "No Items are in the Truck";
#endif

#if CASE_3
/// Forced loading also works, as stated in the OP...
bool ok = false;
foreach (var pallet in truck.Pallet) {
    pallet.Case.Load();
    foreach (var kase in pallet.Case) {
        kase.Item.Load();
        var item = kase.Item.FirstOrDefault();
        if (item != null) {
            ok = true;
            break;
        }
    }
    if (ok) break;
}
if (!ok)
    return "No Items are in the Truck";
#endif

And the SQL resulting from CASE_1 is piped through sp_executesql, but:

SELECT [Project1].[C1] AS [C1]
FROM   ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN  (SELECT 
    [GroupBy1].[A1] AS [C1]
    FROM ( SELECT 
        COUNT(cast(1 as bit)) AS [A1]
        FROM   [dbo].[PalletTruckMap] AS [Extent1]
        INNER JOIN [dbo].[PalletCaseMap] AS [Extent2] ON [Extent1].[PalletID] = [Extent2].[PalletID]
        INNER JOIN [dbo].[Item] AS [Extent3] ON [Extent2].[CaseID] = [Extent3].[CaseID]
        WHERE [Extent1].[TruckID] = '....'
    )  AS [GroupBy1] ) AS [Project1] ON 1 = 1

[I don't really have Trucks, Drivers, Pallets, Cases or Items; as you can see from the SQL the Truck-Pallet and Pallet-Case relationships are many-to-many -- although I don't think that matters. My real objects are intangibles and harder to describe, so I changed the names.]

+2  A: 

I think you want something like

var count = context.MyTable.Count(t => t.MyContainer.ID == '1');

(edited to reflect comments)

Iceman
I believe he needs the count of MyTable and not of MyContainer...
J.13.L
No, he needs the count of the entities in MyTable referenced by the one entity with ID = 1 in MyContainer
Craig Stuntz
Incidentally, if t.ID is a PK, then count in the code above will *always* be 1. :)
Craig Stuntz
@Craig, you're right, I should have used t.ForeignTable.ID. Updated.
Iceman
+1  A: 

I think this should work...

var query = from m in context.MyTable
            where m.MyContainerId == '1' // or what ever the foreign key name is...
            select m;

var count = query.Count();
J.13.L
This is the direction I went at first, too, but it's my understanding that unless you've manually added it, m will have a MyContainer property but no MyContainerId. Hence, what you want to examine is m.MyContainer.ID.
Iceman
If MyContainer is the parent and MyTable are the children in the relationship then you had to establish that relationship with some foreign key, I am not sure how else you would know which MyTable entities where a associated with a MyContainer entity... But maybe I made an assumption about the structure...
J.13.L
+3  A: 
var count = (from o in context.MyContainer
             where o.ID == '1'
             from t in o.MyTable
             select t).Count();
Craig Stuntz
Nitpick: missing () on Count();
Iceman
Thanks; fixed it.
Craig Stuntz
A: 

Well, even the SELECT COUNT(*) FROM Table will be fairly inefficient, especially on large tables, since SQL Server really can't do anything but do a full table scan (clustered index scan).

Sometimes, it's good enough to know an approximate number of rows from the database, and in such a case, a statement like this might suffice:

SELECT 
    SUM(used_page_count) * 8 AS SizeKB,
    SUM(row_count) AS [RowCount], 
    OBJECT_NAME(OBJECT_ID) AS TableName
FROM 
    sys.dm_db_partition_stats
WHERE 
    OBJECT_ID = OBJECT_ID('YourTableNameHere')
    AND (index_id = 0 OR index_id = 1)
GROUP BY 
    OBJECT_ID

This will inspect the dynamic management view and extract the number of rows and the table size from it, given a specific table. It does so by summing up the entries for the heap (index_id = 0) or the clustered index (index_id = 1).

It's quick, it's easy to use, but it's not guaranteed to be 100% accurate or up to date. But in many cases, this is "good enough" (and put much less burden on the server).

Maybe that would work for you, too? Of course, to use it in EF, you'd have to wrap this up in a stored proc or use a straight "Execute SQL query" call.

Marc

marc_s
It won't be a full table scan due to the FK reference in the WHERE. Only details of the master will be scanned. The performance issue he was having was from loading blob data, not record count. Presuming there aren't typically tens of thouusands + of detail records per master record, I wouldn't "optimize" something which isn't actually slow.
Craig Stuntz
OK, yes, in that case, you'll only select a subset - that should be fine. As for the blob data - I was under the impression you could set a "deferred loading" on any column in any of your EF tables to avoid loading it, so that might help.
marc_s
Is there a way to use this SQL with the EntityFramework? Anyway, in this case I only needed to know there were matching rows, but I intentionally asked the question more generally.
NVRAM