views:

70

answers:

3

Our calendar application represents an appointment domain as:

Appointment

  • ID (PK)
  • StartDateTime
  • EndDateTime
  • ...

AppointmentRole

  • AppointmentID (FK)
  • PersonOrGroupID (FK) /* joins to a person/group, outside the scope of this question */
  • Role
  • ...

Appointment has a 1 to many relationship with AppointmentRoles. Each AppointmentRole represents a person or group in a particular role (e.g., drop-off, pick-up, attend, ...).

The relationship serves two purposes:

  1. it defines an access control list - an authenticated principal can only view an appointment if their access control list matches an associated person or group
  2. it documents who is attending the appointment and in what roles.

There is also a third table to keep track of notes/comments associated with the appointment. It's on the many side of a 1 to many relationship with Appointment:

AppointmentNote

  • AppointmentID (FK)
  • ...

To display a calendar of appointments, we currently use something like...

List<IAppointment> GetAppointments(IAccess acl, DateTime start, DateTime end, ...
{
  // Retrieve distinct appointments that are visible to the acl

  var visible = (from appt in dc.Appointments
                 where !(appt.StartDateTime >= end || appt.EndDateTime <= start)
                 join role in
                   (from r in dc.Roles
                    where acl.ToIds().Contains(r.PersonOrGroupID)
                    select new { r.AppointmentID })
                 on appt.ID equals role.AppointmentID
                 select new
                 {
                   ...
                 }).Distinct();

  ...

The visible Linq expression selects the distinct appointments that can be seen by the given access control list.

Below, we take visible and join/into roles and notes to pick up all the people and groups involved with an appointment and the appointment notes.

  ...

  // Join/into to get all appointment roles and notes

  var q = from appt in visible
          orderby appt.StartDateTime, ...
          join r in dc.Roles
          on appt.ID equals r.AppointmentID
          into roles
          join note in dc.AppointmentNotes
          on appt.ID equals note.AppointmentID
          into notes
          select new { Appointment = appt, Roles = roles, Notes = notes };

Finally, we enumerate the query, hoping that Linq-To-Sql will generate one fabulously optimized query (no such luck as discussed later)...

  // Marshal the anonymous type into an IAppointment
  // IAppointment has a Roles and Notes collection

  var result = new List<IAppointment>();
  foreach (var record in q)
  {
    IAppointment a = new Appointment();
    a.StartDateTime = record.StartDateTime;
    ...
    a.Roles = Marshal(record.Roles);
    a.Notes = Marshal(record.Notes);

    result.Add(a);
  }

The query produced by Linq-to-Sql is very chatty. It generates a single query to determine the visible appointments. But then it generates three queries on each iteration: one to pick up the appointment fields, a second to pick up the roles and a third to pick up the notes. The where clause is always the visible appointment id.

So, we're refactoring GetAppointments and thought we could benefit from the SO community's expertise.

We expect to move everything into a T-SQL stored proc so we have more control. Can you share your thoughts on how you would tackle this problem? Changes to the data model, T-SQL and Linq-to-SQL modifications are all fair game. We would also like advice on indexes. We're using MS-SqlServer 2008 and .NET 4.0.

+2  A: 

If I understand correctly, and Appointment has a collection of Roles and a collection of Notes. If this is the case (and you modeled this out correctly in the designer) you have these Roles and Notes properties in the Appointment class. When you change the projection (the select) of your q query select the Appointment itself, you can help LINQ to SQL to fetch the below collections for you. In that case, you should write your query as follows:

var q =
    from appt in visible
    ...
    select appt;

After this, you can use the LoadOptions property of the DataContext to prefetch a sub collection for you as follows:

using (var db = new AppointmentContext())
{
    db.LoadOptions.LoadWith<Appointment>(a => a.Roles);

    // Do the rest here
}

One problem however here is that I think that the LoadWith is limited to load a single sub collection, not two.

You can solve this by writing it out in two queries. The first query were you fetch the appointments and use LoadWith to fetch all Roles. Then use a second query (in a new DataContext) and use LoadWith to fetch all Notes).

Good luck.

Steven
Thanks. I'll play around with LoadOptions and see what linq generates.
Rob
+3  A: 

I would say the root of all evils starts here:

where acl.ToIds().Contains(r.PersonOrGroupID) 

The acl.ToIds().Contains(...) is an expression that cannot be resolved on the server side, so the visible query has to be resolved (very inneficiently) on the client side, and worse, the result has to be retaine don the client and then, as it is iterated, thre distinct queries have to be sent to the server for each visible appointment (appointment fields, roles and notes). If I'd have things my way, I would create a stored procedure that accepts the ACL list as a Table Valued Parameter and does all the joining/filtering on the server side.

I woudl start with this schema:

create table Appointments (
    AppointmentID int not null identity(1,1),
    Start DateTime not null,
    [End] DateTime not null,
    Location varchar(100),
    constraint PKAppointments
        primary key nonclustered (AppointmentID));

create table AppointmentRoles (
    AppointmentID int not null,
    PersonOrGroupID int not null,
    Role int not null,
    constraint PKAppointmentRoles
        primary key (PersonOrGroupID, AppointmentID), 
    constraint FKAppointmentRolesAppointmentID
        foreign key (AppointmentID)
        references Appointments(AppointmentID));

create table AppointmentNotes (
    AppointmentID int not null,
    NoteId int not null,
    Note varchar(max),

    constraint PKAppointmentNotes
        primary key (AppointmentID, NoteId),
    constraint FKAppointmentNotesAppointmentID
        foreign key (AppointmentID)
        references Appointments(AppointmentID));
go

create clustered index cdxAppointmentStart on Appointments (Start, [End]);
go

And retrieve the appointments for a arbitrary ACL like this:

create type AccessControlList as table 
    (PersonOrGroupID int not null);
go

create procedure usp_getAppointmentsForACL
 @acl AccessControlList readonly,
 @start datetime,
 @end datetime
as
begin
    set nocount on;
    select a.AppointmentID
        , a.Location
        , r.Role
        , n.NoteID
        , n.Note
    from @acl l 
    join AppointmentRoles r on l.PersonOrGroupID = r.PersonOrGroupID
    join Appointments a on r.AppointmentID = a.AppointmentID
    join AppointmentNotes n on n.AppointmentID = a.AppointMentID
    where a.Start >= @start
    and a.[End] <= @end;    
end
go

Lets try this out on 1M appointments. First, populate the tables (takes about 4-5 mins):

set nocount on;
declare @i int = 0;
begin transaction;
while @i < 1000000
begin
    declare @start datetime, @end datetime;
    set @start = dateadd(hour, rand()*10000-5000, getdate());
    set @end = dateadd(hour, rand()*100, @start)
    insert into Appointments (Start, [End], Location)
    values (@start, @end, replicate('X', rand()*100));

    declare @appointmentID int = scope_identity();
    declare @atendees int = rand() * 10.00 + 1.00;
    while @atendees > 0
    begin
        insert into AppointmentRoles (AppointmentID, PersonOrGroupID, Role)
        values (@appointmentID, @atendees*100 + rand()*100, rand()*10);
        set @atendees -= 1;
    end

    declare @notes int = rand()*3.00;
    while @notes > 0
    begin
        insert into AppointmentNotes (AppointmentID, NoteID, Note)
        values (@appointmentID, @notes, replicate ('Y', rand()*1000));
        set @notes -= 1;
    end

    set @i += 1;
    if @i % 10000 = 0
    begin
        commit;
        raiserror (N'Added %i appointments...', 0, 1, @i);
        begin transaction;
    end
end
commit;
go

So lets see today appointments for a few persons:

set statistics time on;
set statistics io on;

declare @acl AccessControlList;
insert into @acl (PersonOrGroupID) values (102),(111),(131);
exec usp_getAppointmentsForACL @acl, '20100730', '20100731';

Table 'AppointmentNotes'. Scan count 8, logical reads 39, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Appointments'. Scan count 1, logical reads 9829, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'AppointmentRoles'. Scan count 3, logical reads 96, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#25869641'. Scan count 1, logical reads 1, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 1294 ms.

 SQL Server Execution Times:
   CPU time = 63 ms,  elapsed time = 1294 ms.

1.2 seconds (ona a cold cache, it goes to 224 ms on a warm cache). Hmm, that's not very good. The problem is the 9829 pages hit in Appointment table. To improve this, we would like to have both filtering criteria (acl and date) at once. Perhaps an indexed view?

create view vwAppointmentAndRoles 
with schemabinding
as
select r.PersonOrGroupID, a.AppointmentID, a.Start, a.[End]
from dbo.AppointmentRoles r
join dbo.Appointments a on r.AppointmentID = a.AppointmentID;
go

create unique clustered index cdxVwAppointmentAndRoles on vwAppointmentAndRoles (PersonOrGroupID, Start, [End]);
go

alter procedure usp_getAppointmentsForACL
 @acl AccessControlList readonly,
 @start datetime,
 @end datetime
as
begin
    set nocount on;
    select ar.AppointmentID
        , a.Location
        , r.Role
        , n.NoteID
        , n.Note
    from @acl l 
    join vwAppointmentAndRoles ar with (noexpand) on l.PersonOrGroupID = ar.PersonOrGroupID
    join AppointmentNotes n on n.AppointmentID = ar.AppointMentID
    join Appointments a on ar.AppointmentID = a.AppointmentID
    join AppointmentRoles r 
        on ar.AppointmentID = r.AppointmentID
        and ar.PersonOrGroupID = r.PersonOrGroupID
    where ar.Start >= @start
     and ar.Start <= @end
    and ar.[End] <= @end;   
end
go

We can also change the clustered index on the Appointments to a probably more usefull AppointmentID:

drop index cdxAppointmentStart on Appointments;
create clustered index cdxAppointmentAppointmentID on Appointments (AppointmentID);
go

This returns the appointments in the same @acl list for same date range in 77ms (on a warm cache).

Now, of course, the actual schema you should use depends on a lot more factors not taken into consideration. But I hope this has given you some idea about proper action to take now to get decent performance. Adding the table valued parameter to the client execution context and passing it to the procedure, as well as LINQ integration, is left as an exercise to the reader.

Remus Rusanu
Wow. Great answer. Lots to play with. That's why I posted on SO. Can you point me at good books and/or sites to read up on techniques like this (table valued parameters, indexed views, profiling, etc.). This query is the most important part of the app and we'll be tuning it for a long time.
Rob
The list of books Mitch listed at http://stackoverflow.com/questions/3380947/sql-server-book-recommendation/3381007#3381007 is a pretty good start.
Remus Rusanu
@Remus: fyi, LINQ does translate the acl.ToIDs().Contains into a SQL IN statement. I still think the table valued function speeds things up, but the first query is done server side.
Rob
@Remus: also, one thing that's subtle from the original question. I have to join with roles once just to get the visible appointments, then again on resulting appointments to get all the roles for presentation purposes. I'm going to experiment with a stored proc that selects to a temporary table and then joins that table to roles and notes.
Rob
shouldn't there be some left joins? Appointments which have no notes should still be loaded.
David B
@david b: you're right. in the original, linq uses a left join to implement join ... into ...
Rob
I forgot about the `Contains()` being translated into the `IN()` operator. If your ACLs are short then there's no point for a TVP. You can reformulate the query in the SP as one single LINQ query (including an extra join with AppointmentRoles to fetch all participants). The indexed view and the `ar.Start >= @start and ar.Start <= @end` condition are still vital for perf though, in the schema I used.
Remus Rusanu
+1  A: 
where !(appt.StartDateTime >= end || appt.EndDateTime <= start)

This could be a perfectly good AND criteria.

where appt.StartDateTime < end && start < appt.EndDateTime

acl.ToIds().

Pull this out of the query, no sense in asking the database to perform the operation.

List<int> POGIDs = acl.ToIds();

join role in

You want to use roles as a filter. If you where, instead of joining, you don't have to Distinct later.


Try this, with and without the DataLoadOptions. If the query is good without the DataLoadOptions, there's another (more manual) way to load the related rows.

DataLoadOptions myOptions = new DataLoadOptions();
myOptions.LoadWith<Appointment>(appt => appt.Roles);
myOptions.LoadWith<Appointment>(appt => appt.Notes);
dc.LoadOptions = myOptions;


List<int> POGIDs = acl.ToIds();

IQueryable<Roles> roleQuery = dc.Roles
  .Where(r => POGIDs.Contains(r.PersonOrGroupId));

IQueryable<Appointment> visible =
  dc.Appointments
    .Where(appt => appt.StartDateTime < end && start < appt.EndDateTime)
    .Where(appt => appt.Roles.Any(r => roleQuery.Contains(r));

IQueryable<Appointment> q =
  visible.OrderBy(appt => appt.StartDateTime);

List<Appointment> rows = q.ToList();

Here's the "more manual" way of fetching related data. NOTE: This technique breaks when apptIds or POGIDs have more than ~2100 ints in them. There's a way to work around that too...

List<int> POGIDs = acl.ToIds();

List<Role> visibleRoles = dc.Roles
  .Where(r => POGIDs.Contains(r.PersonOrGroupId)
  .ToList()

List<int> apptIds = visibleRoles.Select(r => r.AppointmentId).ToList();

List<Appointment> appointments = dc.Appointments
  .Where(appt => appt.StartDateTime < end && start < appt.EndDate)
  .Where(appt => apptIds.Contains(appt.Id))
  .OrderBy(appt => appt.StartDateTime)
  .ToList();

ILookup<int, Roles> appointmentRoles = dc.Roles
  .Where(r => apptIds.Contains(r.AppointmentId))
  .ToLookup(r => r.AppointmentId);

ILookup<int, Notes> appointmentNotes = dc.AppointmentNotes
  .Where(n => apptIds.Contains(n.AppointmentId));
  .ToLookup(n => n.AppointmentId);

foreach(Appointment record in appointments)
{
  int key = record.AppointmentId;
  List<Roles> theRoles = appointmentRoles[key].ToList();
  List<Notes> theNotes = appointmentNotes[key].ToList();
}

This style highlights where indexes are needed:

Roles.PersonOrGroupId
Appointments.AppointmentId (should be PK already)
Roles.AppointmentId
Notes.AppointmentId
David B
Excellent answer, thank you.
Rob
join role in --- good idea, the list will always be short so makes a good in clause
Rob
acl.ToIds() --- I think linq-to-sql does this for me when it translates the queryable to the actual sql expression. Still I'll pull it out.
Rob
I can't limit the number of apptIds to 2100, which is why I was thinking I need to join against them. Did you have a different workaround?
Rob
Did you try the part "Try this"?
David B