views:

65

answers:

2

I'm having issues with a rather complex Linq query and I need some advice. This involves VB syntax so I need specific answers for that platform, as I have a lot of trouble translating the C# syntax to VB at times.

I have to join two main tables, and I need to filter the results by elements in an ASP.NET web form. These filters are created on the fly so I have to use a lot of where extensions to filter the query. I want to execute the query with as optimized SQL as possible.

I am first doing a simple join between TW_Sites and TW_Investigators. Then there are two sub-tables that are involved. TW_InvestigatorToArea and TW_InvestigatorToDisease. While most of the where clauses are working fine, I have found a performance issue that won't be an issue right now, but will be an issue as the table gets bigger.

The arrays DiseaseCategories and DiseaseAreas would be the results of a CheckBoxList result.

    Protected Sub LoadResults()
    ' Get Dictionary of Filters
    Dim FilterDictionary As OrderedDictionary = Session.Item("InvestigatorFilterDictionary")
    ' Initialize LinqToSql
    Dim LinqDbHandler As TrialWatchDC = New TrialWatchDC(WebConfigurationManager.ConnectionStrings("DataSourceName").ConnectionString)
    ' Create List of Categories to Filter By
    Dim DiseaseCategories() As Integer = {1, 2, 3, 4, 5, 6, 11, 22, 361, 77, 82, 99, 400}
    Dim CategorySubQuery = From ic In LinqDbHandler.TW_InvestigatorsToDiseases Where DiseaseCategories.Contains(ic.DiseaseCategoryID) Select ic.InvestigatorID Distinct
    ' Dim CategorySubArray = CategorySubQuery.ToArray()
    ' Create List of Areas to Filter By
    Dim AreaCategories() As Integer = {17, 1, 3, 5}
    Dim AreaSubQuery = From ic In LinqDbHandler.TW_InvestigatorsToAreas Where AreaCategories.Contains(ic.AreaID) Select ic.InvestigatorID Distinct
    Dim AreaSubArray = AreaSubQuery.ToArray()
    Dim dc As DbCommand
    Dim ThisQuery = From Site In LinqDbHandler.TW_Sites _
                    Join Investigator In LinqDbHandler.TW_Investigators On Site.TrialSiteID Equals Investigator.TrialSiteID _
                    Join SiteType In LinqDbHandler.TW_SiteTypes On Site.SiteTypeID Equals SiteType.SiteTypeID _
                    Order By Site.ResearchCenterName, Investigator.InvestigatorName
        Select New With {.TrialSiteID = Site.TrialSiteID, _
                         .InvestigatorID = Investigator.InvestigatorID, _
                         .ResearchCenterName = Site.ResearchCenterName, _
                         .SiteTypeID = SiteType.SiteTypeID, _
                         .TypeLabel = SiteType.TypeLabel, _
                         .CenterState = Site.CenterState, _
                         .CenterCountry = Site.CenterCountry, _
                         .ContactName = Site.ContactName, _
                         .ContactEMail = Site.ContactEMail, _
                         .ContactPhone = Site.ContactPhone, _
                         .IsRcppSubscriber = Site.IsRcppSubscriber, _
                         .InvestigatorName = Investigator.InvestigatorName, _
                         .IsPublicationSubscriber = Investigator.IsPublicationSubscriber, _
                         .HasPhase01 = Investigator.HasPhase01, _
                         .HasPhase02 = Investigator.HasPhase02, _
                         .HasPhase03 = Investigator.HasPhase03, _
                         .HasPhase04 = Investigator.HasPhase04, _
                         .AreaList = String.Join(",", (From ia In LinqDbHandler.TW_InvestigatorsToAreas Join a In LinqDbHandler.Disease_Areas On ia.AreaID Equals a.Area_Number Where ia.InvestigatorID = Investigator.InvestigatorID Order By a.Area_Name Select a.Area_Name Distinct).ToArray()), _
                         .CategoryList = String.Join(",", (From id In LinqDbHandler.TW_InvestigatorsToDiseases Join d In LinqDbHandler.Disease_Categories On id.DiseaseCategoryID Equals d.Category_Number Where id.InvestigatorID = Investigator.InvestigatorID Order By d.Category_Name Select d.Category_Name Distinct).ToArray())}
    If Not String.IsNullOrEmpty(FilterDictionary.Item("CountryFilter")) Then
        ThisQuery = ThisQuery.Where(Function(s) s.CenterCountry = FilterDictionary.Item("CountryFilter").ToString())
    End If
    If Not String.IsNullOrEmpty(FilterDictionary.Item("SiteType")) Then
        ThisQuery = ThisQuery.Where(Function(s) s.SiteTypeID = Convert.ToInt32(FilterDictionary.Item("SiteType")))
    End If
    dc = LinqDbHandler.GetCommand(ThisQuery)
    If Not String.IsNullOrEmpty(FilterDictionary.Item("StateFilter")) Then
        ThisQuery = ThisQuery.Where(Function(s) s.CenterState = FilterDictionary.Item("StateFilter").ToString())
    End If
    dc = LinqDbHandler.GetCommand(ThisQuery)
    ThisQuery = ThisQuery.Where(Function(i) CategorySubArray.Contains(i.InvestigatorID))
    ThisQuery = ThisQuery.Where(Function(i) AreaSubArray.Contains(i.InvestigatorID))
    dc = LinqDbHandler.GetCommand(ThisQuery)
    Trace.Warn("Command", dc.CommandText)
    For Each dcp As SqlParameter In dc.Parameters
        Trace.Warn(dcp.ParameterName.ToString(), dcp.Value.ToString())
    Next
    Dim ThisLinqResult = ThisQuery
    InvestigatorResultGrid.DataSource = ThisLinqResult
    InvestigatorResultGrid.DataBind()

End Sub

The big problem is, when you look at the code, basically I am first converting the filtered subqueries into an array and then passing it into the SQL code. The result ends up making an SQL Query with a lot of parameters, as seen below.

SELECT [t0].[TrialSiteID], [t1].[InvestigatorID], [t0].[ResearchCenterName], [t2].[SiteTypeID], [t2].[TypeLabel], [t0].[CenterState], [t0].[CenterCountry], [t0].[ContactName],
    [t0].[ContactEMail], [t0].[ContactPhone], [t0].[IsRcppSubscriber], [t1].[InvestigatorName], [t1].[IsPublicationSubscriber], [t1].[HasPhase01], [t1].[HasPhase02], [t1].[HasPhase03],
    [t1].[HasPhase04]
    FROM [dbo].[TW_Sites] AS [t0]
    INNER JOIN [dbo].[TW_Investigators] AS [t1] ON [t0].[TrialSiteID] = [t1].[TrialSiteID]
    INNER JOIN [dbo].[TW_SiteTypes] AS [t2] ON [t0].[SiteTypeID] = ([t2].[SiteTypeID])
    WHERE ([t1].[InvestigatorID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, 
    @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, 
    @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, 
    @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100, @p101, @p102, @p103, @p104, @p105, @p106, @p107, @p108, @p109, @p110, @p111, @p112, @p113, @p114, @p115)) AND 
    ([t1].[InvestigatorID] IN (@p116, @p117, @p118, @p119, @p120, @p121, @p122, @p123, @p124, @p125, @p126, @p127, @p128, @p129, @p130, @p131, @p132, @p133, @p134, @p135, @p136, @p137, @p138, 
    @p139, @p140, @p141, @p142, @p143, @p144, @p145, @p146, @p147, @p148, @p149, @p150, @p151, @p152, @p153, @p154, @p155, @p156, @p157, @p158, @p159, @p160, @p161, @p162, @p163, @p164, @p165, 
    @p166, @p167, @p168, @p169, @p170, @p171, @p172, @p173, @p174, @p175, @p176, @p177, @p178, @p179, @p180, @p181, @p182, @p183, @p184, @p185, @p186, @p187, @p188, @p189, @p190, @p191, @p192, 
    @p193, @p194, @p195, @p196, @p197, @p198, @p199, @p200, @p201, @p202, @p203, @p204, @p205))
    ORDER BY [t0].[ResearchCenterName], [t1].[InvestigatorName] 

This is a lot of parameters and will just get worse. Basically, instead of having a small IN clause with the conditions, I have a much larger IN clause with the investigator ids.

So, what I am trying to do is figure out how to, instead of converting the Area and Category queries into an array and then appending them to the third query, to get the queries to include the sub-tables and directly search for the matching ids of the areas and categories. I need to be able to use the predicate syntax since areas and categories are two sub-tables, and sometimes both or neither will be included. I know it has to do with either the .Any(), .Join(), or .Where() predicates, I just don't know how to get it to work.

Basically, I'm trying to change the SQL to make it look more like this.

WHERE ([t1].[InvestigatorID] IN (SELECT InvestigatorID FROM TW_InvestigatorsToAreas
WHERE DiseaseCategoryID IN (@p101, @p102, @p103))) 

Any help or guidance would be appreciated.

A: 

Is this LINQ to SQL or EF?

Most ORMs will generate dynamic SQL with each ID for the IN statement as a parameter. Some smarter ones will create a temp table and join against that instead or use a nested subquery (or if you're really creative you might extend an ORM to do this).

I know DataObjects .NET does the temp table thing and LLBLGen can use nested subqueries for joins (aka prefetch pathes) and I'm sure there are at least a couple of others that do too.

One thing to note: Your mileage may vary. One big plus with a temp table is that you get around the 2400 parameter limit in SQL Server (though I'm not sure if that's an issue for you...). However, certain, maybe 1 in 20 queries will actually perform much much slower joining against a temp table (even an indexed one) vs. just passing in each ID as a parameter. Still, in general though, you'll have much better performance because the execution plan doesn't need to get re-compiled for each query.

JeffN825
It's Linq to SQL.
JRT
In that case there are other problems too with making N queries for a single complex LINQ query where N is the number of rows returned...so I'd suggest switching to another provider if that's an option, or consider simplifying you're query
JeffN825
Sadly, that's not very helpful, as I can't switch providers as I need to use what I have. I do think I need to do something earlier in the query, so I am hoping for more suggestions.
JRT
A: 

It looks like it just came down to a syntax statement. I guess you just have to invoke the subquery from the LinqToSql object itself.

If DCHash.Count > 0 Then
    ThisQuery = ThisQuery.Where(Function(i) (From ic In LinqDbHandler.TW_InvestigatorsToDiseases Where DiseaseCategories.Contains(ic.DiseaseCategoryID) Select ic.InvestigatorID).Contains(i.InvestigatorID))
End If
If AreaHash.Count > 0 Then
    ThisQuery = ThisQuery.Where(Function(i) (From ia In LinqDbHandler.TW_InvestigatorsToAreas Where DiseaseAreas.Contains(ia.AreaID) Select ia.InvestigatorID).Contains(i.InvestigatorID))
End If
JRT