I'm having issues with a rather complex Linq query and I need some advice. This involves VB syntax so I need specific answers for that platform, as I have a lot of trouble translating the C# syntax to VB at times.
I have to join two main tables, and I need to filter the results by elements in an ASP.NET web form. These filters are created on the fly so I have to use a lot of where extensions to filter the query. I want to execute the query with as optimized SQL as possible.
I am first doing a simple join between TW_Sites and TW_Investigators. Then there are two sub-tables that are involved. TW_InvestigatorToArea and TW_InvestigatorToDisease. While most of the where clauses are working fine, I have found a performance issue that won't be an issue right now, but will be an issue as the table gets bigger.
The arrays DiseaseCategories and DiseaseAreas would be the results of a CheckBoxList result.
Protected Sub LoadResults()
' Get Dictionary of Filters
Dim FilterDictionary As OrderedDictionary = Session.Item("InvestigatorFilterDictionary")
' Initialize LinqToSql
Dim LinqDbHandler As TrialWatchDC = New TrialWatchDC(WebConfigurationManager.ConnectionStrings("DataSourceName").ConnectionString)
' Create List of Categories to Filter By
Dim DiseaseCategories() As Integer = {1, 2, 3, 4, 5, 6, 11, 22, 361, 77, 82, 99, 400}
Dim CategorySubQuery = From ic In LinqDbHandler.TW_InvestigatorsToDiseases Where DiseaseCategories.Contains(ic.DiseaseCategoryID) Select ic.InvestigatorID Distinct
' Dim CategorySubArray = CategorySubQuery.ToArray()
' Create List of Areas to Filter By
Dim AreaCategories() As Integer = {17, 1, 3, 5}
Dim AreaSubQuery = From ic In LinqDbHandler.TW_InvestigatorsToAreas Where AreaCategories.Contains(ic.AreaID) Select ic.InvestigatorID Distinct
Dim AreaSubArray = AreaSubQuery.ToArray()
Dim dc As DbCommand
Dim ThisQuery = From Site In LinqDbHandler.TW_Sites _
Join Investigator In LinqDbHandler.TW_Investigators On Site.TrialSiteID Equals Investigator.TrialSiteID _
Join SiteType In LinqDbHandler.TW_SiteTypes On Site.SiteTypeID Equals SiteType.SiteTypeID _
Order By Site.ResearchCenterName, Investigator.InvestigatorName
Select New With {.TrialSiteID = Site.TrialSiteID, _
.InvestigatorID = Investigator.InvestigatorID, _
.ResearchCenterName = Site.ResearchCenterName, _
.SiteTypeID = SiteType.SiteTypeID, _
.TypeLabel = SiteType.TypeLabel, _
.CenterState = Site.CenterState, _
.CenterCountry = Site.CenterCountry, _
.ContactName = Site.ContactName, _
.ContactEMail = Site.ContactEMail, _
.ContactPhone = Site.ContactPhone, _
.IsRcppSubscriber = Site.IsRcppSubscriber, _
.InvestigatorName = Investigator.InvestigatorName, _
.IsPublicationSubscriber = Investigator.IsPublicationSubscriber, _
.HasPhase01 = Investigator.HasPhase01, _
.HasPhase02 = Investigator.HasPhase02, _
.HasPhase03 = Investigator.HasPhase03, _
.HasPhase04 = Investigator.HasPhase04, _
.AreaList = String.Join(",", (From ia In LinqDbHandler.TW_InvestigatorsToAreas Join a In LinqDbHandler.Disease_Areas On ia.AreaID Equals a.Area_Number Where ia.InvestigatorID = Investigator.InvestigatorID Order By a.Area_Name Select a.Area_Name Distinct).ToArray()), _
.CategoryList = String.Join(",", (From id In LinqDbHandler.TW_InvestigatorsToDiseases Join d In LinqDbHandler.Disease_Categories On id.DiseaseCategoryID Equals d.Category_Number Where id.InvestigatorID = Investigator.InvestigatorID Order By d.Category_Name Select d.Category_Name Distinct).ToArray())}
If Not String.IsNullOrEmpty(FilterDictionary.Item("CountryFilter")) Then
ThisQuery = ThisQuery.Where(Function(s) s.CenterCountry = FilterDictionary.Item("CountryFilter").ToString())
End If
If Not String.IsNullOrEmpty(FilterDictionary.Item("SiteType")) Then
ThisQuery = ThisQuery.Where(Function(s) s.SiteTypeID = Convert.ToInt32(FilterDictionary.Item("SiteType")))
End If
dc = LinqDbHandler.GetCommand(ThisQuery)
If Not String.IsNullOrEmpty(FilterDictionary.Item("StateFilter")) Then
ThisQuery = ThisQuery.Where(Function(s) s.CenterState = FilterDictionary.Item("StateFilter").ToString())
End If
dc = LinqDbHandler.GetCommand(ThisQuery)
ThisQuery = ThisQuery.Where(Function(i) CategorySubArray.Contains(i.InvestigatorID))
ThisQuery = ThisQuery.Where(Function(i) AreaSubArray.Contains(i.InvestigatorID))
dc = LinqDbHandler.GetCommand(ThisQuery)
Trace.Warn("Command", dc.CommandText)
For Each dcp As SqlParameter In dc.Parameters
Trace.Warn(dcp.ParameterName.ToString(), dcp.Value.ToString())
Next
Dim ThisLinqResult = ThisQuery
InvestigatorResultGrid.DataSource = ThisLinqResult
InvestigatorResultGrid.DataBind()
End Sub
The big problem is, when you look at the code, basically I am first converting the filtered subqueries into an array and then passing it into the SQL code. The result ends up making an SQL Query with a lot of parameters, as seen below.
SELECT [t0].[TrialSiteID], [t1].[InvestigatorID], [t0].[ResearchCenterName], [t2].[SiteTypeID], [t2].[TypeLabel], [t0].[CenterState], [t0].[CenterCountry], [t0].[ContactName],
[t0].[ContactEMail], [t0].[ContactPhone], [t0].[IsRcppSubscriber], [t1].[InvestigatorName], [t1].[IsPublicationSubscriber], [t1].[HasPhase01], [t1].[HasPhase02], [t1].[HasPhase03],
[t1].[HasPhase04]
FROM [dbo].[TW_Sites] AS [t0]
INNER JOIN [dbo].[TW_Investigators] AS [t1] ON [t0].[TrialSiteID] = [t1].[TrialSiteID]
INNER JOIN [dbo].[TW_SiteTypes] AS [t2] ON [t0].[SiteTypeID] = ([t2].[SiteTypeID])
WHERE ([t1].[InvestigatorID] IN (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27,
@p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58,
@p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89,
@p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100, @p101, @p102, @p103, @p104, @p105, @p106, @p107, @p108, @p109, @p110, @p111, @p112, @p113, @p114, @p115)) AND
([t1].[InvestigatorID] IN (@p116, @p117, @p118, @p119, @p120, @p121, @p122, @p123, @p124, @p125, @p126, @p127, @p128, @p129, @p130, @p131, @p132, @p133, @p134, @p135, @p136, @p137, @p138,
@p139, @p140, @p141, @p142, @p143, @p144, @p145, @p146, @p147, @p148, @p149, @p150, @p151, @p152, @p153, @p154, @p155, @p156, @p157, @p158, @p159, @p160, @p161, @p162, @p163, @p164, @p165,
@p166, @p167, @p168, @p169, @p170, @p171, @p172, @p173, @p174, @p175, @p176, @p177, @p178, @p179, @p180, @p181, @p182, @p183, @p184, @p185, @p186, @p187, @p188, @p189, @p190, @p191, @p192,
@p193, @p194, @p195, @p196, @p197, @p198, @p199, @p200, @p201, @p202, @p203, @p204, @p205))
ORDER BY [t0].[ResearchCenterName], [t1].[InvestigatorName]
This is a lot of parameters and will just get worse. Basically, instead of having a small IN clause with the conditions, I have a much larger IN clause with the investigator ids.
So, what I am trying to do is figure out how to, instead of converting the Area and Category queries into an array and then appending them to the third query, to get the queries to include the sub-tables and directly search for the matching ids of the areas and categories. I need to be able to use the predicate syntax since areas and categories are two sub-tables, and sometimes both or neither will be included. I know it has to do with either the .Any(), .Join(), or .Where() predicates, I just don't know how to get it to work.
Basically, I'm trying to change the SQL to make it look more like this.
WHERE ([t1].[InvestigatorID] IN (SELECT InvestigatorID FROM TW_InvestigatorsToAreas
WHERE DiseaseCategoryID IN (@p101, @p102, @p103)))
Any help or guidance would be appreciated.