I am working with a large amount of data: 6 million rows. I need the query to run as fast as possible, but am at a loss for further optimization. I already removed 3 subqueries and moved it from 11+ hours to just 35 minutes on a modest dataset of 100k rows. See below!
declare @UserId uniqueidentifier;
set @UserId = '936DA01F-9ABD-4d9d-80C7-02AF85C822A8';
select
temp.Address_Line1,
temp.Cell_Phone_Number,
temp.City,
temp.CPM_delt_acd,
temp.CPM_delt_date,
temp.Customer_Id,
temp.Customer_Type,
temp.Date_Birth,
temp.Email_Business,
temp.Email_Home,
temp.First_Name,
temp.Geo,
temp.Home_Phone_Number,
temp.Last_Name,
temp.Link_Customer_Id,
temp.Middle_Name,
temp.Naics_Code,
temp.Office_Phone_Number,
temp.St,
temp.Suffix,
temp.Tin,
temp.TIN_Indicator,
temp.Zip_Code,
crm_c.contactid as CrmRecordId,
crm_c.ownerid as OldOwnerId,
crm_c.ext_profiletype as old_profileType,
coalesce(crm_fim.ownerid, @UserId) as OwnerId,
2 as profileType,
case
when
(temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
and temp.Last_Name = crm_c.lastname collate database_default)
then
('Tin/LastName: '+temp.Tin + '/' + temp.Last_Name)
when
(temp.Customer_ID = crm_c.ext_customerid collate database_default)
then
('Customer_ID: '+temp.Customer_ID)
else
('New Customer: '+temp.Customer_ID)
end as FriendlyName,
case
when
(temp.Customer_ID = crm_c.ext_customerid collate database_default)
then
0
else
1
end as ForceFieldLock
from DailyProfile_Current temp
left join crm_contact crm_c
on (temp.Customer_ID = crm_c.ext_customerid collate database_default
or (temp.Tin = crm_c.ext_retail_prime_taxid collate database_default
and temp.Last_Name = crm_c.lastname collate database_default))
and 0 = crm_c.deletionstatecode and 0 = crm_c.statecode
left outer join crm_ext_ImportMapping crm_fim
on temp.Geo = crm_fim.ext_geocode collate database_default
and 0 = crm_fim.deletionstatecode and 0 = crm_fim.statecode
Where crm_contact is a synonym that points to a view in another database. That view pulls data from a contact table and a contactextension table. I need data from both. I could probably separate this into two joins, if necessary. In general, columns that begin with "ext_" are from the extension part of the crm_contact view.
When I run this against 100k rows in the DailyProfile_Current table, it takes about 35 minutes. That table is a bunch of nvarchar(200) columns that had a flatfile dumped into it. It sucks, but it's what I inherited. I wonder if using real datatypes would help, but I'd like possible solutions that don't involve that as well.
If the DailyProfile_Current table is full of things that don't match the join conditions, this runs incredibly fast. If the table is full of things that do match the join conditions, it's incredibly slow.
There are indexes on Customer_ID and Geo from the temp table. There are also assorted indexes on the crm_contact tables. I don't know how much an index can help on an nvarchar(200) column, though.
In case it matters, I'm using Sql Server 2005.
Any ideas are appreciated.