tags:

views:

817

answers:

4

Hello -

I am trying to create an ANI lookup table from 2 separate tables, one a table of stores and the other a list of contacts for those stores.

I am using MS SQL Server 2005, which, unfortunately, does not support the MERGE INTO syntax...

The good stuff: The ANI lookup table has 2 significant columns, StoreID and PhoneNumber. The PhoneNumber column is the (unique) Primary key, as there must be only one StoreID returned for a given PhoneNumber.

Store_Info significant columns:

StoreID  
StorePhone  
AltPhone

There is one record for each StoreID, with possible duplicate phone numbers between stores. And yes, AltPhone could be the same as StorePhone...

Store_Contacts significant columns:

StoreID  
Phone

There are multiple entries for StoreID, and possible duplicate phone numbers for one store or across multiple stores.

Sample store data

StoreID   Parent ID StorePhone       AltPhone  
1         0         402-123-2300     402-123-2345  
2         0         202-321-7800     202-321-7890  
3         1         202-302-5600     202-302-5600

Sample contacts data:

StoreID   Title    Name    Phone  
1         Mgr      Bob     402-123-2345  
1         IT       Pat     402-123-2346  
1         Reg Mgr  Dave    402-321-3213  
2         Mgr      Ann     202-231-7890  
2         IT       Mary    202-231-7893  
2         A/R      Ann     202-231-7890  
2         Reg Mgr  Dave    402-321-3213  
3         Mgr      Bob     402-123-2345  
3         AsstMgr  Pete    402-123-2356

I want to insert phone numbers in the following priority:

  1. Main/single store StorePhone
  2. Main/single store AltPhone
  3. Branch store StorePhone
  4. Branch store AltPhone
  5. Main/single store contact Phone
  6. Branch store contact phone
    • If a phone number already exists in the destination table, do not add it...

So the resulting dataset should be:

StoreID  Phone  
1        402-123-2300  (first pass)  
2        202-321-7800  
1        402-123-2345  (2nd pass)  
2        202-321-7890  
3        202-302-5600  (3rd & 4th pass - only add once)  
1        402-123-2346  (5th pass - skip dup)  
1        402-321-3213  
2        202-231-7893  (do not add dups)  
3        402-123-2356  (final pass - skip dup)

My approach to prioritizing which phone number of the duplicates to choose is to make multiple queries based on other criteria (main store vs branch, for example), inserting the first entry found into the ANI lookup table and skipping subsequent duplicates.

How do I do this without using RBAR? I have tried the following with no luck - actually, it works OK until I get to the Store_Contacts table, where there can be multiple identical phone numbers for a given store:

INSERT INTO dbo.Store_PhoneNumbers (StoreID, PhoneNumber)
    SELECT DISTINCT StoreID, dbo.GetPhoneNumber10(StorePhone)
    FROM dbo.Store_Info
    WHERE dbo.IsAniNumber(dbo.GetPhoneNumber10(StorePhone)) = 1
     AND ParentID = 0
     AND NOT EXISTS (SELECT * FROM dbo.Store_PhoneNumbers WHERE PhoneNumber = dbo.GetPhonenumber10(StorePhone));

... repeat for AltPhone, then StorePhone where ParentID <> 0 then AltPhone w/ ParentID <> 0

So far so good, then here's where it falls apart:

INSERT INTO dbo.Store_PhoneNumbers (StoreID, PhoneNumber)
    SELECT DISTINCT sc.StoreID, dbo.GetPhoneNumber10(sc.Phone)
    FROM Store_Contacts sc
      INNER JOIN
     Store_Info si ON sc.StoreID = si.StoreID
    WHERE (dbo.IsAniNumber(dbo.GetPhoneNumber10(sc.Phone)) = 1)
     AND (si.ParentID = 0)
     AND NOT EXISTS (SELECT * FROM dbo.Store_PhoneNumbers WHERE PhoneNumber = dbo.GetPhonenumber10(sc.Phone));

... and repeat for ParentID <> 0

That's where I get the duplicate entries and the insert fails.

Thanks for any help you can give me, I'm about to give up and use a cursor, just to get it done...
Dave

A: 

Isn't it simply a query based on:

SELECT StorePhone AS Phone -- , ...other columns...
    FROM StoreInfo
UNION
SELECT AltPhone AS Phone   -- , ...other columns...
    FROM StoreInfo
UNION
SELECT Phone               -- , ...other columns...
    FROM Store_Contacts

If AltPhone can be null, you could add a WHERE clause to eliminate the nulls. I'm not clear what you mean by ANI or RBAR. You can obviously add extra columns to the different result sets as long as the net is the same. UNION eliminates duplicate rows automatically.


If a phone number already exists in the destination table, do not add it...

Ah, then you need the MERGE statement. You would MERGE into your target table using a minor variation of the query above as the source of data.

BNF for the statement from the SQL 2003 standard (section 14.9):

<merge statement> ::=
     MERGE INTO <target table> [ [ AS ] <merge correlation name> ]
     USING <table reference> ON <search condition>
     <merge operation specification>

<merge correlation name> ::= <correlation name>

<merge operation specification> ::= <merge when clause> ...

<merge when clause> ::=
    <merge when matched clause> |
    <merge when not matched clause>

<merge when matched clause> ::=
    WHEN MATCHED THEN <merge update specification>

<merge when not matched clause> ::=
    WHEN NOT MATCHED THEN <merge insert specification>

<merge update specification> ::= UPDATE SET <set clause list>

<merge insert specification>  ::=
     INSERT [ <left paren> <insert column list> <right paren> ]
     [ <override clause> ] VALUES <merge insert value list>

<merge insert value list> ::=
     <left paren> <merge insert value element>
     [ { <comma> <merge insert value element> }... ] <right paren>

<merge insert value element> ::=
     <value expression> |
     <contextually typed value specification>

You can also find descriptions of this statement in the relevant product manuals, which often offers more options. In your case, you'd probably omit the WHEN MATCHED clause, using only the WHEN NOT MATCHED clause.


Additional observation:

MS SQL Server 2005 does not support MERGE.

Not the only DBMS I know of with that limitation.

You are probably, then, faced with creating a temporary table and loading it with the data from the UNION-select statement.

You can then do inserts into your main data table from the temp table based on the non-existence of the corresponding row in the main data table. At least, some DBMS allow you to do that. I'm not an MS SQL Server expert, so I don't know whether the fine print of the UPDATE statement prevents you from selecting from the table being updated in sub-queries of the UPDATE statement. That can be a real nuisance if you are so limited.

Another option is to unload the table to plain text, and unload the UNION-select data to plain text, and then use file-system (command line) options to deal with it. How feasible that is depends on the volume of data, which I've forgotten. Perl can be useful here, reading the main table into a hash and then selectively updating it from the UNION-select data, and finally rewriting the data to a load file. Then you 'just' start the transaction, drop all the old data, load all the new data, and cross-fingers and commit. The downside of this is that changes made between unload and load are lost. So be careful if you decide to use this technique. You might want to do the unload in the transaction, and modify the data, and then delete and reload -- all in the same single transaction. It will need to be a single button-push (return-key) to do the whole job.

Jonathan Leffler
Doesn't work. When I try to insert the result set into the destination table, it fails due to duplicate phone numbers. I will try to explain better what I am after in my responses to the comments above.
DaveN59
Great! But, T-SQL (MS SQL Server 2005) does not support MERGE INTO... I forgot to mention the platform. Other than that one small little detail, it looks like it would do the trick. Support for the MERGE INTO syntax will be in the new SQL Server 2008, adn we're not there yet.So close!
DaveN59
A: 

FYI,

ANI = http://en.wikipedia.org/wiki/Automatic_Number_Identification

RBAR= row by agonizing row

qux
Thanks for that. I forget there are people who haven't heard every TLA and industry buzzword that I have...
DaveN59
+1  A: 
SELECT DISTINCT sc.StoreID, dbo.GetPhoneNumber10(sc.Phone)

DISTINCT is wrong. It will allow 2 stores to share the same number. Use GROUP BY to ensure that the second column is unique.

INSERT INTO dbo.Store_PhoneNumbers (StoreID, PhoneNumber)
SELECT MIN(StoreID), PhoneNumber
FROM
(
  SELECT sc.StoreID as StoreID, dbo.GetPhoneNumber10(sc.Phone) as PhoneNumber
  FROM Store_Contacts sc
      INNER JOIN
      Store_Info si ON sc.StoreID = si.StoreID
  WHERE (dbo.IsAniNumber(dbo.GetPhoneNumber10(sc.Phone)) = 1)
      AND (si.ParentID = 0)
      AND NOT EXISTS (SELECT * FROM dbo.Store_PhoneNumbers WHERE PhoneNumber = dbo.GetPhonenumber10(sc.Phone))
) sub
GROUP BY PhoneNumber

The reason you could get away with distinct in the other queries, was that you were working with a single StoreID in them. This query returns multiple StoreIDs.

David B
That does it! Perfect. I had tried the MIN() function before, but had the syntax wrong.
DaveN59
Yeah, SQL is really inflexible about the order of clauses, which often leads me to write subqueries so I can group on a selected column (for example).
David B
A: 

I see that there's an answer already selected, but I'd be remiss if I didn't point out a simpler, and more general solution.

Instead of making the priority implicit in your order of inserting, make it explicit.

Your question is basically, "I have several sources of a datum, and I know a priority for each one. For each key, I wish to select the single datum with the highest priority."

First select all possible datums (storeid) for your key (phone):

create table prioritized_phone( phone char(12), storeid int, priority int);

insert into prioritized_phone(phone, storeid, priority) 
select storephone, storeid, 1  from store_info
union
select altphone, storeid, 2 from store_info

I don't know how you select a branch store's phone, but there's some query that gets that, probably by using parentid in storeinfo, like this:

union
select b.storephone, a.storeid, 3
from store_info a join storeinfo b on (a.parentid = b,storeid)
select b.altphone, a.storeid, 4
from store_info a join storeinfo b on (a.parentid = b,storeid)

And then the contact phones:

union 
select distinct phone, storeid, 5 from storecontacts;

Once you've done that, for each phone, delete any but the lowest (best) priority:

delete from prioritized_phone a where a.priority > 
(select min( priority) from prioritized_phone b where b.phone = a.phone);

Now for each phone we only have its minimum priority row(s). That still may not be unique by store, so we arbitraily select the lowest storeid for the phone with:

delete from prioritized_phone a where a.store_id > 
(select min( store_id ) from prioritized_phone b where b.phone = a.phone);

We now have one storeid per phone, but we may still have dupes:

create table phone_lookup( phone char(12), storeid int);

insert into phone_lookup(phone, storeid)
select distinct phone, storeid 
from prioritized_phone;

Why is this solution easier? Because it makes the priority, which was implicit in your solution (implied by order of operations) into an explicit value we can select on.

tpdi