I'd like a query that, at its simplest, joins two tables that have no explicit relationship other than this: each row in what I'll call the "pool" table matches exactly one row in my other table. I don't care which row; I just want every row in my primary table to get a single row from the "pool" table, with each row from the pool used only once.

I was thinking something like ROW_NUMBER() OVER() could be used to match on an arbitrary row number, which would be fine, but I think that requires at least two inner rowset providers; I thought there would be something simpler.

To put the problem more succinctly: you have a table of IDs and you want to join it to a table of records to assign the IDs. You can only use each ID once. What query structure or join can you use to return a rowset of all of the records, each assigned an ID? It does not matter which record gets which ID, only that each ID is used at most once.

Background

For those interested in the background: I have a logical entity made up of two rows. One is a header row in a header table, which is generic to many kinds of objects and supplies the entity's ID; the other is a record in an entity-specific table. I'm using a query like the following to pregenerate a batch of IDs in the header table:

declare @idsTable TABLE(ID INT);

INSERT INTO Header (HeaderType)
OUTPUT INSERTED.id INTO @idsTable
SELECT 4 as HeaderType
  FROM Company c WHERE c.CompanyType = 12;

At this point I have a bunch of header rows and the ids (IDENTITY) that were assigned to the rows. Now, I want to use a similar INSERT into the object specific table but I need to match an ID from the @idsTable to only 1 row in my select query (and hence my insert query). Something like:

INSERT INTO Specific (HeaderID, Value1, ...)
SELECT * 
  FROM @idsTable 
  JOIN RecordsToWrite r2r ON ???
A: 

Maybe I'm misunderstanding, but can't you just do:

SELECT * FROM @idsTable,RecordsToWrite

Or do a CROSS JOIN.

NickAtuShip
That would be EVERY combination between the two; you can only use a row from idstable ONCE.
Peter Oehlert
+1  A: 

If I understand your data correctly, then the number of rows in the RecordsToWrite table and the number of new rows that come from the query ... FROM Company ... should be the same. In this case:

  • Add a UNIQUEIDENTIFIER column UID to your RecordsToWrite table with a DEFAULT constraint of NEWID(), so that its value is generated automatically.
  • Change your @idsTable declaration and INSERT statement to this:

declare @idsTable TABLE(ID INT, UID UNIQUEIDENTIFIER);
INSERT  INTO Header (HeaderType)
OUTPUT  INSERTED.id, r.UID INTO @idsTable
SELECT  4 AS HeaderType, r.UID
FROM    RecordsToWrite r --// maybe other filters to get the same number of records as in:
--//FROM Company c WHERE c.CompanyType = 12;

Then join on this UID column in your second query.

Edit: The following code works fine on my SQL Server 2008, which means that I can use OUTPUT with INSERT as well. So can you; just try:

CREATE TABLE TestOutput (ID INT IDENTITY, TAG VARCHAR(10));

DECLARE @idsTable TABLE(ID INT, UID UNIQUEIDENTIFIER);

INSERT  INTO TestOutput (TAG)
OUTPUT  INSERTED.id, NEWID() INTO @idsTable
SELECT  tag
FROM   (    SELECT 'test-1' as tag
UNION ALL   SELECT 'test-2'
) x

SELECT * FROM @idsTable
SELECT * FROM TestOutput

DROP TABLE TestOutput;

and I get the following results:

ID          UID
----------- ------------------------------------
1           422FF4F0-9CFB-4F67-94C8-2D3B225E39B0
2           0BD2B2D2-1319-4981-9E26-C09FE844359C

ID          TAG
----------- ----------
1           test-1
2           test-2
van
I use a similar approach to store objects in a relational database, where the base classes are stored in one table and the child-specific ones in specific tables. The difference is that I actually keep the UID column at the parent level, so that each of my objects also has a global UID. I do not keep it on the child table, though, because all the FKs and table links are done on the INT column rather than on the UID. Having a UID gives me more flexibility because I can also specify it from the client code and still be able to link to the object from further queries.
van
This query won't work as written because you're inserting only the (HeaderType) column, and you're selecting 2 columns from your select query. Less pedantically, I have never been able to get the OUTPUT statement to accept anything that isn't sourced from the INSERTED table. When I try, I get a 'The multi-part identifier "r.UID" could not be bound.' error. In this case, it's not acceptable to modify the HEADER table to source the Guid there.
Peter Oehlert
Just to be a little clearer, this is the simple approach I thought might exist but couldn't make work. I'd love it if this could work.
Peter Oehlert
And upon closer inspection of the OUTPUT clause documentation (http://technet.microsoft.com/en-us/library/ms177564.aspx), referencing aliases from the FROM clause ONLY works in DELETE, UPDATE and MERGE OUTPUT clauses; not in an INSERT/SELECT FROM.
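
Since the documentation point above says MERGE may reference source aliases in OUTPUT, one possible workaround is a MERGE whose ON condition never matches, so that it behaves like an INSERT. This is only a sketch under assumptions: the table variables stand in for the real Header and RecordsToWrite tables, and the RecordId key column and the @idMap table are hypothetical names introduced for illustration. MERGE requires SQL Server 2008 or later.

```sql
-- Sketch only: table variables stand in for Header and RecordsToWrite.
-- RecordId and @idMap are hypothetical names, not from the question.
DECLARE @Header TABLE (ID INT IDENTITY(1,1), HeaderType INT);
DECLARE @RecordsToWrite TABLE (RecordId INT, Value1 VARCHAR(20));
DECLARE @idMap TABLE (HeaderID INT, RecordId INT);

INSERT INTO @RecordsToWrite (RecordId, Value1)
VALUES (10, 'a'), (20, 'b');

-- ON 1 = 0 never matches, so every source row takes the INSERT branch.
-- Unlike INSERT ... SELECT, MERGE lets OUTPUT reference the source alias r,
-- so each generated header ID is captured next to the record it belongs to.
MERGE INTO @Header AS h
USING @RecordsToWrite AS r
   ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (HeaderType) VALUES (4)
OUTPUT INSERTED.ID, r.RecordId INTO @idMap (HeaderID, RecordId);

-- One row per source record, each with its own header ID.
SELECT HeaderID, RecordId FROM @idMap;
```

With a mapping like this, the second insert could join @idMap back to RecordsToWrite on the key column instead of pairing rows arbitrarily.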
Peter Oehlert
@Peter: I updated the answer for code readability. I have no problem running INSERT with OUTPUT, and I did not find any mention in the link you provided that it cannot be used. Could you please quote the exact passage?
van
+1  A: 

Why did you give up on ROW_NUMBER? Here is an example using AdventureWorks:

SELECT 
    * 
FROM 
(
    SELECT 
        TOP 10 ROW_NUMBER() OVER(ORDER BY EmployeeId) AS join_id,* 
    FROM 
        HumanResources.Employee
) t1
INNER JOIN
(
    SELECT 
        TOP 10 ROW_NUMBER() OVER(ORDER BY DepartmentId) AS join_id,* 
    FROM 
        HumanResources.Department
) t2 ON t1.join_id = t2.join_id
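
Applied to the tables from the question, the same pattern might look like the sketch below. The table variables stand in for the real tables, and the RecordId column on RecordsToWrite is an assumption made for illustration; only Value1 is shown because the question elides the remaining columns. Note that SQL Server requires an ORDER BY inside OVER for ROW_NUMBER, so an empty OVER() is not accepted.

```sql
-- Sketch only: table variables stand in for the real tables, and the
-- RecordId column on RecordsToWrite is assumed for illustration.
DECLARE @idsTable TABLE (ID INT);
DECLARE @RecordsToWrite TABLE (RecordId INT, Value1 VARCHAR(20));

INSERT INTO @idsTable (ID) VALUES (101), (102), (103);
INSERT INTO @RecordsToWrite (RecordId, Value1)
VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- Number each side independently, then pair rows with equal numbers.
-- Row numbers are unique within each side, so no ID is used twice.
SELECT i.ID AS HeaderID, r.Value1
FROM (
    SELECT ID, ROW_NUMBER() OVER (ORDER BY ID) AS rn
    FROM @idsTable
) AS i
INNER JOIN (
    SELECT Value1, ROW_NUMBER() OVER (ORDER BY RecordId) AS rn
    FROM @RecordsToWrite
) AS r ON i.rn = r.rn;
```

If the pairing order truly does not matter, ORDER BY (SELECT NULL) can be used in place of a real column. The SELECT can then feed the INSERT INTO Specific statement directly.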
Lukasz Lysik
I didn't give it up, I just thought that it wasn't particularly readable and that there would be something simpler. This is precisely what I meant by the row_number solution by the way.
Peter Oehlert