views:

392

answers:

2

I have a SQL query with three SELECT statements. A picture of the data tables generated by these three select statements is located at www.britestudent.com/pub/1.png. Each of the three data tables have identical columns. I want to combine these three tables into one table such that:

(1) All rows in top table (Table1) are always included.

(2) Rows in the middle table (Table2) are included only when the values in column1 (UserName) and column4 (CourseName) do not match with any row from Table1. Both columns need to match for the row in Table2 to not be included.

(3) Rows in the bottom table (Table3) are included only when the value in column4 (CourseName) is not already in any row of the results from combining Table1 and Table2.

I have had success in implementing (1) and (2) with an SQL query like this:

SELECT DISTINCT 
   UserName AS UserName, 
   MAX(AmountUsed) AS AmountUsed, 
   MAX(AnsweredCorrectly) AS AnsweredCorrectly, 
   CourseName, 
   MAX(course_code) AS course_code, 
   MAX(NoOfQuestionsInCourse) AS NoOfQuestionsInCourse, 
   MAX(NoOfQuestionSetsInCourse) AS NoOfQuestionSetsInCourse
FROM 
   ( "SELECT statement 1"  UNION  "SELECT statement 2" ) dt_derivedTable_1 
GROUP BY CourseName, UserName

Where "SELECT statement 1" is the query that generates Table1 and "SELECT statement 2" is the query that generates Table2. A picture of the data table generated by this query is located at www.britestudent.com/pub/2.png. I can get away with using the MAX() function because values in the AmountUsed and AnsweredCorrectly columns in Table1 will always be larger than those in Table2 (and they are identical in the last three columns of both tables).

What I fail at is implementing (3). Any suggestions on how to do this will be appreciated. It is tricky because the UserName values in Table3 are null, and because the CourseName values in the combined Table1 and Table2 results are not unique (but they are unique in Table3).

After implementing (3), the final table should look like the table in picture 2.png with the addition of the last row from Table3 (the row with the CourseName value starting with "4. Klasse..."

I have tried to implement (3) using another derived table using SELECT, MAX() and UNION, but I could not get it to work. Below is my full SQL query with the lines from this failed attempt to implement (3) commented out.

Cheers, Frederick

PS--I am new to this forum (and new to SQL as well), but I have had more of my previous problems answered by reading other people's posts on this forum than from reading any other forum or Web site. This forum is a great resources.

--  SELECT DISTINCT MAX(UserName), MAX(AmountUsed) AS AmountUsed, MAX(AnsweredCorrectly) AS AnsweredCorrectly, CourseName, MAX(course_code) AS course_code, MAX(NoOfQuestionsInCourse) AS NoOfQuestionsInCourse, MAX(NoOfQuestionSetsInCourse) AS NoOfQuestionSetsInCourse
--  FROM (

                SELECT DISTINCT UserName AS UserName, MAX(AmountUsed) AS AmountUsed, MAX(AnsweredCorrectly) AS AnsweredCorrectly, CourseName, MAX(course_code) AS course_code, MAX(NoOfQuestionsInCourse) AS NoOfQuestionsInCourse, MAX(NoOfQuestionSetsInCourse) AS NoOfQuestionSetsInCourse
                FROM (

                        -- Table 1 - All UserAccount/Course combinations that have had quizzez.
                        SELECT DISTINCT dbo.win_user.user_name AS UserName, 
                                        cast(dbo.GetAmountUsed(dbo.session_header.win_user_id, dbo.course.course_id, dbo.course.no_of_questionsets_in_course) as nvarchar(10)) AS AmountUsed, 
                                        Isnull(cast(dbo.GetAnswerCorrectly(dbo.session_header.win_user_id, dbo.course.course_id, dbo.question_set.no_of_questions) as nvarchar(10)),0) AS AnsweredCorrectly, 
                                        dbo.course.course_name AS CourseName, 
                                        dbo.course.course_code,
                                        dbo.course.no_of_questions_in_course AS NoOfQuestionsInCourse, 
                                        dbo.course.no_of_questionsets_in_course AS NoOfQuestionSetsInCourse
                        FROM            dbo.session_detail 
                                        INNER JOIN dbo.session_header ON dbo.session_detail.session_header_id = dbo.session_header.session_header_id
                                        INNER JOIN dbo.win_user ON dbo.session_header.win_user_id = dbo.win_user.win_user_id
                                        INNER JOIN dbo.win_user_course ON dbo.win_user_course.win_user_id = dbo.win_user.win_user_id
                                        INNER JOIN dbo.question_set ON dbo.session_header.question_set_id = dbo.question_set.question_set_id
                                        RIGHT OUTER JOIN dbo.course ON dbo.win_user_course.course_id = dbo.course.course_id
                        WHERE           (dbo.session_detail.no_of_attempts = 1 OR dbo.session_detail.no_of_attempts IS NULL)
                                        AND (dbo.session_detail.is_correct = 1 OR dbo.session_detail.is_correct IS NULL)
                                        AND (dbo.win_user_course.is_active = 'True')
                        GROUP BY        dbo.win_user.user_name, dbo.course.course_name, dbo.question_set.no_of_questions, dbo.course.no_of_questions_in_course, 
                                        dbo.course.no_of_questionsets_in_course, dbo.session_header.win_user_id, dbo.course.course_id, dbo.course.course_code

                    UNION ALL

                        -- Table 2 - All UserAccount/Course combinations that do or do not have quizzes but where the Course is selected for quizzes for that User Account.
                        SELECT          dbo.win_user.user_name AS UserName, 
                                        -1 AS AmountUsed, 
                                        -1 AS AnsweredCorrectly, 
                                        dbo.course.course_name AS CourseName, 
                                        dbo.course.course_code,
                                        dbo.course.no_of_questions_in_course AS NoOfQuestionsInCourse, 
                                        dbo.course.no_of_questionsets_in_course AS NoOfQuestionSetsInCourse
                        FROM            dbo.win_user_course
                                        INNER JOIN dbo.win_user ON dbo.win_user_course.win_user_id = dbo.win_user.win_user_id
                                        RIGHT OUTER JOIN dbo.course ON dbo.win_user_course.course_id = dbo.course.course_id
                        WHERE           (dbo.win_user_course.is_active = 'True')
                        GROUP BY        dbo.win_user.user_name, dbo.course.course_name, dbo.course.no_of_questions_in_course, 
                                        dbo.course.no_of_questionsets_in_course, dbo.course.course_id, dbo.course.course_code

                ) dt_derivedTable_1 

                GROUP BY CourseName, UserName

--      UNION ALL

            -- Table 3 - All Courses.
--          SELECT DISTINCT null AS UserName,
--                          -2 AS AmountUsed, 
--                          -2 AS AnsweredCorrectly, 
--                          dbo.course.course_name AS CourseName,
--                          dbo.course.course_code,
--                          dbo.course.no_of_questions_in_course AS NoOfQuestionsInCourse, 
--                          dbo.course.no_of_questionsets_in_course AS NoOfQuestionSetsInCourse
--          FROM            dbo.course
--          WHERE           is_active = 'True'


--  ) dt_derivedTable_2 

--  GROUP BY CourseName
--  ORDER BY CourseName
+1  A: 

With such filtering requirements (depending on the rows of prior queries), I recommend a table variable.

DECLARE @MyTable TABLE
(
  ID int PRIMARY KEY,
  Name varchar(50),
  QueryNumber int
)

INSERT INTO @MyTable (ID, Name, QueryNumber)
SELECT CustomerID, CustomerName, 1
FROM Customer
WHERE Name = "Bob"

INSERT INTO @MyTable (ID, Name, QueryNumber)
SELECT CustomerID, CustomerName, 2
FROM Customer
WHERE Name = "Joe" and CustomerID not in (SELECT ID FROM @MyTable)

INSERT INTO @MyTable (ID, Name, QueryNumber)
SELECT CustomerID, CustomerName, 3
FROM Customer
WHERE CustomerID not in (SELECT ID FROM @MyTable)

SELECT *
FROM @MyTable
David B
+1, you might need to `order by QueryNumber` and a few other columns. In my own experiences, I find that the performance of table variables declines as you have more rows in the table. If you are working with a large number of rows, try using a #TempTable, and don't forget, you can create an index if necessary too
KM
Hello David, thank you for your reply. However, I do not understand the approach you are suggesting. I do not know what I would put in place of where you have "Name = "Bob"" and "Name = "Joe"". I don't think I'll know what these values would be until after my SQL query runs.
Frederick
A: 

Here is an Oracle flavored solution:

    Select 
        *
    from table1

    UNION

    select 
        *
    from table2
    where not exists(
        select 'x' 
        from table1
        where 
            table2.username = table1.username
            and table2.coursename = table1.coursename
    )

    UNION

    select 
        *
    from table3
    where
        coursename not in (
            Select 
                coursename                                                  
            from table1

            UNION /* the union operator implies distinct, so 
                    there will be no duplicates */

            select 
                coursename
            from table2
            where not exists(
                select 'x' 
                from table1
                where 
                    table2.username = table1.username
                    and table2.coursename = table1.coursename
            )
        )
Mark Porter
Mark, thank you. That worked perfectly! I don't understand the "select 'x'" part, but it works. Again, thank you. Cheers.
Frederick
"exists" is kinda funny. It will return "true" if the subquery has any rows, and false if it has none. The actual contents of the subquery does not matter. So, select 'x' will cause the result to have an anonymous column with the content 'x' for each found row. This is faster than something like select * because the DB doesn't actually have retrieve the rows. This is also faster than "column not in (subquery)" for the same reason.
Mark Porter