views: 316
answers: 4

+1  Q:

Weird SQL issue

I have a really weird issue with an SQL query I've been working with for some time. I'm using SQL Server 2005.

Here's an example table from which the query is made:

Log:
Log_ID | FB_ID |   Date    | Log_Name | Log_Type

   7   |   4   | 2007/11/8 |   Nina   |  Critical
   6   |   4   | 2007/11/6 |   John   |  Critical
   5   |   4   | 2007/11/6 |   Mike   |  Critical
   4   |   4   | 2007/11/6 |   Mike   |  Critical
   3   |   3   | 2007/11/3 |   Ben    |  Critical
   2   |   3   | 2007/11/1 |   Ben    |  Critical

Now, the idea is to return the row with the first date for the Log_Name working on each FB_ID. However, when there are several Log_Names for an FB_ID, I only want the SECOND Log_Name (the first time the responsibility is handed over to someone else). The result should look like this:

Desired result
Log_ID | FB_ID |   Date    | Log_Name | Log_Type

   6   |   4   | 2007/11/6 |   John   |  Critical
   2   |   3   | 2007/11/1 |   Ben    |  Critical
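The selection rule can be reproduced in miniature. Below is a sketch using Python's built-in sqlite3 module, not the original SQL Server 2005 setup; the query follows the shape of Peter Lang's solution quoted later, and assumes Log_ID increases with Date within each FB_ID, which holds for the sample data.

```python
import sqlite3

# Sample data from the question, loaded into an in-memory SQLite DB.
# (Sketch only: SQLite, not SQL Server 2005.)
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (log_id INT, fb_id INT, cdate TEXT, log_name TEXT, log_type TEXT);
INSERT INTO log VALUES
  (7, 4, '2007-11-08', 'Nina', 'Critical'),
  (6, 4, '2007-11-06', 'John', 'Critical'),
  (5, 4, '2007-11-06', 'Mike', 'Critical'),
  (4, 4, '2007-11-06', 'Mike', 'Critical'),
  (3, 3, '2007-11-03', 'Ben',  'Critical'),
  (2, 3, '2007-11-01', 'Ben',  'Critical');
""")

# Per fb_id: earliest row of the SECOND distinct name (the first
# hand-over), or the earliest row overall when only one name appears.
# Assumes log_id order matches date order within each fb_id.
rows = conn.execute("""
SELECT * FROM log
WHERE log_id IN (
    SELECT MIN(log_id) FROM log
    WHERE (SELECT COUNT(DISTINCT log_name) FROM log l2
           WHERE l2.fb_id = log.fb_id) = 1
       OR log.log_name <> (SELECT l3.log_name FROM log l3
                           WHERE l3.log_id = (SELECT MIN(log_id) FROM log l4
                                              WHERE l4.fb_id = log.fb_id))
    GROUP BY fb_id)
ORDER BY fb_id DESC
""").fetchall()

print(rows)  # log_ids 6 (John) and 2 (Ben), matching the desired result
```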

In an earlier thread, Peter Lang and Quassnoi gave brilliant answers which are below. Sadly I can barely understand what happens there, but they worked like a charm. Here's the code:

Quassnoi

SELECT lo4.*
FROM
    (SELECT CASE WHEN ln.log_id IS NULL THEN lo2.log_id ELSE ln.log_id END
    AS log_id, ROW_NUMBER() OVER (PARTITION BY lo2.fb_id ORDER BY lo2.cdate) AS rn 
    FROM 
     (SELECT lo.*,
      (SELECT TOP 1 log_id
      FROM t_log li WHERE li.fb_id = lo.fb_id AND li.cdate >= lo.cdate
    AND li.log_id <> lo.log_id AND li.log_name <> lo.log_name
      ORDER BY cdate, log_id)
     AS next_id
     FROM t_log lo)
    lo2 LEFT OUTER JOIN t_log ln ON ln.log_id = lo2.next_id)
lo3, t_log lo4
WHERE lo3.rn = 1 AND lo4.log_id = lo3.log_id

Peter Lang

SELECT *
FROM log
WHERE log_id IN
    (SELECT MIN(log_id)
     FROM log
     WHERE (SELECT COUNT(DISTINCT log_name)
            FROM log log2
            WHERE log2.fb_id = log.fb_id) = 1
        OR log.log_name <> (SELECT log_name
                            FROM log log_3
                            WHERE log_3.log_id = (SELECT MIN(log_id)
                                                  FROM log log4
                                                  WHERE log4.fb_id = log.fb_id))
     GROUP BY fb_id)

Now if you've read this far, here's the question. Why do they both work fine on their own, but as soon as I apply other filters to them, everything gets mixed up?

I even tried to create a temporary result with the WITH clause and apply Date and Log_Type filters to that, but it still didn't work. Several rows that should have been included by the filters were suddenly left out. Where originally I would get the first-dated only name, or the second name if there were several in the Log_Name column, now I would get more or less random rows, if any. In a similar manner, using WHERE (Date BETWEEN '2007/11/1' AND '2007/11/30') caused a permaloop (the query never finished), whereas WHERE (MONTH(Date) = 11) AND (YEAR(Date) = 2007) worked fine. But if I added one more filter to the latter option, for example AND Log_Type = 'Critical', it was back in a permaloop. Both permaloops happened with Lang's solution.
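Much of the confusion comes down to where the filter is applied: pushed inside the query it changes which row counts as the "first" one, while applied outside it only discards rows from an already-correct result. A small sketch of the difference, using SQLite via Python with a hypothetical cut-off date of 2007/11/7 (not the original SQL Server query):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE log (log_id INT, fb_id INT, cdate TEXT, log_name TEXT, log_type TEXT);
INSERT INTO log VALUES
  (7, 4, '2007-11-08', 'Nina', 'Critical'),
  (6, 4, '2007-11-06', 'John', 'Critical'),
  (5, 4, '2007-11-06', 'Mike', 'Critical'),
  (4, 4, '2007-11-06', 'Mike', 'Critical');
CREATE VIEW recent AS SELECT * FROM log WHERE cdate >= '2007-11-07';
""")

# Same "second name, else first" query, parameterised by table name so
# it can run against the full table or the pre-filtered view.
QUERY = """
SELECT log_id FROM {t}
WHERE log_id IN (
    SELECT MIN(log_id) FROM {t} l1
    WHERE (SELECT COUNT(DISTINCT log_name) FROM {t} l2
           WHERE l2.fb_id = l1.fb_id) = 1
       OR l1.log_name <> (SELECT l3.log_name FROM {t} l3
                          WHERE l3.log_id = (SELECT MIN(log_id) FROM {t} l4
                                             WHERE l4.fb_id = l1.fb_id))
    GROUP BY l1.fb_id)
"""

# Filter pushed INSIDE: only Nina's row survives the cut-off, so she
# becomes the "first" name; the ranking runs on the wrong universe.
inner = conn.execute(QUERY.format(t="recent")).fetchall()

# Filter applied OUTSIDE: ranking runs on all rows first, then rows
# whose date is out of range are discarded; the correct answer
# (log_id 6, dated 2007-11-06) is simply filtered away.
outer = conn.execute(
    "SELECT log_id FROM (" + QUERY.format(t="log") + ") q "
    "JOIN log USING (log_id) WHERE cdate >= '2007-11-07'").fetchall()

print(inner, outer)
```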

I need to combine this type of search with another using UNION ALL, so I'm wondering if I'm going to run into more similarly weird problems with that in the future. There's clearly something I don't understand about SQL here, and my deadline for the query is today, so I'm kinda stressed out here. Thanks for all the help. :)

Edit: To clarify: I need the result of the queries above, and those results need to be filtered to 'Critical' cases over a given time (one month) ONLY.

This will then be united with another search that returns the first time a FB_ID with 'Support' status (Log_Type) has been logged. The idea is to give a picture of how many new cases are recorded in the DB each month.

Edit 2: Update: Russ Cam's suggestion below is working, but it rules out any FB_ID that was first dated outside the given range, even when the result row where the Log_Name changes does fall within the range.

+1  A: 
Russ Cam
The subquery will select the rows whose Date falls beyond @end. You need to move this condition into the inner subquery, and this will probably spoil index usability (which I think is exactly what happens)
Quassnoi
Have you tried it with Peter's solution?
Russ Cam
@Quassnoi - Agreed that the performance is going to very much depend on the indexes
Russ Cam
My apologies. Quassnoi's code didn't get stuck to begin with, it was Peter's. And it still gets stuck after trying this. Quassnoi's doesn't return the desired result though, even if I apply the filters like this. :/
Took me a while to figure out, but actually this query IS working; it simply doesn't include any rows where the *first* row for an FB_ID isn't inside the given time range, even if the row where the Log_Name actually changes is (which is how it should work :) ). Gonna look into this more.
Bear with me, will post an edit in a second - I thought that wrapping as a subquery might be a 'quick win' for you, but it's obviously not performant. What we need to do then is select only the necessary results in the inner queries...
Russ Cam
Hmm... The ranking function behaves differently depending on whether the date range condition 'is applicable'. I don't know how best to put this, as at first glance it appears illogical.
Russ Cam
For example, I ran your test data above against my second query, first using start and end dates of 1 and 30 Nov 2007, and then start and end dates of 1 and 30 Nov 2009. In the results, Mike and John get ranked differently for FB_ID 4...
Russ Cam
In the former date case, John gets ranked 1st for FB_ID 4, and in the latter date case, Mike gets ranked 1st for FB_ID 4. Unfortunately I don't have time to investigate why that is at present, but I would hope that there is a rational explanation for it
Russ Cam
Gonna have to register just to get to vote + for you ppl. :) Thanks, all of you; there are still some issues, but it's close enough for now. The level of the query is quite honestly getting so far beyond my inexperienced understanding that I need a bit more time to get acquainted with all this.
I'll keep on with this come Monday; for now they'll have to settle for 99% instead of 100%.
For the short term, you could insert the result set from one of the original queries into a temporary table (i.e. #temp), index the date field, then run a WHERE clause against the date field
Russ Cam
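This temp-table idea can be sketched as follows, using SQLite via Python rather than SQL Server (on SQL Server it would be SELECT ... INTO #temp followed by CREATE INDEX on the temp table). The "result set" here is stubbed with the two desired rows, standing in for the real query's output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stub: pretend these two rows are the output of the original query.
conn.executescript("""
CREATE TABLE log (log_id INT, fb_id INT, cdate TEXT, log_name TEXT, log_type TEXT);
INSERT INTO log VALUES
  (6, 4, '2007-11-06', 'John', 'Critical'),
  (2, 3, '2007-11-01', 'Ben',  'Critical');

-- Materialise the result set once, index the date column, then filter.
CREATE TEMP TABLE result AS SELECT * FROM log;
CREATE INDEX idx_result_date ON result (cdate);
""")

# Cheap follow-up filters run against the materialised copy, so they
# cannot disturb how the original query ranked its rows.
nov = conn.execute("""
    SELECT log_id FROM result
    WHERE cdate BETWEEN '2007-11-01' AND '2007-11-30'
      AND log_type = 'Critical'
    ORDER BY cdate
""").fetchall()
print(nov)  # [(2,), (6,)]
```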
+1  A: 

Well, thanks for the compliment, first :)

My query actually looks over all records, selects the next responsible person for each record, and assigns the row number for each responsibility transition for a certain fb_id.

If there were no transitions for this fb_id, it will be selected as transition to NULL.

Then the query selects every first transition (that is with ROW_NUMBER of 1), be it a real transition or fake, and checks if it is real or fake.

If it's real (to a non-NULL, that is), it returns the id of the person who got the responsibility; if not, it returns the person who gave the responsibility to NULL (i.e. did not give it at all).

As you can see, this query relies heavily on the index on (fb_id, cdate, id) to search the next responsible person. If you add new conditions, it cannot use this index anymore and becomes slow.
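The logic described above can be restated compactly outside SQL. Here is a pure-Python sketch (a hypothetical helper, not Quassnoi's actual code) of "first hand-over per fb_id, else the first row":

```python
from itertools import groupby

def first_handover(rows):
    """rows: (log_id, fb_id, cdate, log_name) tuples.

    For each fb_id (ordered by date, then log_id), return the first row
    whose log_name differs from the initial name (a real transition),
    or the group's first row when responsibility was never handed over
    (the "transition to NULL" case described above).
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[0]))
    result = []
    for _, grp in groupby(ordered, key=lambda r: r[1]):
        grp = list(grp)
        first_name = grp[0][3]
        result.append(next((r for r in grp if r[3] != first_name), grp[0]))
    return result

sample = [
    (7, 4, '2007-11-08', 'Nina'),
    (6, 4, '2007-11-06', 'John'),
    (5, 4, '2007-11-06', 'Mike'),
    (4, 4, '2007-11-06', 'Mike'),
    (3, 3, '2007-11-03', 'Ben'),
    (2, 3, '2007-11-01', 'Ben'),
]
print([r[0] for r in first_handover(sample)])  # [2, 6]
```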

Please clarify which condition you want to add and we'll try to help you again :)

You said you want to add log_type into the query.

How does the first transition count? Do you need to return the first transition when both records are critical, or only transitions from non-critical to critical, or when either one is critical?

If you need to add a date range, say, for February only, should it count the person who got the work in February but handed it over in March? Or the person who got the work in January and handed it over in February?

Meanwhile, try this:

SELECT lo4.*
FROM
    (
    SELECT CASE WHEN ln.log_id IS NULL THEN lo2.log_id ELSE ln.log_id END AS log_id,
           ROW_NUMBER() OVER (PARTITION BY lo2.fb_id ORDER BY lo2.cdate) AS rn 
    FROM 
        (
        SELECT
               lo.*,
            (
               SELECT TOP 1 log_id
               FROM t_log li
               WHERE li.fb_id = lo.fb_id 
                     AND li.cdate >= CASE WHEN lo.cdate < @range THEN @range ELSE lo.cdate END
                     AND li.cdate < DATEADD(month, 1, @range)
                     AND li.log_id <> lo.log_id
                     AND li.log_name <> lo.log_name
            ORDER BY
                 cdate, log_id
               ) AS next_id
        FROM t_log lo
        WHERE lo.cdate >= @range
          AND lo.cdate < DATEADD(month, 1, @range)
        ) lo2
 LEFT OUTER JOIN t_log ln ON ln.log_id = lo2.next_id
 ) lo3,
t_log lo4
WHERE lo3.rn = 1
  AND lo4.log_id = lo3.log_id

As you can see, there are two checks for date ranges there.

The one in the inner subquery filters out transitions where the recipient is out of date range.

The one in the outer query filters out transitions where the sender is out of range.

Quassnoi
The table is used to record cases marked with an FB_ID, and the query needs to list each row where the handling of a 'Critical'-marked case has started, over a given month. The responsible person (Log_Name, the first if there is only one, the second if there are several) is what signifies the start of handling the case.
Now in the *edit* part of my main msg.
A: 

Some very general advice: Try to get an execution plan for all the variants of your query and check which objects the DB uses to get the data.

It is possible the query optimizer uses quite a different set of resources for each query variant, and that some of those objects are invalid (e.g. temporary tables or materialized views based on your original table that are not up to date). Such resources may cause the invalid results you see.
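For concreteness: on SQL Server 2005 a plan can be captured with SET SHOWPLAN_XML ON (or "Display Estimated Execution Plan" in Management Studio). A quick analogue that runs anywhere is SQLite's EXPLAIN QUERY PLAN, shown here from Python on a hypothetical index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (log_id INT, fb_id INT, cdate TEXT)")

q = "EXPLAIN QUERY PLAN SELECT * FROM log WHERE fb_id = 4"
before = conn.execute(q).fetchall()  # full table scan, no index available

conn.execute("CREATE INDEX idx_fb ON log (fb_id)")
after = conn.execute(q).fetchall()   # the plan now uses idx_fb

# Column 3 of each row holds the human-readable plan step; comparing
# these across query variants is the same exercise as comparing
# SHOWPLAN output on SQL Server.
for row in before + after:
    print(row[3])
```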

david a.
A: 

Do you think the following will do?



set nocount on

declare @log table
(
log_id int
,fb_id int
,log_date datetime
,log_name nvarchar(25)
,log_type nvarchar(20)
)

insert into @log(log_id, fb_id, log_date, log_name, log_type) values (7, 4, convert(datetime,'8/11/2007', 103), N'Nina', N'Critical')
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (6, 4, convert(datetime,'6/11/2007', 103), N'John', N'Critical')
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (5, 4, convert(datetime,'6/11/2007', 103), N'Mike', N'Critical')
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (4, 4, convert(datetime,'6/11/2007',103), N'Mike', N'Critical')
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (3, 3, convert(datetime,'3/11/2007',103), N'Ben', N'Critical')
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (2, 3, convert(datetime,'1/11/2007',103), N'Ben', N'Critical') 
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (1, 2, convert(datetime,'10/9/2007',103), N'Pat', N'Critical') 
insert into @log(log_id, fb_id, log_date, log_name, log_type) values (0, 2, convert(datetime,'1/9/2007',103), N'Couger', N'Critical') 

declare @result table
(
intid int identity(1,1)
,log_id int
,fb_id int
,log_date datetime
,log_name nvarchar(25)
,log_type nvarchar(20)
,_min_intid int
,_position int
)

insert into @result (log_id, fb_id, log_date, log_name, log_type)
select log_id, fb_id, log_date, log_name, log_type from @log order by fb_id, log_date

update r
set r._min_intId = w.freq
from @result r join 
(select distinct fb_id, min(intid) freq from @result group by fb_id) w
on r.fb_id = w.fb_id

update @result set _position = intid - _min_intid + 1

select log_id, fb_id, log_date, log_name, log_type from @result where _position = 2


ahmjt