views:

429

answers:

8

Hi All,

I'm trying to do precedence matching on a table within a stored procedure. The requirements are a bit tricky to explain, but hopefully this will make sense. Let's say we have a table called books, with id, author, title, date, and pages fields.

We also have a stored procedure that will match a query with ONE row in the table.

Here is the proc's signature:

create procedure match
  @pAuthor varchar(100)
  ,@pTitle varchar(100)
  ,@pDate varchar(100)
  ,@pPages varchar(100)

 as

 ...

The precedence rules are as follows:

  • First, try and match on all 4 parameters. If we find a match return.
  • Next try to match using any 3 parameters. The 1st parameter has the highest precedence here and the 4th the lowest. If we find any matches return the match.
  • Next we check if any two parameters match and finally if any one matches (still following the parameter order's precedence rules).

I have implemented this case-by-case. Eg:

 select @lvId = id 
 from books
 where
  author = @pAuthor 
 ,title = @pTitle 
 ,date = @pDate 
 ,pages = @pPages

if @@rowCount = 1 begin
  select @lvId
  return
end

 select @lvId = id 
  from books
 where
  author = @pAuthor 
 ,title = @pTitle 
 ,date = @pDate 

 if @@rowCount = 1 begin
  select @lvId
  return
end

....

However, for each new column in the table, the number of individual checks grows by an order of 2. I would really like to generalize this to X number of columns; however, I'm having trouble coming up with a scheme.

Thanks for the read, and I can provide any additional information needed.


Added:

Dave and Others, I tried implementing your code and it is choking on the first Order by Clause, where we add all the counts. Its giving me an invalid column name error. When I comment out the total count, and order by just the individual aliases, the proc compiles fine.

Anyone have any ideas?

This is in Microsoft Sql Server 2005

+1  A: 

You don't explain what should happen if more than one result matches any given set of parameters that is reached, so you will need to change this to account for those business rules. Right now I've set it to return books that match on later parameters ahead of those that don't. For example, a match on author, title, and pages would come before one that just matches on author and title.

Your RDBMS may have a different way of handling "TOP", so you may need to adjust for that as well.

SELECT TOP 1
     author,
     title,
     date,
     pages
FROM
     Books
WHERE
     author = @author OR
     title = @title OR
     date = @date OR
     pages = @pages OR
ORDER BY
     CASE WHEN author = @author THEN 1 ELSE 0 END +
     CASE WHEN title = @title THEN 1 ELSE 0 END +
     CASE WHEN date = @date THEN 1 ELSE 0 END +
     CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC,

     CASE WHEN author = @author THEN 8 ELSE 0 END +
     CASE WHEN title = @title THEN 4 ELSE 0 END +
     CASE WHEN date = @date THEN 2 ELSE 0 END +
     CASE WHEN pages = @pages THEN 1 ELSE 0 END DESC
Tom H.
I think this is not quite what he wants because a match on author + title would beat a match on title + date + pages. I think he wanted 3 matches to always beat 2 matches.
Dave Costa
@Dave, Correct I need 3 matches to come before 2. @Tom, We can assume we will find only one match. Select Top 1 would work fine.
Charlie White
I made some changes to correct this issue, but without a WHERE clause performance is likely to be a problem. I'll give it some more thought though.
Tom H.
As in my answer, you could add "WHERE author=@author OR title=@title" etc. which would filter down the set of rows that needs to be sorted.
Dave Costa
I've added that as well
Tom H.
Tom H, The only way I can think of, with this solution, to improve the WHERE clause performance is to use Bitmap indexes, which aren't available in MSSS.
+1  A: 

I don't have time to write out the query, but I think this idea would work.

For your predicate, use "author = @pAuthor OR title = @ptitle ...", so you get all candidate rows.

Use CASE expressions or whatever you like to create virtual columns in the result set, like:

SELECT CASE WHEN author = @pAuthor THEN 1 ELSE 0 END author_match,
       ...

Then add this order by and get the first row returned:

ORDER BY (author_match+title_match+date_match+page_match) DESC,
         author_match DESC,
         title_match DESC,
         date_match DESC
         page_match DESC

You still need to extend it for each new column, but only a little bit.

Dave Costa
Dave, Thanks this looks promising. I will play with a bit and report back.
Charlie White
A: 
      select id, 
               CASE WHEN @pPages = pages 
                    THEN 1 ELSE 0 
               END
             +  Case WHEN @pAuthor=author 
                    THEN 1 ELSE 0 
                END AS 
             /* +  Do this for each attribute. If each of your 
attributes are just as important as the other 
for example matching author is jsut as a good as matching title then 
leave the values alone, if different matches are more 
important then change the values */ as MatchRank  
        from books 

        where  author = @pAuthor OR
               title = @pTitle OR
               date = @pDate

     ORDER BY  MatchRank DESC

Edited

When I run this query (modified only to fit one of my own tables) it works fine in SQL2005.

I'd recommend a where clause but you will want to play around with this to see performance impacts. You will need to use an OR clause otherwise you will loose potential matches

JoshBerke
A: 

Dave and Others, I tried implementing your code and it is choking on the first Order by Clause, where we add all the counts. Its giving me an invalid column name error. When I comment out the total count, and order by just the individual aliases, the proc compiles fine.

Anyone have any ideas?

Charlie White
EDIT: This is in Microsoft Sql Server 2005
Charlie White
@Charlie - I'm about to transfer this extra information to the question. Please remove this 'answer' and edit the question in future.
Jonathan Leffler
You can't use aliases in the order by clause. Repeat the definition of the alias.
recursive
I had no problem using an alias in SQL 2005
JoshBerke
Mine compiles and executes as-is.
le dorfier
A: 

Okay, let me restate my understanding of your question: You want a stored procedure that can take a variable number of parameters and pass back the top row that matches the parameters in the weighted order of preference passed on SQL Server 2005.

Ideally, it will use WHERE clauses to prevent full tables scans plus take advantage of indices and will "short circuit" the search - you don't want to search all possible combinations if one can be found early. Perhaps we can also allow other comparators than = such as >= for dates, LIKE for strings, etc.

One possible way is to pass the parameters as XML like in this article and use .Net stored procedures but let's keep it plain vanilla T-SQL for now.

This looks to me like a binary search on the parameters: Search all parameters, then drop the last one, then drop the second last one but include the last one, etc.

Let's pass the parameters as a delimited string since stored procedures don't allow for arrays to be passed as parameters. This will allow us to get a variable number of parameters in to our stored procedure without requiring a stored procedure for each variation of parameters.

In order to allow any sort of comparison, we'll pass the entire WHERE clause list, like so: title like '%something%'

Passing multiple parameters means delimiting them in a string. We'll use the tilde ~ character to delimit the parameters, like this: author = 'Chris Latta'~title like '%something%'~pages >= 100

Then it is simply a matter of doing a binary weighted search for the first row that meets our ordered list of parameters (hopefully the stored procedure with comments is self-explanatory but if not, let me know). Note that you are always guaranteed a result (assuming your table has at least one row) as the last search is parameterless.

Here is the stored procedure code:

CREATE PROCEDURE FirstMatch
@SearchParams VARCHAR(2000)
AS
BEGIN
    DECLARE @SQLstmt NVARCHAR(2000)
    DECLARE @WhereClause NVARCHAR(2000)
    DECLARE @OrderByClause NVARCHAR(500)
    DECLARE @NumParams INT
    DECLARE @Pos INT
    DECLARE @BinarySearch INT
    DECLARE @Rows INT

    -- Create a temporary table to store our parameters
    CREATE TABLE #params 
    (
        BitMask int,             -- Uniquely identifying bit mask
        FieldName VARCHAR(100),  -- The field name for use in the ORDER BY clause
        WhereClause VARCHAR(100) -- The bit to use in the WHERE clause
    )

    -- Temporary table identical to our result set (the books table) so intermediate results arent output
    CREATE TABLE #junk
    (
        id INT,
        author VARCHAR(50),
        title VARCHAR(50),
        printed DATETIME,
        pages INT
    )

    -- Ill use tilde ~ as the delimiter that separates parameters
    SET @SearchParams = LTRIM(RTRIM(@SearchParams))+ '~'
    SET @Pos = CHARINDEX('~', @SearchParams, 1)
    SET @NumParams = 0

    -- Populate the #params table with the delimited parameters passed
    IF REPLACE(@SearchParams, '~', '') <> ''
    BEGIN
        WHILE @Pos > 0
        BEGIN
            SET @NumParams = @NumParams + 1
            SET @WhereClause = LTRIM(RTRIM(LEFT(@SearchParams, @Pos - 1)))
            IF @WhereClause <> ''
            BEGIN
                -- This assumes your field names dont have spaces and that you leave a space between the field name and the comparator
                INSERT INTO #params (BitMask, FieldName, WhereClause) VALUES (POWER(2, @NumParams - 1), LTRIM(RTRIM(LEFT(@WhereClause, CHARINDEX(' ', @WhereClause, 1) - 1))), @WhereClause) 
            END
            SET @SearchParams = RIGHT(@SearchParams, LEN(@SearchParams) - @Pos)
            SET @Pos = CHARINDEX('~', @SearchParams, 1)
        END
    END 

    -- Set the binary search to search from all parameters down to one in order of preference
    SET @BinarySearch = POWER(2, @NumParams) 
    SET @Rows = 0
    WHILE (@BinarySearch > 0) AND (@Rows = 0)
    BEGIN
        SET @BinarySearch = @BinarySearch - 1
        SET @WhereClause = ' WHERE '
        SET @OrderByClause = ' ORDER BY '
        SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
        SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
        SELECT @WhereClause = @WhereClause + WhereClause + ' AND ' FROM #params WHERE (@BinarySearch & BitMask) = BitMask ORDER BY BitMask
        SET @WhereClause = LEFT(@WhereClause, LEN(@WhereClause) - 4) -- Remove the trailing AND

        IF @BinarySearch = 0
        BEGIN
            -- If nothing found so far, return the top row in the order of the parameters fields
            SET @WhereClause = ''
            -- Use the full order sequence of fields to return the results
            SET @OrderByClause = ' ORDER BY '
            SELECT @OrderByClause = @OrderByClause + FieldName + ', ' FROM #params ORDER BY BitMask
            SET @OrderByClause = LEFT(@OrderByClause, LEN(@OrderByClause) - 1) -- Remove the trailing comma
        END

        -- Find out if there are any results for this search
        SET @SQLstmt = 'SELECT TOP 1 id, author, title, printed, pages INTO #junk FROM books' + @WhereClause + @OrderByClause
        Exec (@SQLstmt)

        SET @Rows = @@RowCount
    END

    -- Stop the result set being eaten by the junk table
    SET @SQLstmt = REPLACE(@SQLstmt, 'INTO #junk ', '')

    -- Uncomment the next line to see the SQL you are producing
    --PRINT @SQLstmt

    -- This gives the result set
    Exec (@SQLstmt)
END

This stored procedure is called like so:

FirstMatch 'author = ''Chris Latta''~pages > 100~title like ''%something%'''

There you have it - a fully expandable, optimised search for the top result in weighted order of preference. This was an interesting problem and shows just what you can pull off with native T-SQL.

A couple of small issues with this:

  • it relies on the caller to know that they must leave a space after the field name for the parameter to work properly
  • you can't have field names with spaces in them - fixable with some effort
  • it assumes that the relevant sort order is always ascending
  • the next programmer that has to look at this procedure will think you're insane :)
Chris Latta
I don't think it is a variable number of parameters, per se (all calls will pass all 4 parameters). If all 4 match, then the row(s) should be returned. If there's nothing with all 4, then the row(s) matching the 'best 3' parameters should be returned, then those for the best two; then the best one.
Jonathan Leffler
Yes, but he does say "for each new column, the number of individual checks grows by a factor of two". My solution does not change with extra (or less) columns, allows a weighted search, uses indices, short circuits the result set and allows operators other than equals. It does whats asked by the OP.
Chris Latta
A: 

Try this.

ALTER PROCEDURE match
@pAuthor varchar(100)
,@pTitle varchar(100)
,@pDate varchar(100)
,@pPages varchar(100)
-- exec match 'a title', 'b author', '1/1/2007', 15
AS

SELECT  id,

        CASE WHEN author = @pAuthor THEN 1 ELSE 0 END
        + CASE WHEN title = @pTitle THEN 1 ELSE 0 END
        + CASE WHEN bookdate = @pDate THEN 1 ELSE 0 END
        + CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS matches,

        CASE WHEN author = @pAuthor THEN 4 ELSE 0 END
        + CASE WHEN title = @pTitle THEN 3 ELSE 0 END
        + CASE WHEN bookdate = @pDate THEN 2 ELSE 0 END
        + CASE WHEN pages = @pPages THEN 1 ELSE 0 END AS score
FROM books
WHERE author = #pAuthor 
    OR title = @pTitle 
    OR bookdate = @PDate 
    OR pages = @pPages
ORDER BY matches DESC, score DESC

However, this of course causes a table scan. You can avoid that by making it a union of a CTE and 4 WHERE clauses, one for each property - there will be duplicates, but you can just take the TOP 1 anyway.

EDIT: Added the WHERE ... OR clause. I'd feel more comfortable if it were

SELECT ... FROM books WHERE author = @pAuthor
UNION
SELECT ... FROM books WHERE title = @pTitle
UNION
...
le dorfier
A: 

In regards to the Order By clause failing to compile:

As recursive said(in a comment), alias' may not be within expressions which are used in Order By clauses. to get around this I used a subquery which returned the rows, then ordered by in the outer query. In this way I am able to use the alias' in the order by clause. A little slower but a lot cleaner.

Charlie White
Stop posting answers. Edit the original post or add comments to it. Now you're over 50 you can do comments I believe.
And he should always be able to edit his own question.
Jonathan Leffler
+2  A: 

I believe that the answers your working on are the simplest by far. But I also believe that in SQL server, they will always be full table scans. (IN Oracle you could use Bitmap indexes if the table didn't undergo a lot of simultaneous DML)

A more complex solution but a much more performant one would be to build your own index. Not a SQL Server index, but your own.

Create a table (Hash-index) with 3 columns (lookup-hash, rank, Rowid)

Say you have 3 columns to search on. A, B, C

For every row added to Books you'll insert 7 rows into hash_index either via a trigger or CRUD proc.

First you'll

insert into hash_index 
SELECT HASH(A & B & C), 7 , ROWID
FROM Books

Where & is the concatenation operator and HASH is a function

then you'll insert hashes for A & B, A & C and B & C. You now have some flexibility you can give them all the same rank or if A & B are a superior match to B & C you can give them a higher rank.

And then insert Hashes for A by itself and B and C with the same choice of rank... all the same number or all different... you can even say that a match on A is higher choice than a match on B & C. This solution give you a lot of flexibility.

Of course, this will add a lot of INSERT overhead, but if DML on Books is low or performance is not relevant you're fine.

Now when you go to search you'll create a function that returns a table of HASHes for your @A, @B and @C. you'll have a small table of 7 values that you'll join to the lookup-hash in the hash-index table. This will give you every possible match and possibly some false matches (that's just the nature of hashes). You'll take that result, order desc on the rank column. Then take the first rowid back to the book table and make sure that all of the values of @A @B @C are actually in that row. On the off chance it's not and you've be handed a false positive you'll need to check the next rowid.

Each of these operation in this "roll your own" are all very fast.

  • Hashing your 3 values into a small 7 row table variable = very fast.
  • joining them on an index in your Hash_index table = very fast index lookups
  • Loop over result set will result in 1 or maybe 2 or 3 table access by rowid = very fast

Of course, all of these together could be slower than an FTS... But an FTS will continue to get slower and slower. There will be a size which the FTS is slower than this. You'll have to play with it.