views:

119

answers:

2

This is an SQL efficiency question.

A while back I had to write a collection of queries to pull data from an ERP system. Most of these were simple enough but one of them resulted in a rather ineficient query and its been bugging me ever since as there's got to be a better way.

The problem is not complex. You have rows of sales data. In each row you have quantity, sales price and the salesman code, among other information.

Commission is paid based on a stepped sliding scale. The more they sell, the better the commission. Steps might be 1000, 10000, 10000$ and so forth. The real world problem is more complex but thats it essentially it.

The only way I found of doing this was to do something like this (obviously not the real query)

select qty, price, salesman,
  (select top 1 percentage from comissions 
    where comisiones.salesman = saleslines.salesman 
    and saleslines.qty > comisiones.qty
    order by comissiones.qty desc
  ) percentage
from saleslines 

this results in the correct commission but is horrendously heavy.

Is there a better way of doing this? I'm not looking for someone to rewrite my sql, more 'take a look as foobar queries' and I can take it from there.

The real life commission structure can be specified for different salesmen, articles and clients and even sales dates. It also changes from time to time, so everything has to be driven by the data in the tables... i.e I can't put fixed ranges in the sql. The current query returns some 3-400000 rows and takes around 20-30 secs. Luckily its only used monthly but the slowness is kinda bugging me.

This is on mssql.

Ian

edit:

I should have given a more complex example from the beginning. I realize now that my initial example is missing a few essential elements of the complexity, apologies to all.

This may better capture it

select client-code, product, product-family, qty, price, discount, salesman,
    (select top 1 percentage from comissions 
        where comisiones.salesman = saleslines.salesman 
        and saleslines.qty > comisiones.qty
        and [
           a collection of conditions which may or may not apply:
           Exclude rows if the salesman has offered discounts above max discounts
                which appear in each row in the commissions table
           There may be a special scale for the product family
           There may be a special scale for the product
           There may be a special scale for the client

           A few more cases
            ]
        order by [
            The user can control the order though a table 
            which can prioritize by client, family or product
            It normally goes from most to least specific.
            ]
      ) percentage
    from saleslines 

needless to say the real query is not easy to follow. Just to make life more interesting, its naming is multi language.

Thus for every row of salesline the commission can be different.

It may sound overly complex but if you think of how you would pay commission it makes sense. You don't want to pay someone for selling stuff at high discounts, you also want to be able to offer a particular client a discount on a particular product if they buy X units. The salesman should earn more if they sell more.

In all the above I'm excluding date limited special offers.

I think partitions may be the solution but I need to explore this more indepth as I know nothing about partitions. Its given me a few ideas.

+2  A: 

If you are using a version of SQL Server that supports common-table expressions such as SQL Server 2005 and later, a more efficient solution might be:

With RankedCommissions As
    (
    Select SL.qty, SL.price, SL.salesman, C.percentage
        , Row_Number() Over ( Partition By SL.salesman Order By C.Qty Desc ) As CommissionRank
    From SalesLines As SL
        Join Commissions As C
            On SL.salesman = C.salesman
                 And SL.qty > C.qty
    )
Select qtr, price, salesman, percentage
From RankedCommissions
Where CommissionRank = 1

If you needed to account for the possibility that there are no Commissions values for a given salesman where the SalesLine.Qty > Commission.Qty, then you could do something like:

With RankedCommissions As
    (
    Select SL.qty, SL.price, SL.salesman, C.percentage
        , Row_Number() Over ( Partition By SL.salesman Order By C.Qty Desc ) As CommissionRank
    From SalesLines As SL
        Join Commissions As C
            On SL.salesman = C.salesman
                And SL.qty > C.qty
    )
Select SL.qtr, SL.price, SL.salesman, RC.percentage
From SalesLines As SL
    Left Join RankedCommissions As RC
        On RC.salesman = SL.salesman
            And RC.CommissionRank = 1
Thomas
You don't *need* to have support for common-table expressions to use this approach. You could just as easily write the CTE as an inline view. The `ROW_NUMBER` function, however, does require SQL Server 2005 or higher.
Cheran S
My original query was written for sql2000 but the db migrated to sql2008 recently, which is what provoked my interest in taking advantage of new functionality. Thomas, this looks like being a solution. However its a very different approach so I'm going to have to sit back for a while with this and see how I can use it. Like the solution. Thanks -Ian
Ian
@Ian - Understand. CTEs and Ranking functions like Row_Number, are incredibly powerful new feature in 2005/2008. They help simplify a series of problematic queries such as the one you presented. The real trick in my solution is the use of the Partition By clause which orders the values by salesman (salesperson?).
Thomas
If I understand how they would work (without having tried them out yet - its a busy day) then I may need to partition over several different groupings and 'union' the results together and then assign a ranking to them all. well, thanks again thomas. I'll let you know if it works.. or not.
Ian
A: 
select 
     qty, price, salesman, 
     max(percentage)
from saleslines 
     inner join comissions on commisions.salesman = saleslines.salesman and 
          saleslines.qty > comissions.qty
group by 
     qty, price, salesman
potatopeelings
-1, your code gives the highest percentage; the original query gives the percentage for the highest qty.
Gabe
the INNER JOIN has a condition (saleslines.qty > comissions.qty) that filters out all commision lines with a quantity greater than the salesline quantiy. The max on the remaining rows gives the percentage for the highest quantity (not the highest percentage).
potatopeelings
or so i think :-)
potatopeelings
I can't just select the max percentage. The real world example could have a scale of commissions for the client, for the product, for the products-family and sales specials. The order of these is another table. The condition (saleslines.qty > comissions.qty) is a simplified version of the problem. This was why I said I wasn't just looking for someone to rewrite my query - more I was looking for a pointer to a better technique. Thanks for the suggestion in any case, offers of help always welcome
Ian
The generic query would be something like - SELECT <aggregate functions> FROM Table1 JOIN Table2 ON <Join condition> + <Filter condition>... JOIN Table3 ON <Join condition> + <Filter condition>... Basically we convert the derived table into a join. Unless there's some really twisted logic I think it can be extended to multiple tables and filter conditions. Hope this helps.
potatopeelings