For a call-rating system, I'm trying to split a telephone call duration into sub-durations for different tariff-periods. The calls are stored in a SQL Server database and have a starttime and total duration. Rates are different for night (0000 - 0800), peak (0800 - 1900) and offpeak (1900-235959) periods.

For example: A call starts at 18:50:00 and has a duration of 1000 seconds. This would make the call end at 19:06:40, making it 10 minutes / 600 seconds in the peak-tariff and 400 seconds in the off-peak tariff.

Obviously, a call can wrap over an unlimited number of periods (we do not enforce a maximum call duration). A call lasting > 24 h can wrap all 3 periods, starting in peak, going through off-peak, night and back into peak tariff.

Currently, we are calculating the different tariff-periods using recursion in VB. We calculate how much of the call falls in the same tariff-period the call starts in, change the starttime and duration of the call accordingly, and repeat this process until the full duration of the call has been reached (peakDuration + offpeakDuration + nightDuration == callDuration).

Regarding this issue, I have 2 questions:

  • Is it possible to do this effectively in a SQL Server statement? (I can think of subqueries or lots of coding in stored procedures, but that would not generate any performance improvement)

  • Will SQL Server be able to do such calculations in a way more resource-effective than the current VB scripts are doing it?

A: 

Effectively in T-SQL? I suspect not, with the schema as described at present.

It might be possible, however, if your rate table stores the three tariffs for each date. There is at least one reason why you might do this, apart from the problem at hand: it's likely at some point that rates for one period or another might change and you may need to have the historic rates available.

So say we have these tables:

CREATE TABLE rates (
    from_date_time DATETIME
,   to_date_time DATETIME
,   rate MONEY
)

CREATE TABLE calls (
    id INT
,   started DATETIME
,   ended DATETIME
)

I think there are three cases to consider (may be more, I'm making this up as I go):

  1. a call occurs entirely within one rate period
  2. a call starts in one rate period (a) and ends in the next (b)
  3. a call spans at least one complete rate period

Assuming rate is per second, I think you might produce something like the following (completely untested) query

SELECT id, DATEDIFF(ss, started, ended) * rate /* case 1 */
FROM rates JOIN calls ON started >= from_date_time AND ended <= to_date_time
UNION ALL /* plain UNION would drop identical rows, e.g. two full periods at the same rate */
SELECT id, DATEDIFF(ss, started, to_date_time) * rate /* case 2a and the start of case 3 */
FROM rates JOIN calls ON started >= from_date_time AND started < to_date_time AND ended > to_date_time
UNION ALL
SELECT id, DATEDIFF(ss, from_date_time, ended) * rate /* case 2b and the last part of case 3 */
FROM rates JOIN calls ON started < from_date_time AND ended > from_date_time AND ended <= to_date_time
UNION ALL
SELECT id, DATEDIFF(ss, from_date_time, to_date_time) * rate /* case 3 for entire rate periods, should pick up all complete periods */
FROM rates JOIN calls ON started < from_date_time AND ended > to_date_time

You could apply a SUM..GROUP BY over that in SQL or handle it in your code. Alternatively, with carefully-constructed logic, you could probably merge the UNIONed parts into a single WHERE clause with lots of ANDs and ORs. I thought the UNION showed the intent rather more clearly.
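
The four UNIONed cases all reduce to one rule: charge from the later of the two start times to the earlier of the two end times. Here is a minimal sketch of that rule in Python rather than T-SQL. The function name overlap_seconds and the date used are made up for illustration, and the off-peak period is assumed to run to midnight.

```python
from datetime import datetime


def overlap_seconds(call_start, call_end, period_start, period_end):
    """Seconds of the call that fall inside one rate period (0 if none)."""
    begin = max(call_start, period_start)  # later of the two starts
    end = min(call_end, period_end)        # earlier of the two ends
    return max(0, int((end - begin).total_seconds()))


# The example from the question: 18:50:00 plus 1000 seconds ends at 19:06:40.
# The date itself is arbitrary.
call_start = datetime(2009, 2, 26, 18, 50, 0)
call_end = datetime(2009, 2, 26, 19, 6, 40)

peak = overlap_seconds(call_start, call_end,
                       datetime(2009, 2, 26, 8, 0), datetime(2009, 2, 26, 19, 0))
offpeak = overlap_seconds(call_start, call_end,
                          datetime(2009, 2, 26, 19, 0), datetime(2009, 2, 27, 0, 0))
print(peak, offpeak)  # 600 400
```

A period that does not touch the call at all gives a negative difference, which the outer max turns into 0, so no separate case analysis is needed.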

HTH & HIW (Hope It Works...)

Mike Woodhouse
A: 

This is a thread about your problem that we had over at sqlteam.com. Take a look, because it includes some pretty slick solutions.

Mladen Prajdic
A: 

Following on from Mike Woodhouse's answer, this may work for you:

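SELECT id,
       SUM(DATEDIFF(ss,
           -- charge from the later of call start and rate-period start
           CASE WHEN started < from_date_time THEN from_date_time ELSE started END,
           -- to the earlier of call end and rate-period end
           CASE WHEN ended > to_date_time THEN to_date_time ELSE ended END
       ) * rate)
FROM rates
JOIN calls ON started < to_date_time   -- call begins before the period ends
          AND ended > from_date_time   -- and ends after the period begins
GROUP BY id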
ck
A: 

An actual schema for the relevant tables in your database would have been very helpful. I'll take my best guesses. I've assumed that the Rates table has start_time and end_time as the number of minutes past midnight.

Using a calendar table (a VERY useful table to have in most databases):

SELECT
     C.id,
     R.rate,
     SUM(DATEDIFF(ss,
          CASE
               WHEN C.start_time < R.rate_start_time THEN R.rate_start_time
               ELSE C.start_time
          END,
          CASE
               WHEN C.end_time > R.rate_end_time THEN R.rate_end_time
               ELSE C.end_time
          END)) AS seconds_at_rate
FROM
     Calls C
INNER JOIN
     (
     -- Every calendar date paired with every rate. A derived table cannot
     -- reference C, so the ON clause below restricts the rows to the call.
     SELECT
          DATEADD(mi, Rates.start_time, CAL.calendar_date) AS rate_start_time,
          DATEADD(mi, Rates.end_time, CAL.calendar_date) AS rate_end_time,
          Rates.rate
     FROM
          Calendar CAL
     CROSS JOIN Rates
     ) AS R ON
          R.rate_start_time < C.end_time AND
          R.rate_end_time > C.start_time
GROUP BY
     C.id,
     R.rate

I just came up with this as I was typing, so it's untested and you will very likely need to tweak it, but hopefully you can see the general idea.

I also just realized that you use a start_time and a duration for your calls. You can just replace C.end_time wherever you see it with DATEADD(ss, C.duration, C.start_time), assuming that the duration is in seconds.

This should perform pretty quickly in any decent RDBMS assuming proper indexes, etc.
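
To make the derived table R easier to picture, here is a rough Python sketch of the same cross join: every calendar date paired with every rate row, with minutes past midnight turned into real datetimes. The RATES rows and the rate names are assumptions based on the periods in the question, and the last period is assumed to end at minute 1440 (midnight at the end of the day).

```python
from datetime import date, datetime, time, timedelta

# Hypothetical Rates rows: (start minute, end minute, rate name),
# minutes counted from midnight. 1440 means midnight at the end of the day.
RATES = [(0, 480, "night"), (480, 1140, "peak"), (1140, 1440, "offpeak")]


def rate_periods(calendar_dates, rates=RATES):
    """Cross join dates with rates, as the derived table R does."""
    periods = []
    for day in calendar_dates:
        midnight = datetime.combine(day, time.min)
        for start_min, end_min, name in rates:
            periods.append((midnight + timedelta(minutes=start_min),
                            midnight + timedelta(minutes=end_min),
                            name))
    return periods


periods = rate_periods([date(2009, 2, 26), date(2009, 2, 27)])
for rate_start, rate_end, name in periods:
    print(rate_start, rate_end, name)
```

Each resulting row can then be compared with a call using the same overlap test as the ON clause of the query.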

Tom H.
A: 

Provided that your calls last less than 100 days:

WITH generate_range(item) AS
(
    SELECT 0
    UNION ALL
    SELECT item + 1
    FROM generate_range
    WHERE item < 100
)
SELECT tday, id, span
FROM (
    SELECT tday, id,
           DATEDIFF(minute,
               CASE WHEN tbegin < clbegin THEN clbegin ELSE tbegin END,
               CASE WHEN tend < clend THEN tend ELSE clend END
           ) AS span
    FROM (
        SELECT DATEADD(day, item, DATEDIFF(day, 0, clbegin)) AS tday,
               ti.id,
               clbegin, clend, -- needed by the CASE expressions one level up
               DATEADD(minute, rangestart, DATEADD(day, item, DATEDIFF(day, 0, clbegin))) AS tbegin,
               DATEADD(minute, rangeend, DATEADD(day, item, DATEDIFF(day, 0, clbegin))) AS tend
        FROM calls, generate_range, tariff ti
        WHERE DATEADD(day, 1, DATEDIFF(day, 0, clend)) > DATEADD(day, item, DATEDIFF(day, 0, clbegin))
    ) t1
) t2
WHERE   span > 0

I'm assuming you keep your tariff ranges in minutes from midnight and count lengths in minutes too.

Quassnoi
A: 

It seems to me that this is an operation with two phases.

  1. Determine which parts of the phone call use which rates at which time.
  2. Sum the times in each of the rates.

Phase 1 is trickier than Phase 2. I've worked the example in IBM Informix Dynamic Server (IDS) because I don't have MS SQL Server. The ideas should translate easily enough. The INTO TEMP clause creates a temporary table with an appropriate schema; the table is private to the session and vanishes when the session ends (or you explicitly drop it). In IDS, you can also use an explicit CREATE TEMP TABLE statement and then INSERT INTO temp-table SELECT ... as a more verbose way of doing the same job as INTO TEMP.

As so often in SQL questions on SO, you've not provided us with a schema, so everyone has to invent a schema that might, or might not, match what you describe.

Let's assume your data is in two tables. The first table has the call log records, the basic information about the calls made, such as the phone making the call, the number called, the time when the call started and the duration of the call:

CREATE TABLE clr  -- call log record
(
    phone_id      VARCHAR(24) NOT NULL,   -- billing plan
    called_number VARCHAR(24) NOT NULL,   -- needed to validate call
    start_time    TIMESTAMP   NOT NULL,   -- date and time when call started
    duration      INTEGER     NOT NULL    -- duration of call in seconds
                  CHECK(duration > 0),
    PRIMARY KEY(phone_id, start_time)
    -- other complicated range-based constraints omitted!
    -- foreign keys omitted
    -- there would probably be an auto-generated number here too.
);
INSERT INTO clr(phone_id, called_number, start_time, duration)
    VALUES('650-656-3180', '650-794-3714', '2009-02-26 15:17:19', 186234);

For convenience (mainly to save writing the addition multiple times), I want a copy of the clr table with the actual end time:

SELECT  phone_id, called_number, start_time AS call_start, duration,
        start_time + duration UNITS SECOND AS call_end
    FROM clr
    INTO TEMP clr_end;

The tariff data is stored in a simple table:

CREATE TABLE tariff
(
    tariff_code   CHAR(1)      NOT NULL   -- code for the tariff
                  CHECK(tariff_code IN ('P','N','O'))
                  PRIMARY KEY,
    rate_start    TIME         NOT NULL,  -- time when rate starts
    rate_end      TIME         NOT NULL,  -- time when rate ends
    rate_charged  DECIMAL(7,4) NOT NULL   -- rate charged (cents per second)
);
INSERT INTO tariff(tariff_code, rate_start, rate_end, rate_charged)
    VALUES('N', '00:00:00', '08:00:00', 0.9876);
INSERT INTO tariff(tariff_code, rate_start, rate_end, rate_charged)
    VALUES('P', '08:00:00', '19:00:00', 2.3456);
INSERT INTO tariff(tariff_code, rate_start, rate_end, rate_charged)
    VALUES('O', '19:00:00', '23:59:59', 1.2345);

I debated whether the tariff table should use TIME or INTERVAL values; in this context, the times are very similar to intervals relative to midnight, but intervals can be added to timestamps where times cannot. I stuck with TIME, but it made things messy.

The tricky part of this query is generating the relevant date and time ranges for each tariff without loops. In fact, I ended up using a loop embedded in a stored procedure to generate a list of integers. (I also used a technique that is specific to IBM Informix Dynamic Server, IDS, using the table ID numbers from the system catalog as a source of contiguous integers in the range 1..N, which works for numbers from 1 to 60 in version 11.50.)

CREATE PROCEDURE integers(lo INTEGER DEFAULT 0, hi INTEGER DEFAULT 0)
    RETURNING INT AS number;
    DEFINE i INTEGER;
    FOR i = lo TO hi STEP 1
        RETURN i WITH RESUME;
    END FOR;
END PROCEDURE;

In the simple case (and the most common case), the call falls in a single-tariff period; the multi-period calls add the excitement.

Let's assume we can create a table expression that matches this schema and covers all the timestamp values we might need:

CREATE TEMP TABLE tariff_date_time
(
     tariff_code   CHAR(1)      NOT NULL,
     rate_start    TIMESTAMP    NOT NULL,
     rate_end      TIMESTAMP    NOT NULL,
     rate_charged  DECIMAL(7,4) NOT NULL
);

Fortunately, you haven't mentioned weekend rates, so you charge the customers the same rates at the weekend as during the week. However, the answer should adapt to such situations if at all possible. If you were to get as complex as giving weekend rates on public holidays, except that at Christmas or New Year, you charge peak rate instead of weekend rate because of the high demand, then you would be best off storing the rates in a permanent tariff_date_time table.

The first step in populating tariff_date_time is to generate a list of dates which are relevant to the calls:

SELECT DISTINCT EXTEND(DATE(call_start) + number, YEAR TO SECOND) AS call_date
    FROM clr_end,
         TABLE(integers(0, (SELECT DATE(call_end) - DATE(call_start) FROM clr_end)))
         AS date_list(number)
    INTO TEMP call_dates;

The difference between the two date values is an integer number of days (in IDS). The procedure integers generates values from 0 to the number of days covered by the call and stores the result in a temp table. For the more general case of multiple records, it might be better to calculate the minimum and maximum dates and generate the dates in between rather than generate dates multiple times and then eliminate them with the DISTINCT clause.
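
For readers without IDS to hand, the same list of dates can be sketched in Python for a single call. This is only an illustration of the idea, not a translation of the query; the function name call_dates is made up.

```python
from datetime import datetime, timedelta


def call_dates(call_start, duration_seconds):
    """Every calendar date touched by the call, first to last inclusive."""
    call_end = call_start + timedelta(seconds=duration_seconds)
    days = (call_end.date() - call_start.date()).days
    return [call_start.date() + timedelta(days=n) for n in range(days + 1)]


# The sample row inserted into clr above.
dates = call_dates(datetime(2009, 2, 26, 15, 17, 19), 186234)
print([d.isoformat() for d in dates])
# ['2009-02-26', '2009-02-27', '2009-02-28']
```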

Now use a cartesian product of the tariff table with the call_dates table to generate the rate information for each day. This is where the tariff times would be neater as intervals.

SELECT  r.tariff_code,
        d.call_date + (r.rate_start - TIME '00:00:00') AS rate_start,
        d.call_date + (r.rate_end   - TIME '00:00:00') AS rate_end,
        r.rate_charged
    FROM call_dates AS d, tariff AS r
    INTO TEMP tariff_date_time;

Now we need to match the call log record with the tariffs that apply. The condition is a standard way of dealing with overlaps - two time periods overlap if the end of the first is later than the start of the second and if the start of the first is before the end of the second:

SELECT tdt.*, clr_end.*
FROM tariff_date_time tdt, clr_end
WHERE tdt.rate_end > clr_end.call_start
  AND tdt.rate_start < clr_end.call_end
INTO TEMP call_time_tariff;

Then we need to establish the start and end times for the rate. The start time for the rate is the later of the start time for the tariff and the start time of the call. The end time for the rate is the earlier of the end time for the tariff and the end time of the call:

SELECT  phone_id, called_number, tariff_code, rate_charged,
        call_start, duration,
        CASE WHEN rate_start < call_start THEN call_start
        ELSE rate_start END AS rate_start,
        CASE WHEN rate_end >= call_end THEN call_end
        ELSE rate_end END AS rate_end
    FROM call_time_tariff
    INTO TEMP call_time_tariff_times;

Finally, we need to sum the times spent at each tariff rate, and take that time (in seconds) and multiply by the rate charged. Since the result of SUM(rate_end - rate_start) is an INTERVAL, not a number, I had to invoke a conversion function to convert the INTERVAL into a DECIMAL number of seconds, and that (non-standard) function is iv_seconds:

SELECT phone_id, called_number, tariff_code, rate_charged,
       call_start, duration,
       SUM(rate_end - rate_start) AS tariff_time,
       rate_charged * iv_seconds(SUM(rate_end - rate_start)) AS tariff_cost
   FROM call_time_tariff_times
   GROUP BY phone_id, called_number, tariff_code, rate_charged,
            call_start, duration;

For the sample data, this yielded the following result (omitting the phone number and called number for compactness):

N   0.9876   2009-02-26 15:17:19   186234   0 16:00:00   56885.760000000
O   1.2345   2009-02-26 15:17:19   186234   0 10:01:11   44529.649500000
P   2.3456   2009-02-26 15:17:19   186234   1 01:42:41  217111.081600000

That's a very expensive call, but the telco will be happy with that. You can poke at any of the intermediate results to see how the answer is derived. You can use fewer temporary tables at the cost of some clarity.
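
As a quick cross-check of the figures above, here is a small Python sketch that recomputes each cost from the tariff time and adds up the seconds charged. The rows are copied from the result; the helper name to_seconds is made up.

```python
from decimal import Decimal

# (tariff_code, rate in cents per second, time charged as days, h, m, s),
# copied from the result rows above.
rows = [
    ("N", Decimal("0.9876"), (0, 16, 0, 0)),
    ("O", Decimal("1.2345"), (0, 10, 1, 11)),
    ("P", Decimal("2.3456"), (1, 1, 42, 41)),
]


def to_seconds(days, hours, minutes, seconds):
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


costs = {code: rate * to_seconds(*t) for code, rate, t in rows}
total_seconds = sum(to_seconds(*t) for _, _, t in rows)

for code, cost in costs.items():
    print(code, cost)
print(total_seconds)  # 186232
```

The costs agree with the table, but the seconds charged come to 186232, two fewer than the 186234-second duration. As far as I can tell, that is because the off-peak row ends at 23:59:59 and the call runs past midnight twice, so the last second of each of those days matches no tariff. Ending the off-peak period at the following midnight should close the gap.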

For a single call, this will not be much different than running the code in VB in the client. For a lot of calls, this has the potential to be more efficient. I'm far from convinced that recursion is necessary in VB - straight iteration should be sufficient.
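
To show what I mean by straight iteration, here is a minimal sketch in Python rather than VB. It assumes the three periods from the question, with off-peak running right up to midnight; the names BOUNDARIES and split_call are made up for the example.

```python
from datetime import datetime, time, timedelta

# Tariff boundaries from the question, as (start of period, name).
BOUNDARIES = [(time(0, 0), "night"), (time(8, 0), "peak"), (time(19, 0), "offpeak")]


def split_call(start, duration_seconds):
    """Split a call into seconds per tariff with a plain loop."""
    totals = {"night": 0, "peak": 0, "offpeak": 0}
    end = start + timedelta(seconds=duration_seconds)
    current = start
    while current < end:
        # Tariff in force at `current`: the last boundary at or before it.
        name = [n for t, n in BOUNDARIES if t <= current.time()][-1]
        # Next boundary later today, or else midnight tomorrow.
        later = [t for t, _ in BOUNDARIES if t > current.time()]
        if later:
            next_change = datetime.combine(current.date(), later[0])
        else:
            next_change = datetime.combine(current.date() + timedelta(days=1), time(0, 0))
        step_end = min(end, next_change)
        totals[name] += int((step_end - current).total_seconds())
        current = step_end
    return totals


print(split_call(datetime(2009, 2, 26, 18, 50, 0), 1000))
# {'night': 0, 'peak': 600, 'offpeak': 400}
```

Each pass through the loop consumes the call up to the next tariff boundary, so a call of any length needs at most a few iterations per day and no recursion.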

Jonathan Leffler
A: 

The big problem with performing this kind of calculation at the database level is that it takes resource away from your database while it's going on, both in terms of CPU and availability of rows and tables via locking. If you were calculating 1,000,000 tariffs as part of a batch operation, then that might run on the database for a long time and during that time you'd be unable to use the database for anything else.

If you have the resource, retrieve all the data you need with one transaction and do all the logic calculations outside the database, in a language of your choice. Then insert all the results. Databases are for storing and retrieving data, and any business logic they perform should be kept to an absolute bare minimum at all times. Whilst brilliant at some things, SQL isn't the best language for date or string manipulation work.

I suspect you're already on the right lines with your VB work, and without knowing more it certainly feels like a recursive, or at least an iterative, problem to me. When done correctly, recursion can be a powerful and elegant solution to a problem. Tying up the resources of your database very rarely is.

banjollity