views:

233

answers:

3

I'm having problems with a mammoth legacy PL/SQL procedure which has the following logic:

l_elapsed := dbms_utility.get_time - l_timestamp;

where l_elapsed and l_timestamp are of type PLS_INTEGER and l_timestamp holds the result of a previous call to get_time

This line suddenly started failing during a batch run with a ORA-01426: numeric overflow

The documentation on get_time is a bit vague, possibly deliberately so, but it strongly suggests that the return value has no absolute significance, and can be pretty much any numeric value. So I was suspicious to see it being assigned to a PLS_INTEGER, which can only support 32 bit integers. However, the interweb is replete with examples of people doing exactly this kind of thing.

The smoking gun is found when I invoke get_time manually, it is returning a value of -214512572, which is suspiciously close to the min value of a 32 bit signed integer. I'm wondering if during the time elapsed between the first call to get_time and the next, Oracle's internal counter rolled over from its max value and its min value, resulting in an overflow when trying to subtract one from the other.

Is this a likely explanation? If so, is this an inherent flaw in the get_time function? I could just wait and see if the batch fails again tonight, but I'm keen to get an explanation for this behaviour before then.

+2  A: 

From the 10g doc:

Numbers are returned in the range -2147483648 to 2147483647 depending on platform and machine, and your application must take the sign of the number into account in determining the interval. For instance, in the case of two negative numbers, application logic must allow that the first (earlier) number will be larger than the second (later) number which is closer to zero. By the same token, your application should also allow that the first (earlier) number be negative and the second (later) number be positive.

So while it is safe to assign the result of dbms_utility.get_time to a PLS_INTEGER it is theoretically possible (however unlikely) to have an overflow during the execution of your batch run. The difference between the two values would then be greater than 2^31.

If your job takes a lot of time (therefore increasing the chance that the overflow will happen), you may want to switch to a TIMESTAMP datatype.

Vincent Malgrat
Agh, what a ghastly mess. Thanks for the clarification. I will now rest safely in the knowledge that I can wait until the DBA gets back from holiday, and foist the infernal thing on to him.
skaffman
A: 

Assigning a negative value to your PLS_INTEGER variable does raise an ORA-01426:

SQL> l
  1  declare
  2    a pls_integer;
  3  begin
  4    a := -power(2,33);
  5* end;
SQL> /
declare
*
FOUT in regel 1:
.ORA-01426: numeric overflow
ORA-06512: at line 4

However, you seem to suggest that -214512572 is close to -2^31, but it's not, unless you forgot to typ a digit. Are we looking at a smoking gun?

Regards, Rob.

Rob van Wijk
+2  A: 

Maybe late, but this may benefit someone searching on the same question.

The underlying implementation is a simple 32 bit binary counter, which is incremented every 100th of a second, starting from when the database was last started.

This binary counter is is being mapped onto a PL/SQL BINARY_INTEGER type - which is a signed 32-bit integer (there is no sign of it being changed to 64-bit on 64-bit machines).

So, presuming the clock starts at zero it will hit the +ve integer limit after about 248 days, and then flip over to become a -ve value falling back down to zero.

The good news is that provided both numbers are the same sign, you can do a simple subtraction to find duration - otherwise you can use the 32-bit remainder.

    IF SIGN(:now) = SIGN(:then) THEN
        RETURN :now - :then;
    ELSE
        RETURN MOD(:now - :then + POWER(2,32),POWER(2,32));
    END IF;

Edit : This code will blow the int limit and fail if the gap between the times is too large (248 days) but you shouldn't be using GET_TIME to compare durations measure in days anyway (see below).

Lastly - there's the question of why you would ever use GET_TIME.

Historically, it was the only way to get a sub-second time, but since the introduction of SYSTIMESTAMP, the only reason you would ever use GET_TIME is because it's fast - it is a simple mapping of a 32-bit counter, with no real type conversion, and doesn't make any hit on the underlying OS clock functions (SYSTIMESTAMP seems to).

As it only measures relative time, it's only use is for measuring the duration between two points. For any task that takes a significant amount of time (you know, over 1/1000th of a second or so) the cost of using a timestamp instead is insignificant.

The number of occasions on where it is actually useful is minimal (the only one I've found is checking the age of data in a cache, where doing a clock hit for every access becomes significant).

JulesLt