views:

264

answers:

5

I'm doing some audio processing with float. The result needs to be converted back to PCM samples, and I noticed that the cast from float to int is surprisingly expensive. Whats furthermore frustrating that I need to clip the result to the range of a short (-32768 to 32767). While I would normally instictively assume that this could be assured by simply casting float to short, this fails miserably in Java, since on the bytecode level it results in F2I followed by I2S. So instead of a simple:

int sample = (short) flotVal;

I needed to resort to this ugly sequence:

int sample = (int) floatVal;
if (sample > 32767) {
    sample = 32767;
} else if (sample < -32768) {
    sample = -32768;
}

Is there a faster way to do this?

(about ~6% of the total runtime seems to be spent on casting, while 6% seem to be not that much at first glance, its astounding when I consider that the processing part involves a good chunk of matrix multiplications and IDCT)

  • EDIT The cast/clipping code above is (not surprisingly) in the body of a loop that reads float values from a float[] and puts them into a byte[]. I have a test suite that measures total runtime on several test cases (processing about 200MB of raw audio data). The 6% were concluded from the runtime difference when the cast assignment "int sample = (int) floatVal" was replaced by assigning the loop index to sample.

  • EDIT @leopoldkot: I'm aware of the truncation in Java, as stated in the original question (F2I, I2S bytecode sequence). I only tried the cast to short because I assumed that Java had an F2S bytecode, which it unfortunately does not (comming originally from an 68K assembly background, where a simple "fmove.w FP0, D0" would have done exactly what I wanted).

+1  A: 

When you cast int to short you never get clipping functionality, bits are truncated and then are read as short. E.g. (short)-40000 becomes 25536, and not -32768 as you expected.

Probably you have to edit you question, I am sure you know it if you disassembled bytecode. Also, there is a JIT compiler which might optimize this code (because it is called often) to platform dependent instructions.

Please convert this answer to comment.

leopoldkot
A: 

float to int conversions is one of the slowest operations you can do on an x86 processor, as it requires modifying the x87 rounding modes (twice), which serializes and flushes the processor. You can get a sizeable speedup if you can use SSE instructions instead of x87 instructions, but I have no idea if there's a way to do that in java. Perhaps try using an x86_64 JVM?

Chris Dodd
+1  A: 

You could turn two comparisons into one for values which are in range. This could halve the cost. Currently you perform only one comparison if the value is too negative. (which might not be your typical case)

if (sample + 0x7fff8000 < 0x7fff0000)
    sample = sample < 0 ? -32768 : 32767;
Peter Lawrey
Your assumption is correct, in the overwhelming majority of cases sample is inside range (>99.9%). So one less branch helps, although only a bit. I love how your method uses integer overflow to check the upper bound implicitly. I'll definetly remember this trick.
Durandal
A: 

int sample = ((int)floatval) & 0xffff;

EJP
This takes the remainder of floatval/0xFFFF, which is *not* what the OP wants.
Wallacoloo
No it doesn't, it takes the remainder of floatval/0x10000 actually, but it has exactly the same effect as (short), which he was using, so it presumably is exactly what he wants. So kindly remove the downvote. But probably I2S is quicker.
EJP
A: 

This is Python, but should be easy to convert. I don't know how costly the floating point operations are, but if you can keep it in integer registers you might have some boost; this assumes that you can reinterpret the IEEE754 bits as an int. (That's what my poorly named float2hex is doing.)

import struct

def float2hex(v):
    s = struct.pack('f', v)
    h = struct.unpack('I', s)[0]
    return h

def ToInt(f):
    h = float2hex(f)
    s = h >> 31
    exp = h >> 23 & 0xFF
    mantissa = h & 0x7FFFFF
    exp = exp - 126
    if exp >= 16:
        if s:
            v = -32768
        else:
            v = 32767
    elif exp < 0:
        v = 0
    else:
        v = mantissa | (1 << 23)
        exp -= 24
        if exp > 0:
            v = v << exp
        elif exp < 0:
            v = v >> -exp

        if s:
            v = -v

    print v

This branching may kill you, but maybe this provides something useful anyway? This rounds toward zero.

dash-tom-bang
dash-tom-bang
(by cunning operation, you could check `(float_bits if that check passes, you know you're out-of-range and can do a possibly-slower set of checks to determine the sign of the overflow, since if the exponent is greater than this value, regardless of the sign bit, you're out of range.)
dash-tom-bang
The thought of performing the cast in integer by hand crossed my mind, but I didn't know how to do it. Converting the float raw bit to int requires to call a native method in Java (Float.floatToRawIntBits), and in conjunction with the extensive checks needed, is a lot slower than the cast + compare of the accepted answer. BTW: The sign can be tested simply using h < 0 in Java, since int is signed. This eliminates the need for variable s (might not be an option in Phyton)
Durandal