I am working on a piece of code where I need to deal with uvs (2D texture coordinates) that are not necessarily in the 0 to 1 range. As an example, sometimes I will get a uv with a u component that is 1.2. In order to handle this I am implementing a wrapping which causes tiling by doing the following:
u -= floor(u)
v -= floor(v)
Doing this causes 1.2 to become 0.2 which is the desired result. It also handles negative cases, such as -0.4 becoming 0.6.
However, these calls to floor are rather slow. I have profiled my application using Intel VTune and I am spending a huge amount of cycles just doing this floor operation.
Having done some background reading on the issue, I have come up with the following function which is a bit faster but still leaves a lot to be desired (I am still incurring type conversion penalties, etc).
int inline fasterfloor( const float x ) { return x > 0 ? (int) x : (int) x - 1; }
I have seen a few tricks that are accomplished with inline assembly but nothing that seems to work exactly correct or have any significant speed improvement.
Does anyone know any tricks for handling this kind of scenario?