ansaurus

Question

Technique for determining if an integer sequence can be generated without branches?

Answer 1

A:

If speed is really of the essence, couldn't you write out the instructions for lists upto a certain length? (One could pre-generate this code of course).

so:

 void algorithm1_Length6(int *srcList, int *destList)
 {
      *destList++ = *srcList;
      *destList++ = srcList[2];
      *destList++ = srcList[4];
      *destList++ = srcList[5];
      *destList++ = srcList[3];
      *destList++ = srcList[1];
 }

and all the other variations upto a certain length.

Toad 2009-10-30 18:23:10

"Technically, I could write a big switch/case function but obviously I'm interested in a function that will follow the pattern for arbitrary sizes."

Joseph Garvin 2009-10-31 07:48:45

Answer 2

A:

For a given array you can use method like this:

 void tranform(int[] src, int[] dest) {
        //0, 2, 4, 6, 8, 9, 7, 5, 3, 1
        dest[0] = src[0];
        dest[1] = src[2];
        dest[2] = src[4];
        dest[3] = src[6];
        dest[4] = src[8];
        dest[5] = src[9];
        dest[6] = src[7];
        dest[7] = src[5];
        dest[8] = src[3];
        dest[9] = src[1];
    }

But in general for big arrays it's hard to write such methods, therefore it will be useful if you write generator method like this:

static void createFunction(int[] src, int[] dest) {
        System.out.println("void tranform(int[] src, int[] dest) {");
        for (int i = 0; i < dest.length; i++) {
            for (int j = 0; j < src.length; j++) {
                if (dest[i] == src[j]) {
                    System.out.println("dest[" + i + "]=src[" + j + "];");
                    break;
                }
            }
        }
        System.out.println("}");
    }

invoke it with your arrays: createFunction(new int[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, new int[]{0, 2, 4, 6, 8, 9, 7, 5, 3, 1});

And paste output of this method to your program.

rachvela 2009-10-30 18:46:15

I specifically mentioned I was looking for functions that could handle input of varying size. Making a function that can handle varying size that generates functions that can't is a creative workaround but not what I'm looking for. Even ignoring it's hackishness, if optimization is a goal this would definitely not be a good solution.

Joseph Garvin 2009-10-31 07:48:00

Answer 3

+1 A:

If you're optimizing for the PS3 in particular, the Power PC Compiler Writers Guide has techniques on branchfree code in Section 3.1.5 and it has GNU Superoptimizer sequences for branchfree code in Appendix D.

You may be interested in Mike Acton's Cell Performance Blog as well.

Adisak 2009-10-30 19:03:14

FWIW, there should be GNU Superoptimizer sequences for X86 as well for many common cases if you look for them.

Adisak 2009-10-31 15:16:31

Answer 4

+1 A:

If you plot your desired indexes against your input indexes, you get a triangular shaped function. It turns out that for your n=10 case, it is

9.5 - abs(2 (x - 4.75))

Therefore, for general n, it would be

n-0.5 - abs(2*(x - n/2-0.25))

Or in integer form,

(2*n-1 - abs(4*x - 2*n + 1)) / 2

This is completely branchless in that your output indexes are generated with a single mathematecal function. I think the general approach would be to plot out the desired indexes and look for a pattern and a way of representing it with math functions.

Obviously if your desired final indexes form a straight line, then the transformation is simple. If you have a kink in the mapping, then you want to use the absolute value function to introduce a bend, and you can tune the scaling to change the angle of the bend. You can tilt the kink by biasing it (e.g. abs(x)+x/2). If you need a jump discontinuity in your final index function, then use the sign function (hopefully builtin or use abs(x)/x). You need to be creative with how to use the graphs of common functions to your advantage here.

Addendum

If your indexing function is piecewise linear, there is a straightforward algorithm. Assume the desired index function is expressed as a list of segments

{(sx1,sy1)-(ex1,ey1), (sx2,sy2)-(ex2,ey2), ... , (sxN,syN)-(exN,eyN)}
 segment 1            segment 2                   segment N

where exK > sxK for all K and sxK > sx(K-1) for all K (put them from left to right).

k = 1
f(x) = Make affine model of segment k
g(x) = f(x)
Do:
   k = k + 1
   h(x) = Makeaffine model of segment k
   If g(x) and h(x) intersect between ex(k-1) and ex(k)
       f(x) = f(x) + [slope difference of g(x) and h(x)] * ramp(x)
   Else
       f(x) = f(x) + (h(ex(k-1)) - f(ex(k-1))) * step(x)
       f(x) = f(x) + [slope difference of g(x) and h(x)] * ramp(x)

where ramp(x) = (abs(x)+x)/2 and step(x) = (sign(x)+1)/2. f(x) is meant to represent the desired function, g(x) is the last segment's affine model, and h(x) is the current segment's affine model. An affine model is just a line in slope-offset form: a*x+b, and the slope difference is the difference of slopes. This algorithm simply proceeds from left right, adding in the proper pieces of functions as it goes along. The functions it adds on are always zero for x <= 0 so they don't affect the f(x) that has been built up so far.

Of course, there may be some bugs/typos in the above. I really have to get to a meeting, so I can't write any more.

Victor Liu 2009-10-30 19:04:38

Can you calculate abs() without a branch?

Drew Hall 2009-10-30 19:09:45

@Drew Hall: That is almost certainly done in hardware.

Amuck 2009-10-30 19:14:01

Abs is not always in hardware for integers but `fabs` is a common FPU operation. For int's you can do branchless `int Abs(int A) { int Sign = A >> 31; return ( A ^ Sign ) - Sign; }`

Adisak 2009-10-30 19:18:14

FWIW, converting from INT to FLOAT and back is going to be very slow on PS3 due to LHS (Load Hit Store) Penalty. This would be much slower than a branching version on the target platform for the Question.

Adisak 2009-10-30 19:20:28

The piecewise linear addendum is definitely not going to solve the branchfree requirements of the Question if you do it in a loop. To do that branchfree, you have to compute all values simultaneously and then do a branchfree select. You should look to collapse common expressions of the computation values as well when doing branchfree optimization.

Adisak 2009-10-30 19:45:22

The point of the algorithm is to generate code, not to be used literally AS the code.

Victor Liu 2009-10-30 20:59:01

Answer 5

+3 A:

Transform: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 to: 0, 2, 4, 6, 8, 9, 7, 5, 3, 1 (ascending even followed by descending odd).

Simple: given the sequence of N values of X from 0 to N-1, we see the first half of the sequence is 2X. The second half of the sequence is (2N-1)-2X. The sequence splits at X=(N+1)/2 with "integer" math. In the above example, N == 10.

So assuming 32-bit signed ints with arithmetic shift right:

int Transform(int x)
{
    const int seq1=x+x;
    const int mask=(x-((N+1)>>1))>>31;
    const int seq2=(N+N-1)-seq1;
    return (mask&seq1)|((~mask)&seq2);
}

Note that the mask pattern used here is fast because PowerPC has an ANDC (and with complement) that makes (~mask) a free operation.

Adisak 2009-10-30 19:12:43

Could you explain _how_ you came up with that? I'm more looking for insight into general ways of handling these problems then just getting handed an off the shelf solution for one instance, though I appreciate it just the same :)

Joseph Garvin 2009-10-31 07:50:25

I noted there are two sequences... odd and even. I derived the integer formula for each and then figured a mask condition. Any time you need to pick between two conditions and there is a nicely defined boundary, it's fairly easy to make a branchless representation.

Adisak 2009-10-31 08:04:20

It's when you start having more than two conditions to chose from that determining the boundary conditions and the mask variables becomes difficult.

Adisak 2009-10-31 08:05:03

Answer 6

+1 A:

You can always write a polynomial formula using Lagrange interpolation for instance. Not pretty (or particularly fast) but it won't have any branches.

lhf 2009-10-31 10:38:20

Answer 7

A:

Technically any series of operations can be executed without "branching" using a state machine that utilizes boolean operations. The concept of branching is because most programs are a series of instructions executed by a program counter that can go one way or the other.

Even if you're talking about a pure functional approach which is stateless, for a finite set of discrete values, you can always (at the cost of large amounts of memory) use a lookup table.

Jason S 2009-11-01 02:14:11

ansaurus

tags:

views:

answers:

Technique for determining if an integer sequence can be generated without branches?

related questions