Consider the following problem. You have a bit-string that represents the current scheduled slave in one-hot encoding. For example, "00000100" (with the leftmost bit being #7 and rightmost #0) means that slave #2 is scheduled.

Now, I want to pick the next scheduled slave in a round-robin scheduling scheme, with a twist. I have a "request mask" which says which slaves actually want to be scheduled. The next slave will be picked only from those that want to.

Some examples (assume round-robin scheduling is done by rotating left). Example1:

  • Current: "00000100"
  • Mask: "01100000"
  • Next schedule: "00100000" - in normal round-robin, #3 and then #4 should come after #2, but they don't request, so #5 is picked.

Example2:

  • Current: "01000000"
  • Mask: "00001010"
  • Next: "00000010" - because scheduling is done by cycling left, and #1 is the first requesting slave in that order.


Now, this can be easily coded in a loop, I know. But I actually want to get my result by a bit-twiddling operation, without loops. The motivation: I want to implement this in hardware (in an FPGA) in VHDL/Verilog.
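
For reference, the loop version mentioned above might look like the sketch below. It is only a software model to check other solutions against; the function name, the fixed width of 8 and the one-hot assertion are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define N_SLAVES 8u /* assumed width, matching the examples */

/* Hypothetical reference model: step left from `current`, wrapping around,
   and return the first requesting slave as a one-hot word (0 if none). */
uint32_t next_slave_loop(uint32_t current, uint32_t mask)
{
    const uint32_t all = (1u << N_SLAVES) - 1u;
    assert(current != 0 && (current & (current - 1u)) == 0); /* one-hot */

    uint32_t candidate = current;
    for (unsigned step = 0; step < N_SLAVES; ++step) {
        /* rotate left by one position within N_SLAVES bits */
        candidate = ((candidate << 1) | (candidate >> (N_SLAVES - 1u))) & all;
        if (candidate & mask)
            return candidate;
    }
    return 0; /* nobody is requesting */
}
```

With this model, Example1 is `next_slave_loop(0x04, 0x60)` and Example2 is `next_slave_loop(0x40, 0x0A)`. The current slave is checked last, so it is picked again only when nobody else requests.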

A bonus is to make up an algorithm that's generic for any number of slaves N.

By the way, this is not a homework question. It's an important problem whenever one wants to schedule slaves in some manner, and condition the scheduling by the slaves' requests. My current solution is somewhat "heavy" and I wanted to know if I'm missing something obvious.

+1  A: 

Assuming two's complement representation, call your two words mask and current. In C:

mask_lo = (current << 1) - 1; // the bits to the right and including current
mask_hi = ~mask_lo;           // the bits to the left of current
                              // the left bits, otherwise right:
next = (mask & mask_hi) ? (mask & mask_hi) : (mask & mask_lo);
return (next & -next);        // the least significant bit set
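
To make the snippet above easy to try, here is a minimal sketch of it wrapped in a complete function. The function name, the `uint32_t` type (so at most 31 slaves, to keep `current << 1` in range) and the one-hot assertion are assumptions added for illustration, not part of the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper around the snippet above. */
uint32_t next_slave_sub(uint32_t current, uint32_t mask)
{
    assert(current != 0 && (current & (current - 1u)) == 0); /* one-hot */

    uint32_t mask_lo = (current << 1) - 1u; /* current and everything to its right */
    uint32_t mask_hi = ~mask_lo;            /* everything to the left of current */
    uint32_t high = mask & mask_hi;

    /* prefer requesters to the left, otherwise wrap around to the right */
    uint32_t next = high ? high : (mask & mask_lo);

    return next & (0u - next);              /* isolate the least significant set bit */
}
```

Unsigned arithmetic is used so that the negation wraps in a well-defined way. If only the current slave is requesting, this sketch returns the current slave again, and it returns 0 when the mask is empty.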
Doug Currie
Downvoted because the algorithm does not work. It will return the lowest bit set in mask that is not also set in current. E.g. for mask = 01100001, current = 00000100, it will return 00000001.
e.James
eJames, are you sure about that? I think it will return 00100000.
Jules
e.James
As written, next uses || which evaluates to 0 or 1 in C, so the result is either 0 or 1. use next = (mask
Pete Kirkham
Yes, that would do it. Also, the second line needs to be << 1 instead of << 2
e.James
eJames - you have a point here. If || is used in its C short-circuiting manner, Doug is right, but if it's used in its "hardware" interpretation, you've provided a counterexample.
Eli Bendersky
In all fairness, the idea for the algorithm is sound. I'll edit in the changes and replace my downvote with an upvote.
e.James
though, I guess in hardware one can quite simply implement the short-circuiting "or" as well
Eli Bendersky
I've made the changes and reversed my vote. I tested the new version with gcc3.4.5 and it works like a charm.
e.James
I've also come up with a logic/hardware solution: http://stackoverflow.com/questions/480405/finding-the-next-in-round-robin-scheduling-by-bit-twiddling#486480
e.James
+1  A: 

Subtracting 1 is the essential idea here. It's used to cascade borrows through the bits to find the next task.

bits_before_current = ~(current-1) & ~current
bits_after_current = current-1
todo = (mask & bits_before_current) 
if todo==0: todo = (mask & bits_after_current) // second part is if we have to wrap around
next = last_bit_of_todo = todo & -todo

This will use a loop internally though...

Jules
+1  A: 

Interesting problem! I can't help but wonder whether you could simplify your scheduler so that this sort of operation wouldn't be necessary.

Given that you know VHDL, I won't go into detail, but my suggestion would be the following:

Use a 3 bit encoder to turn the currently scheduled task into a number:

01000000 --> 6

Then use a barrel shifter to rotate the mask by that number + 1 (to skip the current task):

00001010 --> 00010100

Then use a priority encoder to find the first available "next" task:

00010100 --> 00000100 --> 2

Then reverse the barrel shift by addition:

(2+7) % 8 = 1

Which when re-encoded will give the next scheduled task:

00000010

This should be very fast and straightforward, although the barrel shifter is 'expensive' in terms of real estate. I don't see an easy way to get around that at the moment.
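
The steps above can be modelled in C roughly as follows. This is a software sketch only: the helper `lowest_set_index` stands in for both the encoder and the priority encoder, and all names and the width of 8 are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define N_SLAVES 8u /* assumed width, matching the example */

/* Index of the lowest set bit of a non-zero word (models the encoders). */
static unsigned lowest_set_index(uint32_t x)
{
    unsigned i = 0;
    while (((x >> i) & 1u) == 0)
        ++i;
    return i;
}

uint32_t next_slave_barrel(uint32_t current, uint32_t mask)
{
    const uint32_t all = (1u << N_SLAVES) - 1u;
    assert(current != 0 && (current & (current - 1u)) == 0); /* one-hot */

    mask &= all;
    if (mask == 0)
        return 0; /* nobody is requesting */

    unsigned cur = lowest_set_index(current);  /* 01000000 -> 6 */
    unsigned shift = (cur + 1u) % N_SLAVES;    /* +1 skips the current task */

    /* Barrel shifter: rotate right so that task cur+1 lands on bit 0.
       For shift == 7 this equals a rotate left by 1: 00001010 -> 00010100. */
    uint32_t rotated = mask;
    if (shift != 0)
        rotated = ((mask >> shift) | (mask << (N_SLAVES - shift))) & all;

    unsigned first = lowest_set_index(rotated);  /* priority encoder: -> 2 */
    unsigned next = (first + shift) % N_SLAVES;  /* undo the rotation: (2+7) % 8 = 1 */

    return 1u << next;                           /* re-encode as one-hot */
}
```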

Edit: Doug's solution is significantly more elegant...

Adam Davis
this is very close to what I had in mind. But apart from the barrel shifter, the priority encoder is also quite expensive, isn't it?
Eli Bendersky
A priority encoder isn't cheap, but it's smaller than a barrel shifter. Still, the solutions presented in other answers should yield a smaller overall solution, although the ideal VHDL compiler will do the Karnaugh map analysis and minimize every solution to the same small equation...
Adam Davis
+4  A: 

A loop does not have to be bad.

I would simply do

current[i] = current[i-1] & mask[i] |                         // normal shift logic
                mask[i] & current[i-2] & !mask[i-1] |         // here build logic 
                ...                                          // expression for 
                                                             // remaining

And then put it into a generate loop (ie it will get unrolled into hardware), which will produce parallel hardware for the expressions.
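
One possible reading of the expression above, written as a C model: the outer loop plays the role of the generate loop (one iteration per output bit i), and the inner loop builds the OR terms. The answer elides the remaining terms, so the general term used here (slave i-d holds the grant and every slave strictly between i-d and i is idle) is an assumption that simply continues the pattern of the first two terms. Names and the width of 8 are also assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define N_SLAVES 8u /* assumed width */

uint32_t next_slave_unrolled(uint32_t current, uint32_t mask)
{
    assert(current != 0 && (current & (current - 1u)) == 0); /* one-hot */

    uint32_t next = 0;
    for (unsigned i = 0; i < N_SLAVES; ++i) {      /* one output bit per iteration */
        unsigned mask_i = (mask >> i) & 1u;
        unsigned grant = 0;
        unsigned gap_idle = 1;  /* AND of !mask[i-1] ... !mask[i-d+1] */

        for (unsigned d = 1; d <= N_SLAVES; ++d) {
            unsigned src = (i + N_SLAVES - d) % N_SLAVES;  /* index i-d, wrapping */
            unsigned cur_src = (current >> src) & 1u;

            grant |= cur_src & gap_idle;               /* term: current[i-d] & idle gap */
            gap_idle &= ((mask >> src) & 1u) ^ 1u;     /* extend the gap past slave src */
        }
        next |= (uint32_t)(mask_i & grant) << i;       /* next[i] = mask[i] & (OR of terms) */
    }
    return next;
}
```

In hardware both loops would be unrolled, so every output bit gets its own AND/OR expression and no subtraction is involved.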

The other solutions mentioned here use multiple subtractions ("-"). I can only discourage that, as it gets you a really expensive operation. In one-hot encoding especially, you can easily end up with more than 32 bits, which is not easily implementable in hardware because the borrow has to ripple through all the bits (the dedicated carry logic on certain FPGAs makes it feasible for a small number of bits).

flolo
Agreed, my bit twiddling solution is much less efficient in hardware.
Jules
Can you elaborate a bit more on the algorithm? It's not obvious. What is "i"? Is the generate loop over "i", i.e. one iteration per bit of "current"?
Eli Bendersky
+2  A: 
e.James
great solution. but I still see a combinatorial loop here
Eli Bendersky
although it does happen only when current is all 0s (otherwise it's blocked by some AND), i'm not sure how the synthesizer will take it.
Eli Bendersky
Yeah, there still needs to be a check for current == 0. Pax suggests a good solution in http://stackoverflow.com/questions/486473/how-would-you-handle-a-special-case-in-this-digital-logic-system
e.James
@eliben: In case you missed it, Marty posted Verilog code for this logic in http://stackoverflow.com/questions/486471/how-would-you-implement-this-digital-logic-in-verilog-or-vhdl/488863#488863
e.James
A: 

This should do what you want:

number_of_tasks= <number of tasks, in the example this is 8>
next_mask= current | (current - 1);
next_barrel= next | (next << number_of_tasks);
next_barrel&= ~number_of_tasks;
next_barrel&= -next_barrel;
next_barrel|= next_barrel >> number_of_tasks;
next_task_mask= next_barrel & -next_barrel;

Basically, duplicate the bits of the next task mask, mask off the bits we don't want to consider, find the lowest set bit, fold the high bits back in, then take the lowest bit set. This runs in constant time.
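
The listing above uses `next` without defining it and clears `~number_of_tasks`, so it cannot be run as written. The C sketch below follows the prose description instead, assuming that `next` is meant to be the request mask and that the bits to clear are the ones at or below current. Both readings, the function name and the width of 8 are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define N_SLAVES 8u /* assumed width; 2 * N_SLAVES must fit in 32 bits */

uint32_t next_slave_doubled(uint32_t current, uint32_t mask)
{
    const uint32_t all = (1u << N_SLAVES) - 1u;
    assert(current != 0 && (current & (current - 1u)) == 0); /* one-hot */

    mask &= all;
    uint32_t at_or_below = current | (current - 1u); /* current and all lower bits */
    uint32_t barrel = mask | (mask << N_SLAVES);     /* duplicate the request bits */

    barrel &= ~at_or_below;          /* mask off the bits we don't want to consider */
    barrel &= 0u - barrel;           /* find the lowest set bit */
    barrel |= barrel >> N_SLAVES;    /* fold the high copy back in */

    return barrel & (0u - barrel);   /* take the lowest bit set */
}
```

Under these assumptions the case from the edit note, current == 00010000 with requests 00111000, gives 00100000.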

Edit: Updating to take into account current == 00010000 and next_mask == 00111000

MSN
+1  A: 

I've found the following Verilog code for implementing the task in the Altera Advanced Synthesis Cookbook.

// 'base' is a one hot signal indicating the first request
// that should be considered for a grant.  Followed by higher
// indexed requests, then wrapping around.
//

module arbiter (
    req, grant, base
);

parameter WIDTH = 16;

input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;
input [WIDTH-1:0] base;

wire [2*WIDTH-1:0] double_req = {req,req};
wire [2*WIDTH-1:0] double_grant = double_req & ~(double_req-base);
assign grant = double_grant[WIDTH-1:0] | double_grant[2*WIDTH-1:WIDTH];

endmodule

It uses subtraction (only once, though), so conceptually it's quite similar to Doug's solution.
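
To check the arbiter against the examples in the question, here is a small C model of the same expression. Since `base` marks the first request to consider, the sketch assumes the question's `current` maps to `base` by rotating it left by one. That mapping, the function names and the width of 8 (the module defaults to 16) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH 8u /* assumed here so the 8-bit examples apply; 2 * WIDTH must fit in 32 bits */

/* Software model of: double_req & ~(double_req - base), folded back to WIDTH bits. */
uint32_t arbiter_grant(uint32_t req, uint32_t base)
{
    const uint32_t all = (1u << WIDTH) - 1u;
    assert(base != 0 && (base & (base - 1u)) == 0); /* one-hot */

    req &= all;
    uint32_t double_req = (req << WIDTH) | req;                 /* {req, req} */
    uint32_t double_grant = double_req & ~(double_req - base);  /* first request at or above base */

    return (double_grant | (double_grant >> WIDTH)) & all;
}

/* Hypothetical adapter: start searching at the slave after `current`. */
uint32_t next_slave_arbiter(uint32_t current, uint32_t mask)
{
    const uint32_t all = (1u << WIDTH) - 1u;
    uint32_t base = ((current << 1) | (current >> (WIDTH - 1u))) & all; /* rotate left by 1 */
    return arbiter_grant(mask, base);
}
```

The subtraction borrows upward from `base` until it hits the first set request bit, and the doubled request word takes care of the wrap-around.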

Eli Bendersky