Hi,

I'm trying to write a synthesizable 3D rasterizer in Verilog/SystemVerilog. Right now the rasterizer is not really a 3D rasterizer: it just receives six 32-bit floats for the vertex positions (vertA_pos_x, vertA_pos_y, vertB_pos_x, vertB_pos_y, vertC_pos_x, vertC_pos_y) and nine 8-bit integers for the vertex colors (vertA_color_r, vertA_color_g, vertA_color_b, vertB_color_r, vertB_color_g, vertB_color_b, vertC_color_r, vertC_color_g, vertC_color_b).

Positions range from 0.0f to 1.0f: 0.0f is the top/left edge of the screen, 0.5f the middle and 1.0f the bottom/right edge.

The first raster step would be to count how many raster lines are needed. Given that the framebuffer is 240 pixels high, that vertex A is the top vertex, B the bottom-left one and C the bottom-right one, and that X is the bottommost vertex (either B or C; this has to be computed), the number of raster lines is (vertX_pos_y - vertA_pos_y) * 240, since the positions are normalized to the 0.0f to 1.0f range.

This part of the rasterization process is already complex enough to illustrate my doubts, so I'll stop describing the rest of the algorithm here.

What I want to know is how to implement such "complex" logic in Verilog. I call it "complex" because it is sequential and takes more than one clock cycle, which is not exactly the most pleasant kind of thing to design with a hardware description language.

I am using Altera's Quartus, so I'm mainly interested in Altera solutions.

The floating-point megafunctions that come with Quartus all take more than one clock cycle to produce a result, so to implement even "simple" calculations like (vertX_pos_y - vertA_pos_y) * 240, I assume I need a tedious and error-prone state machine. I'm hoping someone will tell me I don't, but if I do, I would still like to know how people generally design things like this.
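Whatever the hardware ends up looking like, a software reference model of the raster-line count is handy for checking it later in simulation. The sketch below is plain Python, not synthesizable Verilog. It scales the normalized y-span by the 240-line framebuffer height; the function name `raster_line_count` and the round-to-nearest policy are assumptions, not part of the question.

```python
FB_HEIGHT = 240  # framebuffer height in pixels (from the question)

def raster_line_count(vert_a_pos_y: float, vert_b_pos_y: float,
                      vert_c_pos_y: float) -> int:
    """Raster lines spanned by a triangle with normalized y positions.

    A is the top vertex; B and C are the bottom-left and bottom-right ones.
    0.0 is the top edge of the screen and 1.0 the bottom edge.
    """
    # X is whichever of B and C is lower on screen (larger y).
    vert_x_pos_y = max(vert_b_pos_y, vert_c_pos_y)
    # Scale the normalized height by the framebuffer height, then round
    # to a whole number of lines (the rounding policy is an assumption).
    return round((vert_x_pos_y - vert_a_pos_y) * FB_HEIGHT)

print(raster_line_count(0.0, 0.5, 1.0))    # 240
print(raster_line_count(0.25, 0.75, 0.5))  # 120
```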

Also note that I'm very new to Verilog and to hardware design in general, so sorry if I say something stupid. Any ideas?

A:

Have you heard of pipelining? This is how datapaths are often constructed.

To give an example, say you want to compute (a*b) + c, where a multiply takes 3 clock cycles and an add takes 1 clock cycle. Pipelining simply means inserting banks of registers to line up the delays. In this example, input c is delayed by three register stages so that it reaches the adder at the same time as the product. The whole operation therefore has a latency of 3 + 1 = 4 clock cycles, and, as long as the multiplier is itself pipelined, a new set of inputs can still enter on every clock cycle.
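To make the register-bank idea concrete, here is a cycle-level behavioral model in Python (not synthesizable Verilog). The class name `MulAddPipeline` is illustrative, and each call to `clock()` stands for one rising clock edge. The multiplier is modeled as a 3-deep register chain, and c travels through a delay line of the same depth.

```python
from collections import deque

MUL_LATENCY = 3  # clock cycles for x*y, as in the example above

class MulAddPipeline:
    """Cycle-level model of y = (a*b) + c.

    The multiplier is modeled as a MUL_LATENCY-deep register chain (the full
    product is computed on entry and then simply delayed), c goes through a
    delay line of the same depth, and the adder has one output register.
    """

    def __init__(self):
        self.mul_regs = deque([None] * MUL_LATENCY, maxlen=MUL_LATENCY)
        self.c_delay = deque([None] * MUL_LATENCY, maxlen=MUL_LATENCY)
        self.add_out = None  # adder output register (1-cycle add)

    def clock(self, a=None, b=None, c=None):
        """Apply one rising clock edge and return the adder output register."""
        # Operands arriving at the adder this cycle: the oldest entries.
        product, c_delayed = self.mul_regs[0], self.c_delay[0]
        self.add_out = None if product is None else product + c_delayed
        # Shift new operands in; appending to a full deque drops the oldest.
        self.mul_regs.append(None if a is None else a * b)
        self.c_delay.append(c)
        return self.add_out

pipe = MulAddPipeline()
inputs = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
outputs = []
for cycle in range(len(inputs) + 4):
    a, b, c = inputs[cycle] if cycle < len(inputs) else (None, None, None)
    outputs.append(pipe.clock(a, b, c))
print(outputs)  # [None, None, None, 5, 26, 65, None]
```

Each result appears on the fourth clock edge after its inputs, and results come out back to back. Nothing decides when to start the add: the delay line on c already lines the operands up.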

Now, if you need to do lots of calculations, the pipeline delays can be 'legoed' together so that you don't need state machine logic to schedule your math operations. It does mean you'll have to wait a few cycles for your answer (i.e. latency), which is really unavoidable in synchronous designs.
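As a sketch of that idea applied to the raster-line count from the question, the model below chains three fixed-latency stages: pick the bottommost of B and C, subtract, multiply by 240. It is again a Python cycle-level model rather than Verilog. The latencies are made-up placeholders (the real values depend on how the Quartus floating-point megafunctions are configured), and `Stage` and `clock_all` are hypothetical names. A valid bit travels alongside each value, so the output is self-timed without a state machine. yA simply rides through the compare stage, which acts as the delay register that keeps it aligned with yX.

```python
from collections import deque

# Hypothetical latencies; real values depend on the megafunction settings.
CMP_LATENCY = 1   # select the bottommost of B and C
SUB_LATENCY = 7   # yX - yA
MUL_LATENCY = 5   # * 240
FB_HEIGHT = 240
TOTAL_LATENCY = CMP_LATENCY + SUB_LATENCY + MUL_LATENCY

class Stage:
    """Fixed-latency operator: a result emerges `latency` clock edges later.

    Each register holds (valid, value); the valid bit moves with the data.
    """

    def __init__(self, latency, func):
        self.func = func
        self.regs = deque([(False, None)] * latency, maxlen=latency)

    def clock(self, valid, value):
        """One clock edge: returns what sat at the output before the edge."""
        out = self.regs[0]
        self.regs.append((valid, self.func(value) if valid else None))
        return out

# Bricks snapped together; yA is carried through the compare stage.
cmp_stage = Stage(CMP_LATENCY, lambda v: (v[0], max(v[1], v[2])))
sub_stage = Stage(SUB_LATENCY, lambda v: v[1] - v[0])
mul_stage = Stage(MUL_LATENCY, lambda v: v * FB_HEIGHT)

def clock_all(valid, verts):
    """Clock every stage once; each stage feeds the next one's input."""
    v1, d1 = cmp_stage.clock(valid, verts)
    v2, d2 = sub_stage.clock(v1, d1)
    return mul_stage.clock(v2, d2)

# (yA, yB, yC) for three triangles, one entering per clock cycle.
triangles = [(0.0, 0.5, 1.0), (0.25, 0.75, 0.5), (0.5, 0.75, 0.625)]
results = []
for cycle in range(len(triangles) + TOTAL_LATENCY):
    if cycle < len(triangles):
        valid, out = clock_all(True, triangles[cycle])
    else:
        valid, out = clock_all(False, None)
    if valid:
        results.append((cycle, out))
print(results)  # [(13, 240.0), (14, 120.0), (15, 60.0)]
```

A new triangle can enter on every clock, and each result appears TOTAL_LATENCY cycles after its inputs. In hardware, each `Stage` would roughly correspond to a megafunction instance (or a plain shift register for pure delays) plus a 1-bit shift register of the same length for the valid flag.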

Marty
Indeed, I only heard of pipelining very recently (yesterday?). I'm very interested in what you mean by "legoing" pipeline delays together to avoid state machine logic; I really dislike having to write state machines just for that.
n2liquid
By legoing, I just mean that I like to think of each type of math operation in the pipeline as a different colour of brick (e.g. red = mult, white = delay, blue = add), and of each bump on a Lego brick as one clock cycle of latency. Then, as I build the datapath out of the bricks, I look at the gaps and consider whether I can snap the datapath together in a different way, so that more parts of the calculation run in parallel instead of waiting in delay registers.
Marty
I understand your way of thinking, and I thought of a similar analogy when I first heard of pipelining. But I have never seen an example of it in action. Could you link me to some, or edit your answer with one of your own? I'd be most grateful, and your answer would be more complete.
n2liquid
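The "legoing" described in these comments amounts to scheduling a dataflow graph: each operator starts once its last operand arrives, and every earlier operand needs delay registers to fill the gap. A small Python sketch of that bookkeeping, with hypothetical names (`schedule`, the dict-based graph format) and the latencies from the answer's example (multiply 3 cycles, add 1 cycle), shows how regrouping the same sum changes both the latency and the number of delay stages.

```python
LATENCY = {"mul": 3, "add": 1}  # clock cycles per operator, from the example

def schedule(graph, outputs):
    """ASAP-schedule a graph of fixed-latency operators.

    graph maps a node name to (op, [input names]); names that are not keys
    are primary inputs, available at cycle 0. Returns (ready, pads):
    ready[n] is the cycle at which node n's result is available, and
    pads[(src, dst)] is how many delay stages the edge src -> dst needs.
    """
    ready, pads = {}, {}

    def visit(name):
        if name not in graph:  # primary input
            return 0
        if name not in ready:
            op, inputs = graph[name]
            arrivals = {src: visit(src) for src in inputs}
            start = max(arrivals.values())  # wait for the latest operand
            for src, t in arrivals.items():
                pads[(src, name)] = start - t  # "white bricks" on this edge
            ready[name] = start + LATENCY[op]
        return ready[name]

    for out in outputs:
        visit(out)
    return ready, pads

# a + b + c + d built as a chain versus as a balanced tree, plus (a*b) + c.
chain = {"s1": ("add", ["a", "b"]), "s2": ("add", ["s1", "c"]),
         "y": ("add", ["s2", "d"])}
tree = {"s1": ("add", ["a", "b"]), "s2": ("add", ["c", "d"]),
        "y": ("add", ["s1", "s2"])}
mac = {"p": ("mul", ["a", "b"]), "y": ("add", ["p", "c"])}

for label, graph in [("chain", chain), ("tree", tree), ("a*b+c", mac)]:
    ready, pads = schedule(graph, ["y"])
    print(label, "latency:", ready["y"], "delay stages:", sum(pads.values()))
# chain latency: 3 delay stages: 3
# tree latency: 2 delay stages: 0
# a*b+c latency: 4 delay stages: 3
```

The tree performs the same three additions with lower latency and no padding because two of them run in parallel, which is the "snap the datapath together differently" trade-off. Each delay stage counted here is, in hardware, a register as wide as the signal it delays.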