ansaurus

Question

iPhone ARMv6 VFP asm latency, throughput and hazards

Answer 1

+2 A:

Your questions are all answered in the document that you linked. You should read it carefully.

Are those numbers independent of vector size?

No. See, for example, Table 21-15 in the document you linked. Note the latency of the short vector FADDS.

Does it mean that I can start a new FMULS operation every cycle if it doesn't depend on an earlier result that isn't available yet?

Yes, that's the definition of throughput.

What happens if I have two FMULS instructions in a row, where one argument of the second depends on the result of the first?

Execution will stall until the result of the first FMULS is available. See 21.6 "Operation of the scoreboards" for more detail.
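To make the difference between throughput and latency concrete, here is a minimal sketch of a toy in-order issue model in Python. It is not a model of the actual VFP scoreboard: it only tracks read-after-write dependencies, the `issue_cycles` helper is hypothetical, and the latency of 8 cycles is an assumed placeholder that you should replace with the value from the timing tables in the document.

```python
# Toy in-order issue model: at most one instruction issues per cycle, and an
# instruction waits until every source register it reads has been written.
# RESULT_LATENCY is an assumed placeholder, not a figure taken from the manual.
RESULT_LATENCY = 8  # cycles from issue until the result can be consumed


def issue_cycles(program, latency=RESULT_LATENCY):
    """Return the issue cycle of each (dest, src1, src2) instruction."""
    ready_at = {}   # register name -> cycle at which its pending result is ready
    next_slot = 0   # earliest cycle at which the next instruction may issue
    issued = []
    for dest, *sources in program:
        # Stall until the issue slot is free and all sources are available.
        cycle = max([next_slot] + [ready_at.get(reg, 0) for reg in sources])
        issued.append(cycle)
        ready_at[dest] = cycle + latency
        next_slot = cycle + 1
    return issued


# Three multiplies with no dependencies between them.
independent = [("s0", "s8", "s9"), ("s1", "s10", "s11"), ("s2", "s12", "s13")]
# Each multiply reads the result of the previous one.
dependent = [("s0", "s8", "s9"), ("s1", "s0", "s10"), ("s2", "s1", "s11")]

print(issue_cycles(independent))  # [0, 1, 2]
print(issue_cycles(dependent))    # [0, 8, 16]
```

In this simplified model the independent multiplies issue back to back, while each dependent multiply waits out the full latency of its predecessor. Interleaving independent work between dependent instructions is how you hide that latency.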

What if we are in vector mode with 4 elements, and on the second FMULS instruction all input registers but one are available? What will happen?

It will stall. Same reference.

Sqrt and division: will a sqrt or division operation prevent any subsequent operation from being started for 19 cycles?

No. See section 21.10 "Parallel Execution". An example is given in Table 21-15, in which a non-dependent FADDS executes immediately following FDIVS.

Note that it can be a bit of a challenge (though not impossible) to write short-vector VFP code that performs substantially faster than scalar code for many types of computation. Even if you learn how to do it, it will be of questionable value since the NEON unit seems to be the new model for vector computation on ARM. You may be better served in the long run by ignoring the short-vector operation for now and focusing on learning NEON for the future.

Stephen Canon 2010-01-20 15:25:47

Thanks a lot for that info! Since I code for the iPhone and want to get some code running fast on the iPhone 3G, I need to use the VFP, because the 3G doesn't have NEON. Yes, I've read this example, but I don't really understand why it is possible in this case. Page 804 suggests avoiding DIV and SQRT because they stall both the DS and the FMACS pipeline. What exactly does "If the short vector DS operation can be separated..." (on that page) mean?

genesys 2010-01-20 17:19:25
