ansaurus

Question

ARM NEON: What's the difference between vld4_f32 and vld4q_f32?

Answer 1

+2 A:

Yes, I found out the difference. I used CodeSourcery to see the actual register contents for all the load instructions. The link I have posted doesn't give the complete details on the vld4q_f32.

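First, vld4_f32. It loads 4 D registers (e.g. d16-d19). Each D register is 64 bits wide, so it holds two float32 values, and the instruction therefore reads the first 8 values from memory and de-interleaves them with a stride of 4: the first register receives elements 0 and 4, the second elements 1 and 5, and so on, as shown in the figure below.

[Figure: register contents after vld4_f32]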
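To make the layout concrete, here is a minimal scalar sketch of where each of the 8 floats ends up. It is only a model of the lane layout, not the intrinsic itself: `model_vld4_f32` and the `regs` array are hypothetical names made up for illustration, with `regs[r]` standing in for `val[r]` of the `float32x2x4_t` that vld4_f32 returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of arm_neon.h: a scalar model of the
 * lane layout produced by vld4_f32. regs[r][lane] stands for lane
 * `lane` of the r-th D register (val[r] of the float32x2x4_t). */
void model_vld4_f32(const float *src, float regs[4][2])
{
    assert(src != NULL);
    for (size_t lane = 0; lane < 2; ++lane) {
        for (size_t r = 0; r < 4; ++r) {
            /* Memory element i lands in register i % 4, lane i / 4. */
            regs[r][lane] = src[4 * lane + r];
        }
    }
}
```

Feeding it the values 0..7 gives regs[0] = {0, 4}, regs[1] = {1, 5}, regs[2] = {2, 6} and regs[3] = {3, 7}.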

In the second case, vld4q_f32 loads 8 D registers (e.g. d16-d23) instead of four, that is, four Q registers, each of which overlays a pair of D registers. From the linked documentation it is not at all clear that 8 registers will be loaded, but when I looked at the disassembled code for a vld4q_f32, it was using 8 D registers.

This instruction does what I was hoping for: each Q register ends up holding 4 float32_t values taken from memory at a stride of 4, as shown in the figure below.

[Figure: register contents after vld4q_f32]
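As a rough sketch of how this might be used, the function below de-interleaves 16 consecutive floats into four rows of four with vld4q_f32. The name `deinterleave4x4` and the `rows` output array are assumptions for illustration only. The scalar branch is a fallback with the same layout so the snippet also builds on targets without NEON.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: rows[r] receives src[r], src[r + 4],
 * src[r + 8] and src[r + 12]. src must point to at least 16 floats. */
void deinterleave4x4(const float *src, float rows[4][4])
{
    assert(src != NULL);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One structure load fills four Q registers (eight D registers). */
    float32x4x4_t v = vld4q_f32(src);
    vst1q_f32(rows[0], v.val[0]);
    vst1q_f32(rows[1], v.val[1]);
    vst1q_f32(rows[2], v.val[2]);
    vst1q_f32(rows[3], v.val[3]);
#else
    /* Scalar fallback that mirrors the same lane layout. */
    for (size_t r = 0; r < 4; ++r) {
        for (size_t lane = 0; lane < 4; ++lane) {
            rows[r][lane] = src[4 * lane + r];
        }
    }
#endif
}
```

With the input 0..15, rows[0] comes out as {0, 4, 8, 12} and rows[3] as {3, 7, 11, 15}, which is the "4 values at an interval of 4" behaviour described above.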

vikramtheone 2010-09-29 12:13:10
