ansaurus

Question

ARM NEON: How to load 8bit uint8_t as uint32_t?

Answer 1

+1 A:

Depends on your compiler and (possible lack of) extensions.

E.g., for GCC, this might be a starting point: http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html

domen 2010-09-09 11:01:41

Answer 2

+4 A:

I would recommend that you spend a bit of time understanding how SIMD works on ARM. Look at:

Take a look at:

to get you started. You can then implement your SIMD code using inline assembler or the corresponding ARM intrinsics, as recommended by domen.

doron 2010-09-09 22:38:11

Answer 3

A:

If you need to sum up to 480 8-bit values, then you technically need 17 bits of intermediate storage (480 x 255 = 122,400, which exceeds the 16-bit maximum of 65,535). However, if you perform the additions in two stages, i.e., the top 240 rows and then the bottom 240 rows, each stage fits in 16 bits (240 x 255 = 61,200). You can then add the results from the two halves to get the final answer.
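The bit-width arithmetic above can be checked with a few lines of C. This is only an illustrative sketch; the helper name bits_needed is made up for this example and is not part of any library.

```c
#include <stdint.h>

/* Hypothetical helper: number of bits needed to represent v
   (returns 0 for v == 0). */
static unsigned bits_needed(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        ++n;
        v >>= 1;
    }
    return n;
}
```

Calling bits_needed(480u * 255u) gives the width needed for the worst-case sum over all 480 rows, and bits_needed(240u * 255u) gives the width needed for one half.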

There is a NEON instruction well suited to your algorithm: vaddw. It adds a doubleword (64-bit D register) vector to a quadword (128-bit Q register) vector, where the elements of the latter are twice as wide as those of the former. In your case, vaddw.u8 can be used to add 8 pixels to 8 16-bit accumulators. Then vaddw.u16 can be used to add the two sets of 8 16-bit accumulators into one set of 8 32-bit ones. Note that each vaddw.u16 handles only 4 elements, so you must use the instruction twice to cover both halves.
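With intrinsics, the two-stage accumulation described above might look roughly like the following. This is a sketch under several assumptions: the image is processed as a strip 8 pixels wide, the function names (sum_rows_u16, combine_u32, column_sums_8) and the stride parameter are invented for illustration, and a scalar fallback is included so the code also builds where NEON is unavailable. The intrinsics vaddw_u8 and vaddw_u16 correspond to vaddw.u8 and vaddw.u16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Sum `rows` rows of 8 adjacent 8-bit pixels into 8 16-bit column totals.
   rows must be <= 257 so that rows * 255 still fits in 16 bits. */
static void sum_rows_u16(const uint8_t *src, size_t stride, size_t rows,
                         uint16_t out[8])
{
    assert(rows <= 257);
#if HAVE_NEON
    uint16x8_t acc = vdupq_n_u16(0);
    for (size_t r = 0; r < rows; ++r) {
        /* vaddw.u8: widen 8 pixels to 16 bits and add to the accumulators */
        acc = vaddw_u8(acc, vld1_u8(src + r * stride));
    }
    vst1q_u16(out, acc);
#else
    /* Portable fallback with the same result */
    for (int c = 0; c < 8; ++c)
        out[c] = 0;
    for (size_t r = 0; r < rows; ++r)
        for (int c = 0; c < 8; ++c)
            out[c] = (uint16_t)(out[c] + src[r * stride + c]);
#endif
}

/* Add two sets of 8 16-bit totals into one set of 8 32-bit totals. */
static void combine_u32(const uint16_t top[8], const uint16_t bottom[8],
                        uint32_t out[8])
{
#if HAVE_NEON
    uint16x8_t t = vld1q_u16(top);
    uint16x8_t b = vld1q_u16(bottom);
    /* Widen one set to 32 bits, then vaddw.u16 the other set onto it.
       Each call covers 4 elements, hence one call per half. */
    uint32x4_t lo = vaddw_u16(vmovl_u16(vget_low_u16(t)), vget_low_u16(b));
    uint32x4_t hi = vaddw_u16(vmovl_u16(vget_high_u16(t)), vget_high_u16(b));
    vst1q_u32(out, lo);
    vst1q_u32(out + 4, hi);
#else
    for (int c = 0; c < 8; ++c)
        out[c] = (uint32_t)top[c] + (uint32_t)bottom[c];
#endif
}

/* Column totals for a strip 8 pixels wide and up to 514 rows tall,
   summed as a top half and a bottom half. */
static void column_sums_8(const uint8_t *src, size_t stride, size_t rows,
                          uint32_t out[8])
{
    size_t top_rows = rows / 2;
    uint16_t top[8], bottom[8];
    sum_rows_u16(src, stride, top_rows, top);
    sum_rows_u16(src + top_rows * stride, stride, rows - top_rows, bottom);
    combine_u32(top, bottom, out);
}
```

For a 480-row image this sums rows 0 to 239 and rows 240 to 479 separately in 16 bits, then widens to 32 bits only once at the end. A wider image would repeat the same steps for each group of 8 columns.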

If necessary, you can also narrow the values back to 16-bit or 8-bit by using vmovn (which truncates) or vqmovn (which saturates).
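The difference between the two narrowing instructions can be sketched as follows, here going from 16-bit to 8-bit. The function names narrow_truncate and narrow_saturate are invented for this example, and the scalar fallback is only there so the code builds without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* vmovn.i16: keep only the low 8 bits of each 16-bit element. */
static void narrow_truncate(const uint16_t in[8], uint8_t out[8])
{
#if HAVE_NEON
    vst1_u8(out, vmovn_u16(vld1q_u16(in)));
#else
    for (int i = 0; i < 8; ++i)
        out[i] = (uint8_t)(in[i] & 0xFF);
#endif
}

/* vqmovn.u16: clamp each 16-bit element to the range 0..255. */
static void narrow_saturate(const uint16_t in[8], uint8_t out[8])
{
#if HAVE_NEON
    vst1_u8(out, vqmovn_u16(vld1q_u16(in)));
#else
    for (int i = 0; i < 8; ++i)
        out[i] = in[i] > 255 ? 255 : (uint8_t)in[i];
#endif
}
```

For an input element of 300, the truncating version yields 44 (300 modulo 256), while the saturating version yields 255, so vqmovn is usually the safer choice when the values may exceed the narrower range.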

Exophase 2010-10-25 20:56:46
