Hi, Shark has identified an area of code to be improved, "Unaligned loop start", and recommends adding `-falign-loops=16` (a gcc compiler flag). I've added this to Other C Flags in Xcode (iPhone), in both the dependent project and the top-level project. However, it does not seem to affect performance, and Shark still reports the same problem, so it appears the flag didn't work.

Am I doing this correctly?

+1  A: 

Are you compiling C files or C++ files? If C++ then you might want to check they make it through to the Other C++ flags setting.

In the Xcode build results window there is a button to show the build transcript, and you can use this to make sure the compiler flags are making it through to gcc.

David Sykes
Hi, thanks. It seems that they are getting through to the command line anyway, although I'm not sure why it has no effect :(
tech74
+1  A: 

The hints from Shark are not always helpful or appropriate - you have to make the final decision - if your loop is quite small then loop alignment might make a small difference, but there are probably much more important things that you would look at optimising before that.

Paul R
Well, I was hoping that after applying the compiler flag Shark recommended, it would at least stop complaining about that issue, even if performance did not improve significantly. The problem I have is that Shark still recommends the `-falign-loops=16` flag even though I already use it.
tech74
Is the loop start address now 16-byte aligned? There should be NOPs prior to the start of the loop to make this happen. If it's not 16-byte aligned then it may be that the compiler is ignoring the `-falign-loops=16` switch, or perhaps there is some other problem.
Paul R
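To make the "16-byte aligned" check above concrete, here is a minimal sketch of the arithmetic. The helper name `is_aligned` is hypothetical; it assumes you have read the loop's start address from a disassembly and want to test it against the requested alignment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr is a multiple of alignment.
 * alignment must be a power of two (16 for -falign-loops=16). */
static int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For a power of two, the low bits of an aligned address are all zero. */
    return (addr & (alignment - 1)) == 0;
}
```

For example, an address ending in hex `0` passes the 16-byte check, while one ending in `4` does not.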
Hi, some are, some aren't. The code is doing floating-point maths and it's taking too long. Shark is also complaining about the use of single FP LDM/STM instructions and suggesting multiple LDM/STM instead. Here is the loop to be optimized:

    int i;
    float *a;
    float object[40];
    float val;
    for (i=0; i<num; i++) { val += object[i]*(*a++); }

I need this as fast as possible on iPhone.
tech74
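The loop above is a dot product. As a sketch only, one common scalar rewrite keeps four independent partial sums so that consecutive multiply-adds do not depend on each other. The function name `dot_unrolled` and its parameters are assumptions, not part of the original code. Unlike the snippet above, it initialises the accumulator. Because the additions are reordered, rounding can differ slightly from the straight loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar dot product with four independent accumulators. */
static float dot_unrolled(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    /* Main loop: four elements per iteration, one per accumulator. */
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]     * y[i];
        s1 += x[i + 1] * y[i + 1];
        s2 += x[i + 2] * y[i + 2];
        s3 += x[i + 3] * y[i + 3];
    }

    float sum = (s0 + s1) + (s2 + s3);

    /* Tail: the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Whether this helps on a given device would need to be measured in Shark; it is not guaranteed to be faster.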
@tech74: this looks like a good candidate for implementation with NEON (SIMD) - you should be able to get at least a 2x improvement - just use the NEON intrinsics in gcc (no need for asm).
Paul R
Hi, is NEON available on all iPhones, i.e. 2G, 3G and 3GS? I'm struggling to convert that code to NEON. Is there a good tutorial somewhere?
tech74
As far as I know, NEON is only available from the iPhone 3GS (Cortex-A8) onwards; the original iPhone and the 3G use an ARM11 core without NEON, so you would need to test for this and drop back to scalar code when it's not available. There's a good example of using NEON via gcc intrinsics here: http://hilbert-space.de/?p=22
Paul R
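As a rough illustration of the intrinsics approach with a scalar fallback, the sketch below computes the same dot product four floats at a time when NEON is enabled at compile time, and falls back to plain C otherwise. The function name `dot_neon_or_scalar` and the `HAVE_NEON` macro are hypothetical. The check is compile-time only; runtime detection is not shown.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro for this sketch */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical dot product: NEON path when available, scalar otherwise. */
static float dot_neon_or_scalar(const float *x, const float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));

    size_t i = 0;
    float sum = 0.0f;

#if HAVE_NEON
    /* Four partial sums held in one 128-bit register. */
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {
        /* acc += x[i..i+3] * y[i..i+3], lane by lane */
        acc = vmlaq_f32(acc, vld1q_f32(x + i), vld1q_f32(y + i));
    }
    /* Horizontal add of the four lanes. */
    float32x2_t half = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    half = vpadd_f32(half, half);
    sum = vget_lane_f32(half, 0);
#endif

    /* Scalar path: handles everything without NEON, or the tail with it. */
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

A call such as `dot_neon_or_scalar(object, a, num)` would replace the original loop, assuming `a` points at `num` valid floats.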
I'm getting the following errors when I compile with the flags below. I'm using iPhone SDK 3.1.

    {standard input}:182:selected processor does not support `fconsts s14,#96'
    {standard input}:183:bad instruction `vmul.f32 d7,d6,d7'

    -mcpu=cortex-a8 -mfpu=neon -ftree-vectorize -mfloat-abi=(softfp|hard) -ffast-math -fsingle-precision-constant
tech74
@tech74: sorry, I'm not an expert on iPhone development - this looks like a problem with the Xcode toolchain, but it's not really my area. I believe the 3.1 SDK is quite old though - is there some reason why you can't use the latest SDK?
Paul R