I am trying to write a program to check that some C source code conforms to a variable naming convention. In order to do this, I need to analyse the source code and identify the type of all the local and global variables.
The end result will almost certainly be a Python program, but the tool that analyses the code could be either a Python module or an application that produces an easy-to-parse report. Alternatively (more on this below), it could be a way of extracting the information from the compiler (by way of a report or similar). In case it helps, the compiler will in all likelihood be the Keil ARM compiler.
I've been experimenting with ctags, which is very useful for finding all of the typedefs, macro definitions, etc., but it doesn't provide a direct way to find the type of a variable, especially when the declaration is spread over multiple lines (which I hope it won't be!).
Examples might include:
static volatile u8 var1; // should be flagged as static and volatile and a u8 (typedef of unsigned 8-bit integer)
volatile /* comments */
static /* inserted just to make life */
u8 /* difficult! */ var2 =
(u8) 72
; // likewise (nasty syntax, but technically valid C)
const uint16_t *pointer1; // flagged as a pointer to a constant uint16_t
int * const pointer2; // flagged as a constant pointer to an int
const char * const pointer3; // flagged as a constant pointer to a constant char
static MyTypedefTYPE var3; // flagged as a MyTypedefTYPE variable
u8 var4, var5, var6 = 72;
int *array1[SOME_LENGTH]; // flagged as an array of pointers to integers
char array2[FIRST_DIM][72]; // flagged as an array of arrays of type char
and so on.
It will also need to identify whether they're local or global/file-scope variables (which ctags can do) and if they're local, I'd ideally like the name of the function that they're declared within.
Also, I'd like to do a similar thing with functions: identify the return type, whether they're static and the type and name of all of their arguments.
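To make the report I'm after a bit more concrete, here is a rough Python sketch of the kind of record I'd like to end up with for each variable and function. Every class and field name here (VariableInfo, FunctionInfo, base_type and so on) is a placeholder I've made up for illustration; it isn't the output of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VariableInfo:
    """One record per declared variable (all field names are hypothetical)."""
    name: str
    base_type: str                                        # e.g. "u8", "int", "MyTypedefTYPE"
    qualifiers: List[str] = field(default_factory=list)   # e.g. ["volatile"]
    is_static: bool = False
    declarator: str = ""                                  # e.g. "pointer to", "array of"
    function: Optional[str] = None                        # enclosing function; None at file scope

    @property
    def is_local(self) -> bool:
        # A variable is local if it was declared inside some function.
        return self.function is not None


@dataclass
class FunctionInfo:
    """One record per function (again, hypothetical field names)."""
    name: str
    return_type: str
    is_static: bool = False
    params: List[VariableInfo] = field(default_factory=list)


# What I'd hope to get back for "static volatile u8 var1;" at file scope.
var1 = VariableInfo(name="var1", base_type="u8",
                    qualifiers=["volatile"], is_static=True)
```

With records like these, the naming-convention check itself would be a simple loop over the variables, comparing each name against its type, qualifiers and scope.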
Unfortunately, this is rather difficult with C syntax, since the storage-class specifiers and type qualifiers can appear in more than one order and almost any amount of white space (and comments) is allowed between the tokens. I've toyed with using some fancy regular expressions to do the work, but it's far from ideal: there are so many different cases to handle that the regular expressions quickly become unmanageable.
I can't help but think that compilers must be able to do this (in order to work!), so I was wondering whether it is possible to extract this information from one. The Keil compiler seems to produce a ".crf" file for each source file that's compiled, and this appears to contain all of the variables declared in that file, but it's a binary format and I can't find any information on how to parse it. Alternatively, a way of getting the information out of ctags would be perfect.
Any help that anyone can offer with this would be greatly appreciated.
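To illustrate the regular expression problem mentioned above, here is a minimal Python sketch of the sort of naive pattern I mean. The pattern and the parse_simple helper are hypothetical and simplified for this post, not my actual code. It copes with the simplest one-line declaration but gives up on pointers and on several variables declared in one statement:

```python
import re

# Hypothetical, deliberately naive pattern: optional specifiers/qualifiers,
# then a type name, a variable name and an optional initialiser, all on one line.
SIMPLE_DECL = re.compile(
    r"^\s*(?P<specs>(?:(?:static|volatile|const|extern)\s+)*)"
    r"(?P<type>[A-Za-z_]\w*)\s+"
    r"(?P<name>[A-Za-z_]\w*)\s*"
    r"(?:=[^;]*)?;"
)


def parse_simple(line):
    """Return (specifiers, type, name) for a trivial declaration, else None."""
    m = SIMPLE_DECL.match(line)
    if m is None:
        return None
    return m.group("specs").split(), m.group("type"), m.group("name")


print(parse_simple("static volatile u8 var1;"))   # (['static', 'volatile'], 'u8', 'var1')
print(parse_simple("int * const pointer2;"))      # None: pointers are not handled
print(parse_simple("u8 var4, var5, var6 = 72;"))  # None: multiple declarators are not handled
```

Each missing case (pointers, arrays, comments between tokens, declarations spanning several lines) needs yet another alternative in the pattern, which is where it stops being manageable.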
Thanks,
Al