I have some C source files that are slowly expanding. I tend to keep the prototypes and their documentation in the .h file in good order, with related functions and types grouped using #pragma mark. The code is written and documented in a way that requires reading the .h file alongside the .c file. I'd like the files to be ordered in a way that facilitates this.

Is there a way to keep the function definitions in the .c file in the same order as their prototypes in the .h file? I'm looking for a tool that reads the .h file (with #pragma marks if possible) and reorders the .c file to match.

Possible?

+1  A: 

I've done code shredding before. As far as I know, no such tool exists, so the closest you could get is to write one. Using a static analysis API, you could parse your source code and then, based on the declarations in each header file, reorder the definitions in the corresponding .c file.

A company called SciTools ships a source code analyzer called 'understand 4 c++' that has a C API that makes this pretty easy. But you would probably have to write the tool yourself. As it is, I wrote a managed API that sits on top of their C API. My managed API is on CodePlex here: http://understandapi.codeplex.com/

Here is how I would structure the program.

  1. First you have to create a database of all your source code. You can do this with a batch script or a PowerShell script, or you can do it manually. It's usually as simple as pointing to a directory and in effect saying 'make a database of all the files in there'. You can choose whether you want *.c, *.h, or *.cpp files in your database.

  2. Then using the API you can browse all files with the .h file extension.

  3. For each header file, you verify there is a corresponding .c file. This is done by taking a string of the filename, replacing the file extension (.NET makes this easy), and checking if the file exists. If it does exist, then on to the next step.

  4. Then the program should iterate through all defined entities in the .h file.

  5. For each entity, it then finds a reference to its definition (not declaration) and sees if it exists in the corresponding .c file. If it's there, it finds the line numbers of the definition, opens the file for reading, reads the necessary lines of code (and comments too), and writes them out to a temporary file.

  6. When completed, overwrite the .c file with the temporary file.

  7. Proceed to the rest of the files in the database.
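Steps 3 and 6 can be sketched in Python with only the standard library. This is a rough illustration, not the Understand API: the function names `paired_sources` and `overwrite_via_temp` are made up for this example, and it assumes each .c file sits next to its .h file.

```python
import os
import tempfile
from pathlib import Path


def paired_sources(root):
    """Yield (header, source) for each .h file that has a sibling .c file (step 3)."""
    for header in sorted(Path(root).rglob("*.h")):
        source = header.with_suffix(".c")  # swap the file extension
        if source.is_file():
            yield header, source


def overwrite_via_temp(source, new_text):
    """Write new_text to a temporary file, then swap it in for source (step 6)."""
    source = Path(source)
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(source.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(new_text)
    os.replace(tmp_name, source)
```

Writing the whole reordered file first and renaming it at the end means a crash midway leaves the original .c file untouched.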

It's not quite that easy, though. You may run into trouble along the way:

  1. Conditionally compiled code. This makes the source harder to parse, though it's possible. Understand 4 c++ does parse conditional compilation directives and differentiates between inactive and active code, but handling this in your tool would still be really difficult.

  2. Namespaces. These would complicate matters.

However, if you are only interested in organizing code between certain #pragma directives, then it could simplify matters again.

Let me know if you are interested in more, and we can talk offline privately.

C Johnson
Thanks for your response. I'm working with C not C++, so we're just talking functions, typedefs, structs, enums etc. I think if I wrote my own tool (I may do that) it would be quite simple, and follow the same kinds of steps (less the C++ complications). I write with a consistent style, so writing some Python to chop things up and put them back together as strings (not ASTs!) wouldn't be hard.
Joe
Well, let's see, this tool does C too. It also does Ada, Java, C#, Fortran, and a few more besides, I think. It also runs on a host of operating systems.
C Johnson
+1  A: 
  • Use a good IDE... There will not be any need to keep the order in header file/c file aligned.

  • If that still does not suit you... Keep all declarations and definitions in alphabetical order. When you add a new function, you know where to insert the new function.

    P.S. I believe in the http://www.dmoz.org/ saying:

      Humans Do it better
    
Meera
Humans might do it better (debatable), but they are slower and way more expensive!
Christo
By 'use a good IDE' do you mean one that enables jumping around the source code? I am writing in Xcode which is fine, but I want the code to be readable in a text editor on any platform. I would rather shoulder the work on the production end to make the reader's life easier, not say 'use a good IDE'.
Joe
And regarding your second point, I'm putting functions in a certain order, for example construction/destruction of ADT, persistence, operations on ADT, etc. Imposing an artificial ordering scheme, such as alphabetical, isn't ideal.
Joe
I like this answer! The first 2 points came to mind when I considered writing my own answer. The third is the icing.
Matt Joiner
+1  A: 

I doubt you'll find a tool like this off-the-shelf. So, you'd need a custom tool. You don't want to try doing this with some string hacking method (e.g., Perl) because the details of accurately parsing C and C++ are far beyond what you can reliably do this way. If you don't mind string hacking damaging your files sometimes, maybe you can get away with this.

My company's DMS Software Reengineering Toolkit could be used to do this reliably, modulo one caveat.

DMS is a generic engine for parsing, analyzing, and transforming source code using compiler technology parameterized by explicit language definitions. DMS has robust language definitions for many languages, including C and C++ in a variety of dialects. Using the DMS C or C++ front ends, you can parse the source code, build compiler data structures called ASTs, carry out analyses over the code, transform the ASTs, and then regenerate compilable code including comments and all the preprocessor directives.

The caveat has to do with parsing source code containing preprocessor directives: they have to be well-structured (e.g., #ifdef ... #endif needs to nest around other statements just like a regular if, as opposed to being used across a statement boundary). Badly structured directives turn up fairly often in C code, and much less in C++ code. Our experience is that if you are willing to modify your C code a little bit, you can make this particular issue go away.
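A rough way to spot one class of badly structured directives before moving code around is to check that every conditional opened inside a chunk of source is also closed inside it. The Python sketch below does only that. It is a much weaker test than real nesting around statements, it is not how DMS checks anything, and the name `conditionals_balanced` is made up for this example.

```python
import re

# Directives that open and close a conditional region; #else and #elif
# do not change the nesting depth, so they are ignored.
OPEN = re.compile(r"^\s*#\s*(if|ifdef|ifndef)\b")
CLOSE = re.compile(r"^\s*#\s*endif\b")


def conditionals_balanced(chunk):
    """Return True if every #if/#ifdef/#ifndef in chunk is closed within chunk."""
    depth = 0
    for line in chunk.splitlines():
        if OPEN.match(line):
            depth += 1
        elif CLOSE.match(line):
            depth -= 1
            if depth < 0:  # an #endif whose opener lies outside the chunk
                return False
    return depth == 0
```

A function body for which this returns False cannot be moved on its own without splitting a conditional region in two.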

For your specific task, you do pretty much as the answer for Scientific Toolworks described:

  1. Choose a compilation unit, and parse it using DMS. You have to provide all the same information you provide the compiler, so it can locate the header files, etc.
  2. DMS produces an AST for both your compilation unit and for all header files.
  3. Walk the ASTs to extract the order of declarations in the headers and the compilation unit.
  4. Restructure the compilation unit tree according to the order derived in step 3.
  5. Prettyprint the resulting compilation unit AST.
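For contrast, steps 3 and 4 can be approximated on plain text in Python. This is exactly the kind of string hacking that can damage files, so treat it as a sketch under strong assumptions: each prototype fits on one line starting at column 0 and ending in `);`, each definition puts its name on one line starting at column 0, top-level items are separated by blank lines, and braces inside strings or comments do not unbalance the count. The names `header_order`, `split_blocks`, `defined_name`, and `reorder` are made up for this example.

```python
import re

# Assumed style: the function name is the identifier right before the first "(".
NAME_BEFORE_PAREN = re.compile(r"([A-Za-z_]\w*)\s*\(")


def header_order(header_text):
    """Return function names in the order their one-line prototypes appear."""
    names = []
    for line in header_text.splitlines():
        if line[:1].isspace() or line.startswith(("#", "typedef")):
            continue
        if line.rstrip().endswith(");"):
            match = NAME_BEFORE_PAREN.search(line)
            if match:
                names.append(match.group(1))
    return names


def split_blocks(source_text):
    """Split into top-level blocks separated by blank lines (naive brace count)."""
    blocks, current, depth = [], [], 0
    for line in source_text.splitlines():
        if depth == 0 and not line.strip():
            if current:
                blocks.append("\n".join(current))
                current = []
            continue
        current.append(line)
        depth += line.count("{") - line.count("}")
    if current:
        blocks.append("\n".join(current))
    return blocks


def defined_name(block):
    """Return the name of the function defined in block, or None."""
    for line in block.splitlines():
        if line[:1].isalpha() or line[:1] == "_":
            match = NAME_BEFORE_PAREN.search(line)
            return match.group(1) if match and "{" in block else None
    return None


def reorder(source_text, header_text):
    """Put definitions named in the header into header order; leave the rest in place."""
    rank = {name: i for i, name in enumerate(header_order(header_text))}
    blocks = split_blocks(source_text)
    slots = [i for i, b in enumerate(blocks) if defined_name(b) in rank]
    ordered = sorted((blocks[i] for i in slots), key=lambda b: rank[defined_name(b)])
    for i, block in zip(slots, ordered):
        blocks[i] = block
    return "\n\n".join(blocks) + "\n"
```

Blocks that the header does not mention (includes, static helpers, structs) keep their positions, and a comment directly above a definition travels with it. One consequence to watch for: a static helper can end up below a function that calls it, so it needs a forward declaration near the top of the .c file.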

[A reason to do this with DMS rather than Scientific Toolworks is that DMS is designed to parse/transform/regenerate code, whereas SciTools IMHO is really only designed to parse and analyze. DMS provides access to the fine detail required for transformation that SciTools does not, at least not the last time I looked.]

Complications will ensue because of conditionals, macros, namespaces, ... but you'll have to decide on a policy for resolving them. For instance, if a header file has a #if ... #else ... #endif, and declarations in the then clause have a different order than they do in the else clause, what's the desired order? What if a function definition is created by a macro in the header? But all this is what makes building a real tool, er, fun.

My personal opinion is that this seems like rather a lot of work for the effect you are getting. If you do all of this, how much better will your software engineering process be? We normally use DMS to check for coding errors, or to change the code in ways that people can't (e.g., insert runtime instrumentation temporarily or AOP-like advice), where it's clear that a mechanical engine has payoff.

Ira Baxter
I'd love to get my hands on this DMS sometime. I've gotten intimately familiar with SCI's API, but I'm always looking for ways to expand my understanding of this stuff.
C Johnson