
Sometimes, when using macros to generate code, it is necessary to create identifiers that have global scope but aren't really useful for anything outside the immediate context where they are created. For example, suppose an array or other indexed resource must be allocated, at compile time, in chunks of various sizes.

/* Produce an enumeration of some story-book characters, and allocate some
   arbitrary index resource to them.  Both enumeration and resource indices
   will start at zero.

   For each name, defines HPID_xxxx to be the enumeration of that name.
   Also defines HP_ID_COUNT to be the total number of names, and
   HP_TOTAL_SIZE to be the total resource requirement, and creates an
   array hp_starts[HP_ID_COUNT+1].  Each character n is allocated resources
   from hp_starts[n] through (but not including) hp_starts[n+1].
*/

/* Give the names and their respective lengths */
#define HP_LIST \
  HP_ITEM(FRED, 4) \
  HP_ITEM(GEORGE, 6) \
  HP_ITEM(HARRY, 5) \
  HP_ITEM(RON, 3) \
  HP_ITEM(HERMIONE, 8) \
  /* BLANK LINE REQUIRED TO ABSORB LAST BACKSLASH */

#define HP_ITEM(name, length) HPID_##name,
typedef enum { HP_LIST HP_ID_COUNT} HP_ID;
#undef HP_ITEM

#define HP_ITEM(name, length) ZZQ_##name}; enum {ZZQX_##name=ZZQ_##name+(length)-1,
enum { HP_LIST HP_TOTAL_SIZE};
#undef HP_ITEM

#define HP_ITEM(name, length) ZZQ_##name,
const unsigned char hp_starts[] = { HP_LIST HP_TOTAL_SIZE};
#undef HP_ITEM

#include <stdio.h>

int main(void)
{
  int i;

  printf("ID count=%d Total size=%d\n",HP_ID_COUNT,HP_TOTAL_SIZE);
  for (i=0; i < HP_ID_COUNT; i++)
    printf("  %2d=%3d/%3d\n", i, hp_starts[i], hp_starts[i+1]-hp_starts[i]);
  printf("IDs are: \n");
#define HP_ITEM(name, length) printf("  %2d=%s\n",HPID_##name, #name);
  HP_LIST
#undef HP_ITEM

  return 0;
}

Is there any normal convention for naming such identifiers to minimize the likelihood of conflicts, and also to minimize any confusion they might generate? In the above scenario, identifiers ZZQ_xxx will be the same as hp_starts[HPID_xxx] and might be useful in some contexts, though their primary purpose is to build the array and to serve as placeholders when computing the other ZZQ values and HP_TOTAL_SIZE. Identifiers ZZQX_xxx, however, are useless; their sole purpose is to serve as placeholders when setting the enumeration values of the succeeding items. Is there any good way to name such things?
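To make the placeholder roles concrete, here is a hand expansion of what `enum { HP_LIST HP_TOTAL_SIZE};` produces with the second HP_ITEM definition, shortened (as an illustration only) to the first three names, so the total here is 15 rather than 26. Each `};` closes one enumeration and the next `enum {` reopens it; ZZQ_xxx is the first slot of an entry and ZZQX_xxx its last slot, which the next enumerator then counts on from.

```c
/* Hand expansion of the HP_ITEM "math" pass for FRED (4), GEORGE (6), HARRY (5). */
enum { ZZQ_FRED };                                      /* 0: FRED starts here        */
enum { ZZQX_FRED = ZZQ_FRED + (4) - 1, ZZQ_GEORGE };    /* 3 = FRED's last slot; 4    */
enum { ZZQX_GEORGE = ZZQ_GEORGE + (6) - 1, ZZQ_HARRY }; /* 9 = GEORGE's last slot; 10 */
enum { ZZQX_HARRY = ZZQ_HARRY + (5) - 1, HP_TOTAL_SIZE }; /* 14; total = 15           */
```

The ZZQX_ names exist only so that the following enumerator (the next ZZQ_ or HP_TOTAL_SIZE) lands one past the end of the previous entry.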

Incidentally, I develop for small microcontrollers where RAM is at a greater premium than code space. Code is simulated by compiling it with Microsoft VC++, but for production it is compiled with a cross-compiler as straight C; the code must therefore compile as both C and C++.

Are there any other preprocessor tricks people can recommend for similar tasks?

A: 

You might look at the preprocessor macros in the Boost project (the Boost.Preprocessor library), which include clever preprocessor counters. Together with __LINE__, you can use these to generate unique identifiers that depend only on the line where they are expanded. This could help you avoid redefining your HP_ITEM macro.
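A minimal sketch of the __LINE__ part in plain C, without Boost. PASTE, LINE_NAME and STATIC_CHECK are hypothetical names chosen for this illustration, not Boost's API. The unique names are used here for compile-time checks because C89/C99 reject a second typedef with the same name, so each check needs its own identifier.

```c
/* Two-level paste: arguments adjacent to ## are not macro-expanded, so the
   outer PASTE lets __LINE__ become a number before PASTE_ pastes it
   (giving e.g. ZZQ_42 rather than ZZQ___LINE__). */
#define PASTE_(a, b) a##b
#define PASTE(a, b)  PASTE_(a, b)

/* Hypothetical helper: an identifier unique to the source line it is on. */
#define LINE_NAME(prefix) PASTE(prefix, __LINE__)

/* Hypothetical compile-time check: a negative array size is a compile error.
   Each use declares a typedef named after its own line, so several checks
   can share one scope (but two on the same line would collide). */
#define STATIC_CHECK(cond) typedef char LINE_NAME(static_check_)[(cond) ? 1 : -1]

enum { DEMO_TOTAL_SIZE = 26 };
STATIC_CHECK(DEMO_TOTAL_SIZE <= 255);  /* offsets fit in an unsigned char */
STATIC_CHECK(DEMO_TOTAL_SIZE > 0);

/* Stringizing helpers, used only to inspect the generated name. */
#define STR_(x) #x
#define STR(x)  STR_(x)

const char *line_name_demo(void)
{
    return STR(LINE_NAME(ZZQ_));  /* "ZZQ_" followed by this line's number */
}
```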

Jens Gustedt
A:

Is there any normal convention for naming such identifiers to minimize the likelihood of conflicts, and also to minimize any confusion they might generate?

It all boils down to the prefix you want to use. Ideally, one would want all the symbols to be easily associated with the list (HP_LIST) they are related to.

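So why not put these symbols under the same HP_ prefix? For example, a prefix such as HP__ZZQX_ would distinguish the useless symbols from the useful ones. (Since the code must also compile as C++, where identifiers containing a double underscore are reserved, a single underscore as in HP_ZZQX_ is the safer spelling; it is also what the code below generates.)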

N.B. I once worked on a project where one of the shared libraries was already using the zzqx_ prefix internally, and its symbols always showed up at the end of the application's symbol table. In the race for unlikely-to-be-used names, many people apparently take the same route (the end of the Latin alphabet) and end up with precisely the same names: the opposite of the desired result. That is why I think that namespaces (or, in C, symbol prefixes) should not be hidden or buried in the defines, but rather defined explicitly (i.e. easy to find and extract).

For something concrete, here is your source enhanced with the usual indirection hack around ## to generate the names from a prefix given as a preprocessor define:


/* the hack is needed to force the LIST_NAME to be expanded. 
   automatically adds underscores. yes, it's ugly */
#define LIST_SYMBOL_1(n1,n2,n3) n1##_##n2##_##n3
#define LIST_SYMBOL_0(n1,n2,n3) LIST_SYMBOL_1(n1,n2,n3)
#define LIST_SYMBOL(pref,name)  LIST_SYMBOL_0(LIST_NAME,pref,name)

/* give the name to the list. used by the LIST_SYMBOL(). */
#define LIST_NAME   HP

/* Give the names and their respective lengths */
#define HP_LIST \
  HP_ITEM(FRED, 4) \
  HP_ITEM(GEORGE, 6) \
  HP_ITEM(HARRY, 5) \
  HP_ITEM(RON, 3) \
  HP_ITEM(HERMIONE, 8) \
  /* BLANK LINE REQUIRED TO ABSORB LAST BACKSLASH */

#define HP_ITEM(name, length) HPID_##name,
typedef enum { HP_LIST HP_ID_COUNT} HP_ID;
#undef HP_ITEM

#define HP_ITEM(name, length)   LIST_SYMBOL(ZZQ,name)}; \
    enum {LIST_SYMBOL(ZZQX,name)=LIST_SYMBOL(ZZQ,name)+(length)-1,
enum { HP_LIST HP_TOTAL_SIZE};
#undef HP_ITEM

#define HP_ITEM(name, length) LIST_SYMBOL(ZZQ,name),
const unsigned char hp_starts[] = { HP_LIST HP_TOTAL_SIZE};
#undef HP_ITEM

#include <stdio.h>

int main(void)
{
  int i;
  printf("ID count=%d Total size=%d\n",HP_ID_COUNT,HP_TOTAL_SIZE);
  for (i=0; i<HP_ID_COUNT ; i++)
    printf("  %2d=%3d/%3d\n", i, hp_starts[i], hp_starts[i+1]-hp_starts[i]);
  printf("IDs are: \n");
#define HP_ITEM(name, length) printf("  %2d=%s\n",HPID_##name, #name);
  HP_LIST
#undef HP_ITEM
  return 0;
}
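The comment at the top of that code calls the two-level LIST_SYMBOL ugly but necessary. A small check of why: DIRECT_SYMBOL is a hypothetical single-level variant added only for comparison, and STR/STR_ are added stringizing helpers used to inspect the generated names.

```c
#include <string.h>

#define LIST_NAME HP

/* Single level: LIST_NAME is a plain token next to ##, so it is pasted
   as-is instead of being expanded to HP. */
#define DIRECT_SYMBOL(pref, name) LIST_NAME##_##pref##_##name

/* Two levels, as in the answer: LIST_NAME is passed as an argument to
   LIST_SYMBOL_0, which does not use ##, so it expands to HP before
   LIST_SYMBOL_1 does the pasting. */
#define LIST_SYMBOL_1(n1, n2, n3) n1##_##n2##_##n3
#define LIST_SYMBOL_0(n1, n2, n3) LIST_SYMBOL_1(n1, n2, n3)
#define LIST_SYMBOL(pref, name)   LIST_SYMBOL_0(LIST_NAME, pref, name)

/* Stringize after full expansion, to see the identifiers produced. */
#define STR_(x) #x
#define STR(x)  STR_(x)

const char *direct_name(void)   { return STR(DIRECT_SYMBOL(ZZQ, FRED)); }
const char *indirect_name(void) { return STR(LIST_SYMBOL(ZZQ, FRED)); }
```

direct_name() yields "LIST_NAME_ZZQ_FRED", while indirect_name() yields the intended "HP_ZZQ_FRED".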

Edit 1: My preferred approach is to put the data into a proper text file, e.g.:

FRED
GEORGE
HARRY
RON
HERMIONE

(note that you no longer need the lengths, since in this example each one is just the length of the name and the script can compute it) and write a script (or even a trivial C program) to generate source code from the text file, creating the necessary header (with the enum plus the declaration of the data) and source file (with the data). Modify the Makefile to run the script before compiling any sources, and add the generated source files to the list of compiled sources.
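As a rough idea of the "trivial C program" route, here is a sketch of a host-side generator. The function name hp_generate, the file names hp_ids.h/hp_ids.c, the size limits and the output layout are all hypothetical; names are assumed to be valid C identifiers, and each entry's length is assumed to equal the length of its name, as in the list above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical generator: reads one identifier per line from `in`, writes
   the enum and declarations to `hdr` and the table definition to `src`.
   Returns the number of names, or -1 on error. */
#define HP_MAX_NAMES 256
#define HP_LINE_SIZE 64   /* longer lines would be split by fgets */

int hp_generate(FILE *in, FILE *hdr, FILE *src)
{
    static char names[HP_MAX_NAMES][HP_LINE_SIZE];
    char line[HP_LINE_SIZE];
    int count = 0, i;
    unsigned long total = 0, start = 0;

    /* Collect non-empty lines, stripping the line ending. */
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '\0')
            continue;
        if (count == HP_MAX_NAMES)
            return -1;                      /* too many names */
        strcpy(names[count++], line);
        total += strlen(line);
    }
    if (total > 255)
        return -1;                          /* offsets must fit unsigned char */

    /* Header: enumeration, total size, table declaration. */
    fputs("/* Generated file: do not edit. */\n"
          "#ifndef HP_IDS_H\n#define HP_IDS_H\n\ntypedef enum {\n", hdr);
    for (i = 0; i < count; i++)
        fprintf(hdr, "  HPID_%s,\n", names[i]);
    fprintf(hdr, "  HP_ID_COUNT\n} HP_ID;\n\n"
                 "enum { HP_TOTAL_SIZE = %lu };\n"
                 "extern const unsigned char hp_starts[HP_ID_COUNT + 1];\n\n"
                 "#endif\n", total);

    /* Source: running start offset of each entry, then the end marker. */
    fputs("/* Generated file: do not edit. */\n"
          "#include \"hp_ids.h\"\n\n"
          "const unsigned char hp_starts[HP_ID_COUNT + 1] = {\n", src);
    for (i = 0; i < count; i++) {
        fprintf(src, "  %lu, /* %s */\n", start, names[i]);
        start += strlen(names[i]);
    }
    fprintf(src, "  %lu\n};\n", start);
    return count;
}
```

A small main() that opens the name file and the two output files and calls hp_generate() would be the step the Makefile runs before compiling; it is left out here. Because the source includes the header's extern declaration, the const table also gets external linkage when compiled as C++.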

That has the HUGE advantage that the generated code is plain code and can be indexed as such (unless you love the fun of "where did that darn ID come from?"). The internal constants simply no longer appear in the source code, since the script handles them. And no more fugly preprocessor magic.

Dummy00001
@Dummy00001: I think part of the idea of putting things at the end of the alphabet is to encourage them to appear at the end of a symbol table, away from any other useful stuff. Funny that ZZQX would be used in multiple places. Perhaps ZZQX_HP_ would be a good prefix? BTW, how do you like the way the thing does math with enumerations? Ever seen that? It's sorta icky having to close and reopen the enumeration to allow the newly-issued value to be used in computing the next (needed for C, not C++), but that's a consequence of not having real math in the preprocessor.
supercat
Honestly, I have had to work with several similar schemes and I like none of them. (The most popular is to put HP_LIST into an include file (without backslashes) and then include it wherever your code uses HP_LIST.) I prefer to put the data into a proper text file and write a script (or scripts) to generate source code from the text file. That has the *HUGE* advantage that the generated code is plain code and can be indexed as such (unless you love the fun of "where did that darn ID come from?"). And the internal constants simply no longer appear in the source code. And no fugly preprocessor magic.
Dummy00001
I added to the answer what I mentioned in the comment: the way I actually prefer to do it.
Dummy00001