The solution below assumes that the lower 16 bits of the Unicode space (U+0000 to U+FFFF, the Basic Multilingual Plane) will be enough for you. If your bitmap table has, say, U+0020 through U+007E at positions 0x00 to 0x5E, U+00A0 through U+00FF at positions 0x5F to 0xBE, and U+1200 through U+1240 at 0xBF to 0xFF, you could do something like the code below (which isn't tested, not even compile-tested).
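bitmapmap contains a series of pairs of values, terminated by a single 0. The first value in the first pair is the Unicode code point represented by the bitmap at index 0. The assumption is that the bitmap table consists of runs of directly adjacent Unicode code points, and the second value in each pair says how long its run is. Each run's bitmaps follow immediately after the previous run's, so a code point's bitmap index is the total length of all earlier runs plus its offset within its own run.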
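The lookup described above can be sketched on its own. This is a minimal illustration, assuming the same layout of (start, length) pairs ending in a 0. The function name bitmap_index_for and its -1 "not found" result are hypothetical and not part of the original answer. It walks the runs in order, adding each run's length to a running bitmap index until it finds the run that contains the code point.

```c
/* Same layout as bitmapmap: (first code point, run length) pairs,
   terminated by a single 0. */
static const unsigned short ranges[] = {
    0x0020, 0x005F,   /* U+0020..U+007E -> bitmaps 0x00..0x5E */
    0x00A0, 0x0060,   /* U+00A0..U+00FF -> bitmaps 0x5F..0xBE */
    0x1200, 0x0041,   /* U+1200..U+1240 -> bitmaps 0xBF..0xFF */
    0x0000
};

/* Hypothetical helper: returns the bitmap index for code point cp,
   or -1 if no run covers it. */
static int bitmap_index_for(unsigned short cp)
{
    int base = 0;   /* bitmap index of the current run's first code point */
    for (int i = 0; ranges[i] != 0; i += 2) {
        unsigned start = ranges[i];
        unsigned len = ranges[i + 1];
        if (cp >= start && cp < start + len)   /* run covers [start, start+len) */
            return base + (int)(cp - start);
        base += (int)len;                       /* skip past this run's bitmaps */
    }
    return -1;
}
```

For example, U+00A0 maps to bitmap 0x5F (the first run holds 0x5F glyphs), U+1240 maps to 0xFF, and U+007F, which falls in the gap between the first two runs, gives -1.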
The first part of the while loop iterates through UTF-8 input and builds up a Unicode code point in ucs2char. Once a complete character is found, the second part searches for that character in one of the ranges mentioned in bitmapmap. If it finds an appropriate bitmap index, it adds it to indexes. Characters for which no bitmap is present are silently dropped.
The function returns the number of bitmap indexes found.
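The lead-byte handling in the first part of the loop is the least obvious step, so here it is in isolation. This sketch mirrors the answer's approach; utf8_lead_length and utf8_two_byte are hypothetical names used only for illustration. The number of leading 1 bits in a lead byte is the total length of the sequence, and shifting the byte back down by that count leaves only its payload bits. Each continuation byte then contributes its low 6 bits.

```c
/* Hypothetical helper: count the leading 1 bits of a UTF-8 lead byte
   (= sequence length) and return the payload bits left in it via *payload. */
static int utf8_lead_length(unsigned char c, unsigned char *payload)
{
    int n = 0;
    while (c & 0x80) {
        n++;
        c = (unsigned char)(c << 1);   /* the top bit falls off the 8-bit value */
    }
    *payload = (unsigned char)(c >> n);
    return n;
}

/* Hypothetical helper: combine a two-byte sequence from the lead byte's
   payload and the low 6 bits of the continuation byte. No validation. */
static unsigned short utf8_two_byte(unsigned char lead, unsigned char cont)
{
    unsigned char payload;
    utf8_lead_length(lead, &payload);
    return (unsigned short)((payload << 6) | (cont & 0x3F));
}
```

For U+00E9 (é), encoded as C3 A9, the lead byte 0xC3 has two leading 1 bits and payload 0x03, and the continuation byte contributes 0xA9 & 0x3F = 0x29, giving (0x03 << 6) | 0x29 = 0xE9.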
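This approach should be memory-efficient, since the Unicode-to-bitmap table stores only two values per run instead of one entry per character. It is also reasonably fast (each character costs a linear scan over a handful of runs) and reasonably flexible.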
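The range search walks the runs one by one, which is fine for a few of them. If the table ever grows to many runs, a binary search is one possible alternative. This is a variation that is not in the original answer: it assumes the runs are sorted by start code point and stores a precomputed first bitmap index with each run. The struct glyph_run and the function find_bitmap_index are hypothetical.

```c
#include <stddef.h>

/* Hypothetical run record; first_index is the bitmap index of `start`,
   i.e. the sum of the lengths of all earlier runs. */
struct glyph_run {
    unsigned short start;
    unsigned short length;
    unsigned short first_index;
};

/* Runs must be sorted by start and must not overlap. */
static const struct glyph_run runs[] = {
    {0x0020, 0x005F, 0x00},
    {0x00A0, 0x0060, 0x5F},
    {0x1200, 0x0041, 0xBF},
};

/* Hypothetical helper: binary search over runs; -1 if cp is unmapped. */
static int find_bitmap_index(unsigned short cp)
{
    size_t lo = 0, hi = sizeof runs / sizeof runs[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < runs[mid].start)
            hi = mid;                                   /* look in earlier runs */
        else if (cp >= runs[mid].start + runs[mid].length)
            lo = mid + 1;                               /* look in later runs */
        else
            return runs[mid].first_index + (cp - runs[mid].start);
    }
    return -1;
}
```

The trade-off is one extra value per run in exchange for O(log n) lookups instead of O(n).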
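// Code below assumes C99 (// comments, declarations after statements),
// but is about three cut-and-pastes from C89
// Assuming an unsigned short is 16-bit
// Pairs of (first code point, number of consecutive code points),
// terminated by a single 0
unsigned short bitmapmap[]={0x0020, 0x005F,  // U+0020..U+007E -> bitmaps 0x00..0x5E
                            0x00A0, 0x0060,  // U+00A0..U+00FF -> bitmaps 0x5F..0xBE
                            0x1200, 0x0041,  // U+1200..U+1240 -> bitmaps 0xBF..0xFF
                            0x0000};
// indexes must have room for one entry per character in utf8
int utf8_to_bitmap_indexes(const unsigned char *utf8, unsigned short *indexes)
{
    int bitmapsfound=0;
    int utf8numchars=0;  // bytes still expected in the current UTF-8 sequence
    unsigned char c;
    unsigned short ucs2char=0;
    while (*utf8)
    {
        c=*utf8++;
        if (c>=0xc0)
        {
            // Lead byte: the number of leading 1 bits is the sequence length
            utf8numchars=0;
            while (c&0x80)
            {
                utf8numchars++;
                c<<=1;
            }
            c>>=utf8numchars;  // only the payload bits are left
            ucs2char=0;
        }
        else if (utf8numchars && c<0x80)
        {
            // This is invalid UTF-8. Do our best.
            utf8numchars=0;
        }
        else if (!utf8numchars && c>=0x80)
        {
            // Stray continuation byte: invalid UTF-8, skip it
            continue;
        }
        if (utf8numchars)
        {
            c&=0x3f;
            ucs2char<<=6;
            ucs2char+=c;
            utf8numchars--;
            if (utf8numchars)
                continue; // Our work here is done - no char yet
        }
        else
            ucs2char=c;
        // At this point, we have a complete UCS-2 char in ucs2char
        unsigned short bmpsearch=0;
        unsigned short bmpix=0;
        while (bitmapmap[bmpsearch])
        {
            // Each range covers [start, start+length)
            if (ucs2char>=bitmapmap[bmpsearch] && ucs2char<bitmapmap[bmpsearch]+bitmapmap[bmpsearch+1])
            {
                *indexes++ = bmpix+(ucs2char-bitmapmap[bmpsearch]);
                bitmapsfound++;
                break;
            }
            bmpix+=bitmapmap[bmpsearch+1];
            bmpsearch+=2;
        }
    }
    return bitmapsfound;
}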
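EDIT: You mentioned that you need more than the lower 16 bits. Apply s/unsigned short/unsigned long/;s/ucs2char/codepoint/; to the above code and it can then handle the whole Unicode space, including four-byte UTF-8 sequences. (unsigned long is guaranteed to be at least 32 bits; unsigned int usually is too, but the C standard only guarantees 16. uint32_t from <stdint.h> also works.)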
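With a 32-bit accumulator the same decoding logic also copes with four-byte sequences. The sketch below assumes uint32_t from <stdint.h> and uses a hypothetical function, utf8_to_codepoints, that only decodes (no bitmap lookup), so the effect of the wider type is easy to see. It keeps the answer's lead-byte trick and its lenient handling of invalid input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode NUL-terminated UTF-8 into code points and
   return how many were written to out. Invalid input is handled leniently. */
static size_t utf8_to_codepoints(const unsigned char *utf8, uint32_t *out)
{
    size_t count = 0;
    int remaining = 0;        /* bytes still expected in the current sequence */
    uint32_t codepoint = 0;
    while (*utf8) {
        unsigned char c = *utf8++;
        if (c >= 0xC0) {                    /* lead byte */
            remaining = 0;
            while (c & 0x80) {
                remaining++;
                c = (unsigned char)(c << 1);
            }
            c >>= remaining;                /* keep only the payload bits */
            codepoint = 0;
        } else if (remaining && c < 0x80) {
            remaining = 0;                  /* truncated sequence: take the ASCII char */
        } else if (!remaining && c >= 0x80) {
            continue;                       /* stray continuation byte: skip */
        }
        if (remaining) {
            codepoint = (codepoint << 6) | (c & 0x3F);
            if (--remaining)
                continue;                   /* sequence not finished yet */
        } else {
            codepoint = c;
        }
        out[count++] = codepoint;
    }
    return count;
}
```

For instance, the bytes F0 9F 98 80 decode to 0x1F600 (U+1F600). Stored in a 16-bit unsigned short, the same value would be truncated to 0xF600, which is why the original version is limited to the lower 16 bits.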