Right now, I'm working on a project which requires sequential text key generation. I need to seed the key generator with an integer corresponding to a certain key, which the constructor converts to a key.

My key generator overloads the increment operators so that the string is incremented directly. Previously, I incremented an index value and then converted that index to a key for every key I wanted to generate.

My problem is that I have a limited character set I want to use when generating keys. I have to find the character in the key that I want to increment, find out where it is in my character set, find the next character in the set, then replace the character in the key with the next character in the set.

Here is my code:

// Not the full charset
std::string charset = "abcdefghijklmnopqrstuvwxyz0123456789"; 
std::string key;

key.push_back(charset[0]);

for(unsigned int place = 0; place < key.length(); place++)
{
    if(key[place] == charset[charset.length() - 1])
    {
        // Overflow, reset char at place
        key[place] = charset[0];

        if((key.length() - 1) < (place + 1))
        {
            // Carry, no space, insert char
            key.insert(key.begin(), charset[0]);
            break;
        }
        else
        {
            // Space available, increment next char
            continue;
        }
    }
    else
    {
        // Increment char at place
        key[place] = charset[charset.find(key[place]) + 1];
        break;
    }
}

In profiling, I found that the search operation is really slowing things down. Is there any faster way of doing this? I thought of creating a linked list out of the character set, but before I do that, I'd like some input on this.

+3  A: 

Rather than doing a find, why don't you have a reverse translation array? The array index would be the character, and the value in the array would be its numeric value (or index into the other array).

key[place] = charset[reverse_charset[key[place]] + 1];
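A minimal sketch of how such a reverse translation table could be built, assuming 8-bit characters. The names make_reverse_charset and next_char are made up for illustration, and next_char assumes the character is in the set and is not the last one (the overflow case is handled separately in the original loop):

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: builds a 256-entry table that maps each character
// (viewed as unsigned char) to its index in charset. Unused entries hold -1.
std::array<int, 256> make_reverse_charset(const std::string& charset)
{
    std::array<int, 256> table;
    table.fill(-1);
    for (std::size_t i = 0; i < charset.size(); ++i)
    {
        table[static_cast<unsigned char>(charset[i])] = static_cast<int>(i);
    }
    return table;
}

// Hypothetical helper: returns the character that follows c in charset,
// using the table instead of std::string::find.
// Assumes c is in charset and is not its last character.
char next_char(const std::string& charset,
               const std::array<int, 256>& reverse_charset,
               char c)
{
    // Cast to unsigned char so a negative char never indexes out of range.
    return charset[reverse_charset[static_cast<unsigned char>(c)] + 1];
}
```

The table is built once, so each increment costs one array lookup instead of a linear search through the character set.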
Mark Ransom
How is this better than not translating in the first place?
aib
Answering self: Doesn't require a modification to the algorithm. Though I think the algorithm is what's wrong in the first place.
aib
Also: Not my downvote. This answer isn't wrong and it's the simplest.
aib
The advantage to the original algorithm is that it doesn't require any information except the last key handed out. Granted, it's not a compelling advantage.
Mark Ransom
+1  A: 

You could store a vector of the same length as your key, where each element in the vector is the index in the charset of the corresponding character in the key.

For example, if key[0] is 'c', then thisVector[0] is 2, since 'c' is the 3rd character in the character set.

Then all operations would be performed on that integer vector, removing the necessity for a find operation on the string.
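Roughly, this could look like the sketch below. KeyGenerator and its members are hypothetical names, not the asker's class, and position 0 is treated as the least significant character, as in the loop from the question:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generator: indices mirrors key, so indices[i] is always the
// position of key[i] in charset. Position 0 is the least significant
// character. Assumes charset is not empty.
struct KeyGenerator
{
    std::string charset;
    std::string key;
    std::vector<std::size_t> indices;

    explicit KeyGenerator(const std::string& cs)
        : charset(cs), key(1, cs[0]), indices(1, 0)
    {
    }

    // Advances key to the next value without searching charset.
    KeyGenerator& operator++()
    {
        for (std::size_t place = 0; place < indices.size(); ++place)
        {
            if (indices[place] + 1 < charset.size())
            {
                // No overflow: bump this place and stop.
                ++indices[place];
                key[place] = charset[indices[place]];
                return *this;
            }
            // Overflow: reset this place and carry into the next one.
            indices[place] = 0;
            key[place] = charset[0];
        }
        // Every place overflowed, so the key grows by one character.
        indices.push_back(0);
        key.push_back(charset[0]);
        return *this;
    }
};
```

With a three-character set "abc", successive increments produce a, b, c, aa, ba, and so on, which is the same order the original loop generates.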

Asher Dunn
Great suggestion. I tried this, and it doubled the speed.
GenTiradentes
A: 

Perhaps you would be better off working with indexes into the charset, and then converting them to actual characters when needed?

That would save you the overhead of searching for characters in the charset. And converting a charset index into a character would be a constant-time operation, unlike the inverse.

Store your key as a vector of integers from 0 to N-1, where N is the length of your charset. Convert those integers to actual characters only when needed, i.e. after the increment.
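The conversion step might be as small as this sketch, where render_key is a hypothetical name and every stored integer is assumed to be a valid index into the charset:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: renders a key stored as charset indices
// (each in the range 0 to charset.size() - 1) as text.
std::string render_key(const std::vector<std::size_t>& digits,
                       const std::string& charset)
{
    std::string out;
    out.reserve(digits.size());
    for (std::size_t digit : digits)
    {
        out.push_back(charset[digit]); // constant-time lookup, no search
    }
    return out;
}
```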

aib
+3  A: 

This is another version of the generalized base conversion problem, with n=36.

What you want to do is view your key as an unsigned integer, and view the "string" that you're handing out as a base 36 (a-z + 0-9) representation of that key.

Handing out a key then becomes converting the "next key" value to the base 36 string, then incrementing the next key value.

To convert, do the same thing you'd do to convert any integer to a hex representation, but swap in 36 instead of 16 on the modulo math. I'll leave this as an exercise for the reader. :)
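For reference, one possible shape of that conversion is sketched here. The name to_base is made up, the most significant digit is written first, and zero maps to the first charset character. Plain positional notation drops leading zero digits, so the sequence is not identical to the one produced by the loop in the question, which goes from "9" to "aa":

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical helper: writes value in base charset.size(), using charset
// as the digits, most significant digit first. Zero maps to charset[0].
// Assumes charset has at least two characters.
std::string to_base(std::uint64_t value, const std::string& charset)
{
    const std::uint64_t base = charset.size();
    std::string out;
    do
    {
        // Digits come out least significant first.
        out.push_back(charset[value % base]);
        value /= base;
    } while (value != 0);
    std::reverse(out.begin(), out.end());
    return out;
}
```

Each key costs one division and one modulo per output character, which is the 64-bit arithmetic discussed in the comments.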

Terry Mahaffey
This is actually what I was doing, but since it was performing a large amount of 64-bit arithmetic, it was five times slower than what I have now.
GenTiradentes
I have a hard time believing that; modern processors are really good at arithmetic, and there isn't that much to do. I'd like to see this version of the code and try to optimize that. There is a reason that hex base conversion algorithms do it the way I am suggesting and not the way you are attempting.
Terry Mahaffey
http://pastebin.com/m793cd41b Like I said, I'm dealing with very large 64-bit integers, which require a lot of arithmetic to convert to another base. When running a 32-bit build, performance is even worse, because the 64-bit arithmetic has to be done in software.
GenTiradentes
+1  A: 

I am not sure I understood what you wanted to do exactly but here is a little console program that prints out a sequence of 36*36*36 3-digit keys in base 36 using your charset as the digits. So it starts at aaa and ends at 999.

#include <stdio.h>
typedef int Number;
const size_t N = 3;
size_t B = 36;
Number key[N] = {0};
bool carry = false;
char A[] = "abcdefghijklmnopqrstuvwxyz0123456789";

void incr(size_t i)
{
    if(!carry)
    {
        return;
    }
    ++key[i];
    if(key[i] == B)
    {
        key[i] = 0;
    }
    else
    {
        carry = false;
    }
}

void Incr()
{
    carry = true;
    size_t i = 0;
    while(carry && i < N) // stop at the top digit so the final wrap-around stays in bounds
    {
        incr(i++);
    }
}

void Print()
{
    for(int i = N - 1; i >= 0; --i)
    {
        printf("%c", A[key[i]]);
    }
    printf("\n");
}

int main()
{
    for(size_t i = 0; i < B * B * B; ++i)
    {
        Print();
        Incr();

    }
    return 0;
}
Permaquid