Hi,
The basic idea is to sort the strings and compare their signatures, where a string's signature is its characters sorted alphabetically.
What would be an efficient algorithm to do so?
Thanks.
You don't specify the programming language or the encoding of the strings (ASCII, Latin-1, UTF-8, UTF-16, etc.), but basically your compare function needs to sort the characters in each string and then return the result of comparing the sorted strings. Summing the ordinal values of the characters in each string and comparing the two integers is cheaper, but it can only rule a match out: different strings such as "ad" and "bc" have the same sum, so equal sums still need the full comparison.
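As a rough sketch of the sort-then-compare approach described above, assuming single-byte characters and NUL-terminated strings: the names `cmp_bytes`, `sorted_copy`, and `same_signature` are made up for illustration, and the library `qsort` stands in for whatever sort you prefer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Orders two bytes by their unsigned value. */
static int cmp_bytes(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);
}

/* Returns a newly allocated, byte-sorted copy of s (the "signature"),
 * or NULL if allocation fails. The caller frees the result. */
static char *sorted_copy(const char *s)
{
    size_t n = strlen(s);
    char *copy = malloc(n + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, s, n + 1);          /* copy including the terminator */
    qsort(copy, n, 1, cmp_bytes);    /* sort only the n content bytes */
    return copy;
}

/* True if a and b have the same signature. Returns false on allocation
 * failure, which is a simplification made for this sketch. */
bool same_signature(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (strlen(a) != strlen(b))
        return false;                /* different lengths can never match */

    char *sa = sorted_copy(a);
    char *sb = sorted_copy(b);
    bool equal = sa != NULL && sb != NULL && strcmp(sa, sb) == 0;
    free(sa);
    free(sb);
    return equal;
}
```

With this, `same_signature("listen", "silent")` is true, while `same_signature("ad", "bc")` is false even though the two strings have equal ordinal sums.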
If you are sorting UTF-8 characters "alphabetically", you can decode them to 32-bit integers (a UTF-8 character is one to four 8-bit values) and then do a radix sort, which runs in O(N) time on fixed-width keys. If you were using just ASCII, I would suggest counting sort.
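For the single-byte case, a counting sort signature might look like the sketch below. It assumes ASCII or Latin-1 input and a caller-supplied output buffer; `make_signature` is a hypothetical name, not something from the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes the sorted-byte signature of s into out using counting sort.
 * out must have room for strlen(s) + 1 bytes.
 * Assumption: one byte per character (ASCII / Latin-1). */
void make_signature(const char *s, char *out)
{
    size_t counts[256];
    size_t pos = 0;

    assert(s != NULL && out != NULL);
    memset(counts, 0, sizeof counts);

    /* Pass 1: tally how often each byte value occurs. */
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        counts[*p]++;

    /* Pass 2: emit each byte value in ascending order, counts[c] times.
     * Value 0 is skipped because it is the string terminator. */
    for (int c = 1; c < 256; c++)
        for (size_t k = 0; k < counts[c]; k++)
            out[pos++] = (char)c;

    out[pos] = '\0';
}
```

Both passes are linear in the string length plus a constant 256, so there is no comparison sort involved; for example, "listen" and "silent" both produce "eilnst".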
There are many ways to match the signatures, but I would use a hash table (O(1) per lookup on average) or an O(log N) structure such as a red-black tree or a skip list.
To further speed up your string matching, you can compress these signatures by run-length encoding the UTF-8 characters (since they're sorted, the signature will be runs + gaps). Actually, you could compress them to use bit tags that represent 7-bit chars (most common), RLE runs, and longer literals (8-bit through 32-bit chars). Comparing the shorter compressed strings would then be faster.
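Here is a minimal sketch of the run-length idea only, not the bit-tag scheme. It assumes the signature is already sorted, uses single-byte characters, and contains no digit characters (so the textual counts stay unambiguous); `rle_signature` and its text format are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Run-length encodes a sorted signature as <char><count> pairs,
 * e.g. "aabbbc" becomes "a2b3c1".
 * Returns 0 on success, -1 if out (of out_size bytes) is too small. */
int rle_signature(const char *sig, char *out, size_t out_size)
{
    size_t used = 0;

    assert(sig != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    while (*sig != '\0') {
        char c = *sig;
        size_t run = 0;

        /* Count the run of identical characters starting here. */
        while (*sig == c) {
            run++;
            sig++;
        }

        /* Append "<char><count>"; snprintf reports the length it needs. */
        int n = snprintf(out + used, out_size - used, "%c%zu", c, run);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

This only pays off when characters repeat often; a signature of all-distinct letters comes out twice as long in this format.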
The question looks similar to one asked here, to which my answer was:
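#include <stdbool.h>
#include <string.h>

#define NUM_ALPHABETS 256

/* One counter per possible byte value. */
int alphabets[NUM_ALPHABETS];

/* Returns true if src and dest contain the same bytes with the same counts. */
bool isAnagram(const char *src, const char *dest) {
    size_t len1 = strlen(src);
    size_t len2 = strlen(dest);
    size_t i;

    if (len1 != len2)
        return false;
    memset(alphabets, 0, sizeof(alphabets));
    /* Count every byte of src; the cast keeps bytes >= 0x80 from indexing negatively. */
    for (i = 0; i < len1; i++)
        alphabets[(unsigned char)src[i]]++;
    /* Consume the counts with dest; a negative count means dest has an extra byte. */
    for (i = 0; i < len2; i++) {
        alphabets[(unsigned char)dest[i]]--;
        if (alphabets[(unsigned char)dest[i]] < 0)
            return false;
    }
    return true;
}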