views:

191

answers:

5

Here's a problem thats got me stumped (solution wise):

Given a str S, apply character mappings Cm = {a=(m,o,p),d=(q,u),...} and print out all possible combinations using C or C++.

The string can be any length, and the number of character mappings varies, and there won't be any mappings that map to another map (thus avoiding circular dependencies).

As an example: string abba with mappings a=(e,o), d=(g,h), b=(i) would print:

abba,ebba,obba,abbe,abbo,ebbe,ebbo,obbe,obbo,aiba,aiia,abia,eiba,eiia,...... 
A: 

You essentially want to do a depth-first search (DFS) or any other traversal down a directed acyclic word graph (DAWG). I will post some code shortly.

Peter Alexander
+1  A: 

The way I would go about this is to create an array of indexes the same length as the string, all initialized at zero. We then treat this array of indexes as a counter to enumerate all the possible mappings of our source string. A 0 index maps that position in the string to the first mapping for that character, a 1 to the second, etc. We can step through them in order by just incrementing the last index in the array, carrying over to the next position when we reach the maximum number of mappings for that position.

To use your example, we have the mappings

'a' => 'e', 'o'
'b' => 'i'

With the input string "abba", we need a four element array for our indexes:

[0,0,0,0] => "abba"
[0,0,0,1] => "abbe"
[0,0,0,2] => "abbo"
[0,0,1,0] => "abia"
[0,0,1,1] => "abie"
[0,0,1,2] => "abio"
[0,1,0,0] => "aiba"
[0,1,0,1] => "aibe"
[0,1,0,2] => "aibo"
[0,1,1,0] => "aiia"
[0,1,1,1] => "aiie"
[0,1,1,2] => "aiio"
[1,0,0,0] => "ebba"
[1,0,0,1] => "ebbe"
[1,0,0,2] => "ebbo"
[1,0,1,0] => "ebia"
[1,0,1,1] => "ebie"
[1,0,1,2] => "ebio"
[1,1,0,0] => "eiba"
[1,1,0,1] => "eibe"
[1,1,0,2] => "eibo"
[1,1,1,0] => "eiia"
[1,1,1,1] => "eiie"
[1,1,1,2] => "eiio"
[2,0,0,0] => "obba"
[2,0,0,1] => "obbe"
[2,0,0,2] => "obbo"
[2,0,1,0] => "obia"
[2,0,1,1] => "obie"
[2,0,1,2] => "obio"
[2,1,0,0] => "oiba"
[2,1,0,1] => "oibe"
[2,1,0,2] => "oibo"
[2,1,1,0] => "oiia"
[2,1,1,1] => "oiie"
[2,1,1,2] => "oiio"

Before we start generating these strings, we're going to need somewhere to store them, which in C, means that we're going to have to allocate memory. Fortunately, we know the length of these strings already, and we can figure out the number of strings we're going to generate - it's just the product of the number of mappings for each position.

While you can return them in an array, I prefer to use a callback to return them as I find them.

#include <string.h>
#include <stdlib.h>
int each_combination( 
    char const * source, 
    char const * mappings[256],
    int (*callback)(char const *, void *), 
    void * thunk
) {
  if (mappings == NULL || source == NULL || callback == NULL )
  {
    return -1;
  }
  else
  {
    size_t i;
    int rv;
    size_t num_mappings[256] = {0};
    size_t const source_len = strlen(source);
    size_t * const counter = calloc( source_len, sizeof(size_t) );
    char * const scratch = strdup( source );

    if ( scratch == NULL || counter == NULL )
    {
      rv = -1;
      goto done;
    }

    /* cache the number of mappings for each char */
    for (i = 0; i < 256; i++)
      num_mappings[i] = 1 + (mappings[i] ? strlen(mappings[i]) : 0);

    /* pass each combination to the callback */
    do {
      rv = callback(scratch, thunk);
      if (rv != 0) goto done;

      /* increment the counter */
      for (i = 0; i < source_len; i++)
      {
        counter[i]++;
        if (counter[i] == num_mappings[(unsigned char) source[i]])
        {
          /* carry to the next position */
          counter[i] = 0;
          scratch[i] = source[i];
          continue;
        }
        /* use the next mapping for this character */
        scratch[i] = mappings[(unsigned char) source[i]][counter[i]-1];
        break;
      }
    } while(i < source_len);

done:
    if (scratch) free(scratch);
    if (counter) free(counter);
    return rv;
  }
}
#include <stdio.h>
int print_each( char const * s, void * name)
{
    printf("%s:%s\n", (char const *) name, s);
    return 0;
}
int main(int argc, char ** argv)
{
  char const * mappings[256] = { NULL };
  mappings[(unsigned char) 'a'] = "eo";
  mappings[(unsigned char) 'b'] = "i";

  each_combination( "abba", mappings, print_each, (void *) "abba");
  each_combination( "baobab", mappings, print_each, (void *) "baobab");

  return 0;
}
rampion
While it works, it always makes me cringe to see `malloc` and `free` all over the place, especially when NOT paired in the same method... Also (but it's typical with english speakers) some `char` may have a negative value (for accentuated characters in the extended ASCII).
Matthieu M.
Thanks for calling me on using signed characters as array indexes. Fixed that. As for returning allocated memory - I prefer to have the caller allocate memory, but in cases like this, where figuring out how much memory to allocate is a big part of the calculation, it seems silly to expect the caller to figure that out.
rampion
Actually, know what? I'm going to rewrite this with a callback, since I like that better.
rampion
+1  A: 

Definitely possible, not really difficult... but this will generate lots of strings that's for sure.

The first thing to remark is that you know how many strings it's going to generate beforehand, so it's easy to do some sanity check :)

The second: it sounds like a recursive solution would be easy (like many traversal problems).

class CharacterMapper
{
public:
  CharacterMapper(): mGenerated(), mMapped()
  {
    for (int i = -128, max = 128; i != max; ++i)
      mMapped[i].push_back(i); // 'a' is mapped to 'a' by default
  }

  void addMapped(char origin, char target)
  {
    std::string& m = mMapped[origin];
    if (m.find(target) == std::string::npos) m.push_back(target);
  } // addMapped

  void addMapped(char origin, const std::string& target)
  {
    for (size_t i = 0, max = target.size(); i != max; ++i) this->addMapped(origin, target[i]);
  } // addMapped

  void execute(const std::string& original)
  {
    mGenerated.clear();
    this->next(original, 0);
    this->sanityCheck(original);
    this->print(original);
  }

private:
  void next(std::string original, size_t index)
  {
    if (index == original.size())
    {
      mGenerated.push_back(original);
    }
    else
    {
      const std::string& m = mMapped[original[index]];
      for (size_t i = 0, max = m.size(); i != max; ++i)
        this->next( original.substr(0, index) + m[i] + original.substr(index+1), index+1 );
    }
  } // next

  void sanityCheck(const std::string& original)
  {
    size_t total = 1;
    for (size_t i = 0, max = original.size(); i != max; ++i)
      total *= mMapped[original[i]].size();

    if (total != mGenerated.size())
      std::cout << "Failure: should have found " << total << " words, found " << mGenerated.size() << std::endl;
  }

  void print(const std::string& original) const
  {
    typedef std::map<char, std::string>::const_iterator map_iterator;
    typedef std::vector<std::string>::const_iterator vector_iterator;

    std::cout << "Original: " << original << "\n";

    std::cout << "Mapped: {";
    for (map_iterator it = mMapped.begin(), end = mMapped.end(); it != end; ++it)
      if (it->second.size() > 1) std::cout << "'" << it->first << "': '" << it->second.substr(1) << "'";
    std::cout << "}\n";

    std::cout << "Generated:\n";
    for (vector_iterator it = mGenerated.begin(), end = mGenerated.end(); it != end; ++it)
      std::cout << "  " << *it << "\n";
  }

  std::vector<std::string> mGenerated;
  std::map<char, std::string> mMapped;
}; // class CharacterMapper


int main(int argc, char* argv[])
{
  CharacterMapper mapper;
  mapper.addMapped('a', "eo");
  mapper.addMapped('d', "gh");
  mapper.addMapped('b', "i");
  mapper.execute("abba");
}

And here is the output:

Original: abba
Mapped: {'a': 'eo''b': 'i''d': 'gh'}
Generated:
  abba
  abbe
  abbo
  abia
  abie
  abio
  aiba
  aibe
  aibo
  aiia
  aiie
  aiio
  ebba
  ebbe
  ebbo
  ebia
  ebie
  ebio
  eiba
  eibe
  eibo
  eiia
  eiie
  eiio
  obba
  obbe
  obbo
  obia
  obie
  obio
  oiba
  oibe
  oibo
  oiia
  oiie
  oiio

Yeah, rather lengthy, but there's a lot that does not directly participate to the computation (initialization, checks, printing). The core methods is next which implements the recursion.

Matthieu M.
A: 

There is a link to the snippets archive which does that, here, Permute2.c. There is another variant of the string permutation (I guess you could then filter out those that are not in the map!) See here on the 'snippets' archive...

Hope this helps, Best regards, Tom.

tommieb75
+2  A: 

EDIT: This should be the fastest and simplest possible algo. Some may argue with the style or portability; I think this is perfect for an embedded-type thing and I've spent long enough on it already. I'm leaving the original below.

This uses an array for mapping. The sign bit is used to indicate the end of a mapping cycle, so the array type has to be larger than the mapped type if you want to use the full unsigned range.

Generates 231M strings/sec or ~9.5 cycles/string on a 2.2GHz Core2. Testing conditions and usage as below.

#include <iostream>
using namespace std;

int const alphabet_size = CHAR_MAX+1;
typedef int map_t; // may be char or short, small performance penalty
int const sign_bit = 1<< CHAR_BIT*sizeof(map_t)-1;
typedef map_t cmap[ alphabet_size ];

void CreateMap( char *str, cmap &m ) {
    fill( m, m+sizeof(m)/sizeof(*m), 0 );
    char *str_end = strchr( str, 0 ) + 1;
    str_end[-1] = ' '; // space-terminated strings
    char prev = ' ';
    for ( char *pen = str; pen != str_end; ++ pen ) {
        if ( * pen == ' ' ) {
            m[ prev ] |= sign_bit;
            prev = 0;
        }
        m[ * pen ] = * pen;
        if ( prev != ' ' ) swap( m[prev], m[ *pen ] );
        prev = *pen;
    }
    for ( int mx = 0; mx != sizeof(m)/sizeof(*m); ++ mx ) {
        if ( m[mx] == 0 ) m[mx] = mx | sign_bit;
    }
}

bool NextMapping( char *s, char *s_end, cmap &m ) {
    for ( char *pen = s; pen != s_end; ++ pen ) {
        map_t oldc = *pen, newc = m[ oldc ];
        * pen = newc & sign_bit-1;
        if ( newc >= 0 ) return true;
    }
    return false;
}

int main( int argc, char **argv ) {
    uint64_t cnt = 0;
    cmap m;
    CreateMap( argv[1], m );
    char *s = argv[2], *s_end = strchr( s, 0 );
    do {
        ++ cnt;
    } while ( NextMapping( s, s_end, m ) );
    cerr << cnt;
    return 0;
}

ORIGINAL:

Not as short or robust as I'd like, but here's something.

  • Requires that the input string always contain the alphabetically first letter in each replacement set
  • Execute a la maptool 'aeo dgh bi' abbd
  • Output is in reverse-lexicographical order
  • Performance of about 22 cycles/string (100M strings/sec at 2.2 GHz Core2)
    • BUT my platform is trying to be clever with strings, slowing it down
    • If I change it to use char* strings instead, it runs at 142M strings/sec (~15.5 cycles/string)
  • Should be possible to go faster using a char[256] mapping table and another char[256] specifying which chars end a cycle.

The map data structure is an array of nodes linked into circular lists.

#include <iostream>
#include <algorithm>
using namespace std;

enum { alphabet_size = UCHAR_MAX+1 };

struct MapNode {
     MapNode *next;
     char c;
     bool last;
     MapNode() : next( this ), c(0), last(false) {}
};

void CreateMap( string s, MapNode (&m)[ alphabet_size ] ) {
    MapNode *mprev = 0;
    replace( s.begin(), s.end(), ' ', '\0' );
    char *str = const_cast<char*>(s.c_str()), *str_end = str + s.size() + 1;
    for ( char *pen = str; pen != str_end; ++ pen ) {
        if ( mprev == 0 ) sort( pen, pen + strlen( pen ) );
        if ( * pen == 0 ) {
            if ( mprev ) mprev->last = true;
            mprev = 0;
            continue;
        }
        MapNode &mnode = m[ * pen ];
        if ( mprev ) swap( mprev->next, mnode.next ); // link node in
        mnode.c = * pen; // tell it what char it is
        mprev = &mnode;
    }
       // make it easier to tell that a node isn't in any map
    for ( MapNode *mptr = m; mptr != m + alphabet_size; ++ mptr ) {
        if ( mptr->next == mptr ) mptr->next = 0;
    }
}

bool NextMapping( string &s, MapNode (&m)[ alphabet_size ] ) {
    for ( string::iterator it = s.begin(); it != s.end(); ++ it ) {
        MapNode &mnode = m[ * it ];
        if ( mnode.next ) {
            * it = mnode.next->c;
            if ( ! mnode.last ) return true;
        }
    }
    return false;
}

int main( int argc, char **argv ) {
    MapNode m[ alphabet_size ];
    CreateMap( argv[1], m );
    string s = argv[2];
    do {
        cerr << s << endl;
    } while ( NextMapping( s, m ) );
 return 0;
}
Potatoswatter