views:

114

answers:

4

I have a char[26] of the letters a-z and via nested for statements I'm producing a list of sequences like:

aaa, aaz... aba, abb, abz, ... zzy, zzz.

Currently, the software is written to generate the list of all possible values from aaa-zzz and then maintains an index, and goes through each of them performing an operation on them.

The list is obviously large, it's not ridiculously large, but it's gotten to the point where the memory footprint is too large (there are also other areas being looked at, but this is one that I've got).

I'm trying to produce a formula where I can keep the index, but do away with the list of sequences and calculate the current sequence based on the current index (as the time between operations between sequences is long).

Eg:

char[] characters = {a, b, c... z};
int currentIndex = 29; // abd

public string CurrentSequence(int currentIndex)
{
    int ndx1 = getIndex1(currentIndex); // = 0
    int ndx2 = getIndex2(currentIndex); // = 1
    int ndx3 = getIndex3(currentIndex); // = 3

    return string.Format(
        "{0}{1}{2}", 
        characters[ndx1], 
        characters[ndx2], 
        characters[ndx3]); // abd
}

I've tried working out a small example using a subset (abc) and trying to index into that using modulo division, but I'm having trouble thinking today and I'm stumped.

I'm not asking for an answer, just any kind of help. Maybe a kick in the right direction?

+7  A: 

Hint: Think of how you would print a number in base 26 instead of base 10, and with letters instead of digits. What's the general algorithm for displaying a number in an arbitrary base?

Spoiler: (scroll right to view)

                                                                                      int ndx1 = currentIndex / 26 / 26 % 26;
                                                                                      int ndx2 = currentIndex / 26 % 26;
                                                                                      int ndx3 = currentIndex % 26;
John Kugelman
+4  A: 

Something like this ought to work, assuming 26 characters:

public string CurrentSequence(int currentIndex) {
    return characters[currentIndex / (26 * 26)] 
        + characters[(currentIndex / 26) % 26]
        + characters[currentIndex % 26];
}
recursive
+4  A: 

Wow, two questions in one day that can be solved via Cartesian products. Amazing.

You can use Eric Lippert's LINQ snippet to generate all combinations of the index values. This approach results in a streaming set of values, so they don't require storage in memory. This approach nicely separates the logic of generating the codes from maintaining state or performing computation with the code.

Eric's code for all combinations:

static IEnumerable<IEnumerable<T>> CartesianProduct<T>(this IEnumerable<IEnumerable<T>> sequences)  
{  
  IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };  
  return sequences.Aggregate(  
    emptyProduct,  
    (accumulator, sequence) =>   
      from accseq in accumulator   
      from item in sequence   
      select accseq.Concat(new[] {item}));                 
} 

You can now write:

public static IEnumerable<string> AllCodes()
{
  char[] characters = {a, b, c... z}; 
  IEnumerable<char[]> codeSets = new[] { characters, characters, characters };

  foreach( var codeValues in codeSets.CartesianProduct() )
  {
    yield return 
       string.Format( "{0}{1}{2}", codeValues[0], codeValues[1], codeValues[2]);
  }
}

The code above generates a streaming sequence of all code strings from aaa to zzz. You can now use this elsewhere where you perform your processing:

foreach( var code in AllCodes() )
{
    // use the code value somehow...
}
LBushkin
That solution doesn't allow you to lookup an index efficiently, which is the use case.
recursive
@recursive. Perhaps. Excepy the OP did indicate that the software works its way through all of the indexes anyways. So this may still be helpful.
LBushkin
@LBushkin, recursive is correct but it indeed is quite useful. I had seen Eric's post about that, but had forgotten it. +1
SnOrfus
I wouldn't say looking up an index is the use case. I read it as the OPs attempt to solve the big-list-in-memory problem, and I think using a lazy enumerator is a preferable solution, given the requirements such as they are defined in the question.
Jay
+4  A: 

There are multiple ways to solve your problem, but an option is to generate the sequence on the fly instead of storing it in a list:

IEnumerable<String> Sequence() {
  for (var c1 = 'a'; c1 <= 'z'; ++c1)
    for (var c2 = 'a'; c2 <= 'z'; ++c2)
      for (var c3 = 'a'; c3 <= 'z'; ++c3)
        yield return String.Format("{0}{1}{2}", c1, c2, c3);
}

You can then enumerate all the strings:

foreach (var s in Sequence())
  Console.WriteLine(s);

This code doesn't use indices at all but it allows you to create a loop around the sequence of strings using simple code and without storing the strings.

Martin Liversage