views:

677

answers:

8

Given any iterable, for example: "ABCDEF"

Treating it almost like a numeral system as such:

A B C D E F AA AB AC AD AE AF BA BB BC .... FF AAA AAB ....

How would I go about finding the ith member in this list? Efficiently, not by counting up through all of them. I want to find the billionth (for example) member in this list. I'm trying to do this in python and I am using 2.4 (not by choice) which might be relevant because I do not have access to itertools.

Nice, but not required: Could the solution be generalized for pseudo-"mixed radix" system?

--- RESULTS ---

# ------ bpowah -----
def f0(x, alph='ABCDE'):
    result = ''
    ct = len(alph)
    while x>=0:
        result += alph[x%ct]
        x /= ct-1
    return result[::-1]

# ----- Glenn Maynard -----
import math
def idx_to_length_and_value(n, length):
    chars = 1
    while True:
        cnt = pow(length, chars)
        if cnt > n:
            return chars, n

        chars += 1
        n -= cnt

def conv_base(chars, n, values):
    ret = []
    for i in range(0, chars):
        c = values[n % len(values)]
        ret.append(c)
        n /= len(values)

    return reversed(ret)

def f1(i, values = "ABCDEF"):
    chars, n = idx_to_length_and_value(i, len(values))
    return "".join(conv_base(chars, n, values))

# -------- Laurence Gonsalves ------
def f2(i, seq):
    seq = tuple(seq)
    n = len(seq)
    max = n # number of perms with 'digits' digits
    digits = 1
    last_max = 0
    while i >= max:
        last_max = max
        max = n * (max + 1)
        digits += 1
    result = ''
    i -= last_max
    while digits:
        digits -= 1
        result = seq[i % n] + result
        i //= n
    return result

# -------- yairchu -------
def f3(x, alphabet = 'ABCDEF'):
    x += 1 # Make us skip "" as a valid word
    group_size = 1
    num_letters = 0
    while 1: #for num_letters in itertools.count():
        if x < group_size:
            break
        x -= group_size
        group_size *= len(alphabet)
        num_letters +=1
    letters = []
    for i in range(num_letters):
        x, m = divmod(x, len(alphabet))
        letters.append(alphabet[m])
    return ''.join(reversed(letters))

# ----- testing ----
import time
import random
tries = [random.randint(1,1000000000000) for i in range(10000)]
numbs = 'ABCDEF'

time0 = time.time()
s0 = [f1(i, numbs) for i in tries]
print 's0 bpowah',time.time()-time0, 'sec'
time0 = time.time()
s1 = [f1(i, numbs) for i in tries]
print 's1 Glenn Maynard',time.time()-time0, 'sec'
time0 = time.time()
s2 = [f2(i, numbs) for i in tries]
print 's2 Laurence Gonsalves',time.time()-time0, 'sec'
time0 = time.time()
s3 = [f3(i,numbs) for i in tries]
print 's3 yairchu',time.time()-time0, 'sec'

times:

s0 bpowah 0.470999956131 sec
s1 Glenn Maynard 0.472999811172 sec
s2 Laurence Gonsalves 0.259000062943 sec
s3 yairchu 0.325000047684 sec
>>> s0==s1==s2==s3
True
+4  A: 

Third time's the charm:

def perm(i, seq):
  seq = tuple(seq)
  n = len(seq)
  max = n # number of perms with 'digits' digits
  digits = 1
  last_max = 0
  while i >= max:
    last_max = max
    max = n * (max + 1)
    digits += 1
  result = ''
  i -= last_max
  while digits:
    digits -= 1
    result = seq[i % n] + result
    i //= n
  return result
Laurence Gonsalves
Doesn't work:>>> for i in range(200):... print f(i,'ABCDE')... BCDEBABBBCBDBECACBCCCDCEDADBDCDDDEEAEBECEDEEBAABABBACBAD
bpowah
Yeah, I hadn't realized the implication of sequences like "AA" before. I rewrote it to handle those cases properly.
Laurence Gonsalves
sorry, it still chokes at FFF to AAAA (gives AAA instead)
bpowah
Ok, this time there was a genuine bug. The way I was increasing "max" was all wrong. Should be fixed now.
Laurence Gonsalves
+3  A: 

What you're doing is close to a conversion from base 10 (your number) to base 6, with ABCDEF being your digits. The only difference is "AA" and "A" are different, which is wrong if you consider "A" the zero-digit.

If you add the next greater power of six to your number, and then do a base conversion to base 6 using these digits, and finally strip the first digit (which should be a "B", i.e. a "1"), you've got the result.

I just want to post an idea here, not an implementation, because the question smells a lot like homework to me (I do give the benefit of the doubt; it's just my feeling).

balpha
Nope. If that was the case, "A" and "AA" would represent the same value, like 0 and 00.
Glenn Maynard
And with the constraint that normal algorithms won't work.
Nikhil Chelliah
I stand corrected, I'll update my answer.
balpha
@Nikhil Chelliah: I don't quite understand, could you elaborate?
balpha
I'd assume Nikhil Chelliah means, that numeric systems usually know a "0" digit. You system, similar to Roman numerals, does not.
lexu
@lexu: Yes, it does. "A" is the zero.
balpha
If "A" is zero, then so are "AA", "AAA", "AAAA", etc.
Nosredna
That's right. So?
balpha
+3  A: 

Multi-radix solution at the bottom.

import math
def idx_to_length_and_value(n, length):
    chars = 1
    while True:
        cnt = pow(length, chars)
        if cnt > n:
            return chars, n

        chars += 1
        n -= cnt

def conv_base(chars, n, values):
    ret = []
    for i in range(0, chars):
        c = values[n % len(values)]
        ret.append(c)
        n /= len(values)

    return reversed(ret)

values = "ABCDEF"
for i in range(0, 100):
    chars, n = idx_to_length_and_value(i, len(values))
    print "".join(conv_base(chars, n, values))


import math
def get_max_value_for_digits(digits_list):
    max_vals = []

    for val in digits_list:
        val = len(val)
        if max_vals:
            val *= max_vals[-1]
        max_vals.append(val)
    return max_vals

def idx_to_length_and_value(n, digits_list):
    chars = 1
    max_vals = get_max_value_for_digits(digits_list)

    while True:
        if chars-1 >= len(max_vals):
            raise OverflowError, "number not representable"
        max_val = max_vals[chars-1]
        if n < max_val:
            return chars, n

        chars += 1
        n -= max_val

def conv_base(chars, n, digits_list):
    ret = []
    for i in range(chars-1, -1, -1):
        digits = digits_list[i]
        radix = len(digits)

        c = digits[n % len(digits)]
        ret.append(c)
        n /= radix

    return reversed(ret)

digits_list = ["ABCDEF", "ABC", "AB"]
for i in range(0, 120):
    chars, n = idx_to_length_and_value(i, digits_list)
    print "".join(conv_base(chars, n, digits_list))
Glenn Maynard
I think that's it. I'll mull it over in the morning and see if it works for my mixed radix problem too..
bpowah
+2  A: 

First compute the length by summing up powers of six until you exceed your index (or better use the formula for the geometric series).

Subtract the sum of smaller powers from the index.

Compute the representation to base 6, fill leading zeros and map 0 -> A, ..., 5 -> F.

starblue
I'm not sure that the first step works, in a shortened base 2 example you'd get:1=>A, 2=>B, 3=>AA ... 6=>BB, 7=>AAA ... 14=>BBB, 15=>AAAA
dlamblin
Yes, and what's wrong with that?
starblue
Oh. I misread the "summing up" part. So in base 2, if i<=2 the length is 1, and if i<=(2+4) the length is 2, and if i<=(2+4+8) the length is 3 etc. The length part works perfectly and I find the explanation clearer than the code samples.
dlamblin
+1  A: 
alphabet = 'ABCDEF'

def idx_to_excel_column_name(x):
  x += 1 # Make us skip "" as a valid word
  group_size = 1
  for num_letters in itertools.count():
    if x < group_size:
      break
    x -= group_size
    group_size *= len(alphabet)
  letters = []
  for i in range(num_letters):
    x, m = divmod(x, len(alphabet))
    letters.append(alphabet[m])
  return ''.join(reversed(letters))

def excel_column_name_to_idx(name):
  q = len(alphabet)
  x = 0
  for letter in name:
    x *= q
    x += alphabet.index(letter)
  return x+q**len(name)//(q-1)-1
yairchu
I cannot use itertools, but it's a simple matter to replace itertools.count() with while: and num_letters+=1
bpowah
A: 

In perl you'd just convert your input i from base(10) to base(length of "ABCDEF"), then do a tr/012345/ABCDEF/ which is the same as y/0-5/A-F/. Surely Python has a similar feature set.

Oh, as pointed out by Yarichu the combinations are a tad different because if A represented 0, then there would be no combinations with leading A (though he said it a bit different). It seems I thought the problem to be more trivial than it is. You cannot just transliterate different base numbers, because numbers containing the equivalent of 0 would be skipped in the sequence.

So what I suggested is actually only the last step of what starblue suggested, which is essentially what Laurence Gonsalves implemented ftw. Oh, and there is no transliteration (tr// or y//) operation in Python, what a shame.

dlamblin
It's not just a numerical system with different base. For example see how there are 6 one-digit numbers and 36 two-digits numbers. In decimal base there are 9 single-digit numbers and 90 two-digits numbers with the formula for number of n-digit numbers being (base-1)*base^(n-1) unlike base^n in the system described
yairchu
base(10)? Huh?
Laurence Gonsalves
+1  A: 

Since we are converting from a number Base(10) to a number Base(7), whilst avoiding all "0" in the output, we will have to adjust the orginal number, so we do skip by one every time the result would contain a "0".

 1 => A,  or 1  in base [0ABCDEF]
 7 => AA, or 8  in base [0ABCDEF]
13 => BA, or 15 in base [0ABCDEF]
42 => FF, or 48 in base [0ABCDEF]
43 =>AAA, or 50 in base [0ABCDEF]

Here's some Perl code that shows what I'm trying to explain (sorry, didn't see this is a Phython request)

use strict;
use warnings;
my @Symbols=qw/0 A B C D E F/;
my $BaseSize=@Symbols ;
for my $NR ( 1 .. 45) {
   printf ("Convert %3i => %s\n",$NR ,convert($NR));
}

sub convert {
   my ($nr,$res)=@_;
   return $res unless $nr>0;
   $res="" unless defined($res);
   #Adjust to skip '0'
   $nr=$nr + int(($nr-1)/($BaseSize-1));
   return convert(int($nr/$BaseSize),$Symbols[($nr % ($BaseSize))] . $res);
}
lexu
Wait... where did "base 10" come in to this?
Laurence Gonsalves
"normal" numbers are base 10, represented by digits '0' .. '9'. pbowah describes "numbers" with symbols 'A' .. 'F', with the aditional complication, that his system doesn't have a '0'.
lexu
+1  A: 

This works (and is what i finally settled on), and thought it was worth posting because it is tidy. However it is slower than most answers. Can i perform % and / in the same operation?

def f0(x, alph='ABCDE'):
    result = ''
    ct = len(alph)
    while x>=0:
        result += alph[x%ct]
        x /= ct-1
    return result[::-1]
bpowah