ansaurus

Question

Average of two strings in alphabetical/lexicographical order

Answer 1

+4 A:

If you define an alphabet of characters, you can just convert to base 10, do an average, and convert back to base-N where N is the size of the alphabet.

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def enbase(x):
    n = len(alphabet)
    if x < n:
        return alphabet[x]
    return enbase(x/n) + alphabet[x%n]

def debase(x):
    n = len(alphabet)
    result = 0
    for i, c in enumerate(reversed(x)):
        result += alphabet.index(c) * (n**i)
    return result

def average(a, b):
    a = debase(a)
    b = debase(b)
    return enbase((a + b) / 2)

print average('a', 'z') #m
print average('aa', 'zz') #mz
print average('cat', 'doggie') #budeel
print average('google', 'microsoft') #gebmbqkil
print average('microsoft', 'google') #gebmbqkil

Edit: Based on comments and other answers, you might want to handle strings of different lengths by appending the first letter of the alphabet to the shorter word until they're the same length. This will result in the "average" falling between the two inputs in a lexicographical sort. Code changes and new outputs below.

def pad(x, n):
    p = alphabet[0] * (n - len(x)) 
    return '%s%s' % (x, p)

def average(a, b):
    n = max(len(a), len(b))
    a = debase(pad(a, n))
    b = debase(pad(b, n))
    return enbase((a + b) / 2)

print average('a', 'z') #m
print average('aa', 'zz') #mz
print average('aa', 'az') #m (equivalent to ma)
print average('cat', 'doggie') #cumqec
print average('google', 'microsoft') #jlilzyhcw
print average('microsoft', 'google') #jlilzyhcw

FogleBird 2010-03-24 19:30:05

But "budeel" doesn't seem to be between "cat" and "doggie" in alphabetical order though?

Bemmu 2010-03-24 19:31:42

What would "seem" to be between them? Think of it like base 10 math. You can prepend leading 'a' characters here just like you can prepend '0' to any number. Does 'budeel' seem between 'aaacat' and 'doggie'? Yes.

FogleBird 2010-03-24 19:42:24

I think you should do the base 10 math behind the decimal point. So, a->0.01..; aa->0.0101..; z->0.9..; zz->0.99..

Debilski 2010-03-24 19:49:25

You could add `'z'` at the end of string: `pad = lambda s,a=a,b=b: s.ljust(max(len(a), len(b)), alphabet[-1])` Example: `a,b = [debase(pad(s)) for s in (a,b)]`. In this case "cundeb" is between "cat" and "doggie".

J.F. Sebastian 2010-03-24 21:15:25

Or you could add `'a'` instead of `'z'` (replace `alphabet[-1]` by `alphabet[0]` in the above code). In this case "cumqec" is between "cat" and "doggie".

J.F. Sebastian 2010-03-24 21:18:52

To clarify why I thought "budeel" is not between "cat" and "doggie" in alphabetical order: sorted(["budeel", "cat", "doggie"]) = ['budeel', 'cat', 'doggie']

Bemmu 2010-03-25 18:22:15

It was never very clear how you wanted to handle different lengths. Append 'a' (or whatever the first character in your alphabet is) to the shorter word until they're the same length. Then you get 'cumqec' which would fall between 'cat' and 'doggie' in a lexicographical sort.

FogleBird 2010-03-25 19:09:01

Edited my answer based on that new information.

FogleBird 2010-03-25 19:16:51

I started trying to really use this as part of the system today, but I noticed that this gives "m" as being between "aa" and "az", even though sorting them into alphabetical order gives ["aa", "az", "m"]

Bemmu 2010-03-30 07:37:07

Prepend 'a' to the result until its the same length as the inputs.

FogleBird 2010-03-30 12:07:26

Answer 2

+2 A:

Based on your proposed usage, consistent hashing ( http://en.wikipedia.org/wiki/Consistent_hashing ) seems to make more sense.

2010-03-24 19:31:34

Answer 3

+3 A:

It sounds like what you want, is to treat alphabetical characters as a base-26 value between 0 and 1. When you have strings of different length (an example in base 10), say 305 and 4202, your coming out with a midpoint of 3, since you're looking at the characters one at a time. Instead, treat them as a floating point mantissa: 0.305 and 0.4202. From that, it's easy to come up with a midpoint of .3626 (you can round if you'd like).

Do the same with base 26 (a=0...z=25, ba=26, bb=27, etc.) to do the calculations for letters:

cat becomes 'a.cat' and doggie becomes 'a.doggie', doing the math gives cat a decimal value of 0.078004096, doggie a value of 0.136390697, with an average of 0.107197397 which in base 26 is roughly "cumcqo"

Eclipse 2010-03-24 19:49:49

If the strings are of different lengths, then it's best to use base-27, where 0 is ``character does not exist in string'' and 1..26 are letters.

2010-03-24 19:51:08

That issue pretty much goes away by using 'a' as 0, and using fractional math.

Eclipse 2010-03-24 20:04:52

Answer 4

A:

import math
def avg(str1,str2):
    y = ''
    s = 'abcdefghijklmnopqrstuvwxyz'
    for i in range(len(str1)):
        x = s.index(str2[i])+s.index(str1[i])
        x = math.floor(x/2)
        y += s[x]
    return y

print(avg('z','a')) # m
print(avg('aa','az')) # am
print(avg('cat','dog')) # chm

Still working on strings with different lengths... any ideas?

TheMachineCharmer 2010-03-24 20:00:26

Answer 5

+4 A:

If you mean the alphabetically, simply use FogleBird's algorithm but reverse the parameters and the result!

>>> print average('cat'[::-1], 'doggie'[::-1])[::-1]
cumdec

or rewriting average like so

>>> def average(a, b):
...     a = debase(a[::-1])
...     b = debase(b[::-1])
...     return enbase((a + b) / 2)[::-1]
... 
>>> print average('cat', 'doggie')
cumdec
>>> print average('google', 'microsoft') 
jlvymlupj
>>> print average('microsoft', 'google') 
jlvymlupj

gnibbler 2010-03-24 20:00:46

Answer 6

A:

This version thinks 'abc' is a fraction like 0.abc. In this approach space is zero and a valid input/output.

MAX_ITER = 10
letters = " abcdefghijklmnopqrstuvwxyz"
def to_double(name):
    d = 0
    for i, ch in enumerate(name):
        idx = letters.index(ch)
        d += idx * len(letters) ** (-i - 1)
    return d

def from_double(d):
    name = ""
    for i in range(MAX_ITER):
        d *= len(letters)
        name += letters[int(d)]
        d -= int(d)
    return name

def avg(w1, w2):
    w1 = to_double(w1)
    w2 = to_double(w2)
    return from_double((w1 + w2) * 0.5)

print avg('a', 'a') # 'a'
print avg('a', 'aa') # 'a mmmmmmmm'
print avg('aa', 'aa') # 'a zzzzzzzz'
print avg('car', 'duck') # 'cxxemmmmmm'

Unfortunately, the naïve algorithm is not able to detect the periodic 'z's, this would be something like 0.99999 in decimal; therefore 'a zzzzzzzz' is actually 'aa' (the space before the 'z' periodicity must be increased by one.

In order to normalise this, you can use the following function

def remove_z_period(name):
    if len(name) != MAX_ITER:
        return name
    if name[-1] != 'z':
        return name
    n = ""
    overflow = True
    for ch in reversed(name):
        if overflow:
            if ch == 'z':
                ch = ' '
            else:
                ch=letters[(letters.index(ch)+1)]
                overflow = False
        n = ch + n
    return n

print remove_z_period('a zzzzzzzz') # 'aa'

Debilski 2010-03-24 20:28:50

Answer 7

A:

I haven't programmed in python in a while and this seemed interesting enough to try. Bear with my recursive programming. Too many functional languages look like python.

def stravg_half(a, ln):
     # If you have a problem it will probably be in here.
     # The floor of the character's value is 0, but you may want something different
     f = 0
     #f = ord('a')
     L = ln - 1
     if 0 == L:
          return ''
     A = ord(a[0])
     return chr(A/2) + stravg_half( a[1:], L)

def stravg_helper(a, b, ln, x):
    L = ln - 1
    A = ord(a[0])
    B = ord(b[0])
    D = (A + B)/2
    if 0 == L:
        if 0 == x:
             return chr(D)
        # NOTE: The caller of helper makes sure that len(a)>=len(b)
        return chr(D) + stravg_half(a[1:], x)
    return chr(D) + stravg_helper(a[1:], b[1:], L, x)

def stravg(a, b):
    la = len(a)
    lb = len(b)
    if 0 == la:
        if 0 == lb:
            return a # which is empty
        return stravg_half(b, lb)
    if 0 == lb:
        return stravg_half(a, la)
    x = la - lb
    if x > 0:
        return stravg_helper(a, b, lb, x)
    return stravg_helper(b, a, la, -x) # Note the order of the args

nategoose 2010-03-25 01:15:31

Answer 8

A:

Thanks for everyone who answered, but I ended up writing my own solution because the others weren't exactly what I needed. I am trying to average app engine key names, and after studying them a bit more I discovered they actually allow any 7-bit ASCII characters in the names. Additionally I couldn't really rely on the solutions that converted the key names first to floating point, because I suspected floating point accuracy just isn't enough.

To take an average, first you add two numbers together and then divide by two. These are both such simple operations that I decided to just make functions to add and divide base 128 numbers represented as lists. This solution hasn't been used in my system yet so I might still find some bugs in it. Also it could probably be a lot shorter, but this is just something I needed to get done instead of trying to make it perfect.

# Given two lists representing a number with one digit left to decimal point and the
# rest after it, for example 1.555 = [1,5,5,5] and 0.235 = [0,2,3,5], returns a similar
# list representing those two numbers added together.
#
def ladd(a, b, base=128):
        i = max(len(a), len(b))
        lsum = [0] * i  
        while i > 1:
                i -= 1
                av = bv = 0
                if i < len(a): av = a[i]
                if i < len(b): bv = b[i]
                lsum[i] += av + bv
                if lsum[i] >= base:
                        lsum[i] -= base
                        lsum[i-1] += 1
        return lsum

# Given a list of digits after the decimal point, returns a new list of digits
# representing that number divided by two.
#
def ldiv2(vals, base=128):
        vs = vals[:]
        vs.append(0)
        i = len(vs)
        while i > 0:
                i -= 1
                if (vs[i] % 2) == 1:
                        vs[i] -= 1
                        vs[i+1] += base / 2
                vs[i] = vs[i] / 2
        if vs[-1] == 0: vs = vs[0:-1]
        return vs

# Given two app engine key names, returns the key name that comes between them.
#
def average(a_kn, b_kn):
        m = lambda x:ord(x)
        a = [0] + map(m, a_kn)
        b = [0] + map(m, b_kn)
        avg = ldiv2(ladd(a, b))
        return "".join(map(lambda x:chr(x), avg[1:]))

print average('a', 'z') # m@
print average('aa', 'zz') # n-@
print average('aa', 'az') # am@
print average('cat', 'doggie') # d(mstr@
print average('google', 'microsoft') # jlim.,7s:
print average('microsoft', 'google') # jlim.,7s:

Bemmu 2010-03-30 10:07:14

ansaurus

tags:

views:

answers:

Average of two strings in alphabetical/lexicographical order

related questions