ansaurus

Question

I need a Python Function that will output a random string of 4 different characters when given the desired probabilites of the characters.

Answer 1

+2 A:

The random class is quite powerful in python. You can generate a list with the characters desired at the appropriate weights and then use random.choice to obtain a selection.

First, make sure you do an import random.

For example, let's say you wanted a truly random string from A,B,C, or D. 1. Generate a list with the characters li = ['A','B','C','D']

Then obtain values from it using random.choice output = "".join([random.choice(li) for i in range(0, n)])

You could easily make that a function with n as a parameter.

In the above case, you have an equal chance of getting A,B,C, or D.

You can use duplicate entries in the list to give characters higher probabilities. So, for example, let's say you wanted a 50% chance of A and 25% chances of B and C respectively. You could have an array like this:

li = ['A','A','B','C']

And so on.

It would not be hard to parameterize the characters coming in with desired weights, to model that I'd use a dictionary.

characterbasis = {'A':25, 'B':25, 'C':25, 'D':25}

Make that the first parameter, and the second being the length of the string and use the above code to generate your string.

David in Dakota 2009-04-13 15:00:50

Answer 2

+2 A:

For four letters, here's something quick off the top of my head:

from random import random

def randABCD(n, pA, pB, pC, pD):
    # assumes pA + pB + pC + pD == 1
    cA = pA
    cB = cA + pB
    cC = cB + pC
    def choose():
        r = random()
        if r < cA:
           return 'A'
        elif r < cB:
           return 'B'
        elif r < cC:
           return 'C'
        else:
           return 'D'
    return ''.join([choose() for i in xrange(n)])

I have no doubt that this can be made much cleaner/shorter, I'm just in a bit of a hurry right now.

The reason I wouldn't be content with David in Dakota's answer of using a list of duplicate characters is that depending on your probabilities, it may not be possible to create a list with duplicates in the right numbers to simulate the probabilities you want. (Well, I guess it might always be possible, but you might wind up needing a huge list - what if your probabilities were 0.11235442079, 0.4072777384, 0.2297927874, 0.25057505341?)

EDIT: here's a much cleaner generic version that works with any number of letters with any weights:

from bisect import bisect
from random import uniform

def rand_string(n, content):
    ''' Creates a string of letters (or substrings) chosen independently
        with specified probabilities. content is a dictionary mapping
        a substring to its "weight" which is proportional to its probability,
        and n is the desired number of elements in the string.

        This does not assume the sum of the weights is 1.'''
    l, cdf = zip(*[(l, w) for l, w in content.iteritems()])
    cdf = list(cdf)
    for i in xrange(1, len(cdf)):
        cdf[i] += cdf[i - 1]
    return ''.join([l[bisect(cdf, uniform(0, cdf[-1]))] for i in xrange(n)])

David Zaslavsky 2009-04-13 15:11:11

Answer 3

A:

Here is a rough idea of what might suit you

import random as r

def distributed_choice(probs):
    r= r.random()
    cum = 0.0

    for pair in probs:
     if (r < cum + pair[1]):
      return pair[0]      
     cum += pair[1]

The parameter probs takes a list of pairs of the form (object, probability). It is assumed that the sum of probabilities is 1 (otherwise, its trivial to normalize).

To use it just execute:

''.join([distributed_choice(probs)]*4)

Il-Bhima 2009-04-13 15:16:33

Answer 4

+4 A:

Here's the code to select a single weighted value. You should be able to take it from here. It uses bisect and random to accomplish the work.

from bisect import bisect
from random import random

def WeightedABCD(*weights):
  chars = 'ABCD'
  breakpoints = [sum(weights[:x+1]) for x in range(4)]
  return chars[bisect(breakpoints, random())]

Call it like this: WeightedABCD(.25, .34, .25, .25).

EDIT: Here is a version that works even if the weights don't add up to 1.0:

from bisect import bisect_left
from random import uniform

def WeightedABCD(*weights):
  chars = 'ABCD'
  breakpoints = [sum(weights[:x+1]) for x in range(4)]
  return chars[bisect_left(breakpoints, uniform(0.0,breakpoints[-1]))]

Pesto 2009-04-13 15:20:34

That'll only work correctly if sum(weights)==1.0, no?

Anthony Towns 2009-04-13 15:29:30

Yes, but I'm assuming that the weights /should/ equal 1.0. I can't perceive of a reason to do otherwise, but you could always normalize them to a 0.0 - 1.0 scale first if you must.

Pesto 2009-04-13 15:37:20

Alternatively, you could change the last line to `return chars[bisect_left(breakpoints, uniform(0.0, breakpoints[-1]))]` and not worry about normalizing. Just remember to change the import statements.

Pesto 2009-04-13 15:46:57

Your example, 0.25,0.34,0.25,0.25, didn't add up to 1.0 :)

Anthony Towns 2009-04-13 17:19:36

I just copied over the original example from the question and didn't pay any attention to it. Still, the second version will handle non-normalized weights.

Pesto 2009-04-13 17:25:44

Answer 5

A:

Hmm, something like:

import random
class RandomDistribution:
    def __init__(self, kv):
        self.entries = kv.keys()
        self.where = []
        cnt = 0
        for x in self.entries:
            self.where.append(cnt)
            cnt += kv[x]
        self.where.append(cnt)   

    def find(self, key):
        l, r = 0, len(self.where)-1
        while l+1 < r:
           m = (l+r)/2
           if self.where[m] <= key:
               l=m
           else:
               r=m
        return self.entries[l]

    def randomselect(self):
        return self.find(random.random()*self.where[-1])

rd = RandomDistribution( {"foo": 5.5, "bar": 3.14, "baz": 2.8 } )
for x in range(1000):
    print rd.randomselect()

should get you most of the way...

Anthony Towns 2009-04-13 15:21:31

Answer 6

A:

Thank you all for your help!!! I was able to figure something out, mostly with this info. For my particular need, i did somthing like this:

import random
#Create a funtion to randomize a given string
def makerandom(seq):
    return ''.join(random.sample(seq, len(seq)))
def randomDNA(n, probA=0.25, probC=0.25, probG=0.25, probT=0.25):
    notrandom=''
    A=int(n*probA)
    C=int(n*probC)
    T=int(n*probT)
    G=int(n*probG)

#The remainder part here is used to make sure all n are used, as one cannot
#have half an A for example.
    remainder=''
    for i in range(0, n-(A+G+C+T)):
        ramainder+=random.choice("ATGC")
    notrandom=notrandom+ 'A'*A+ 'C'*C+ 'G'*G+ 'T'*T + remainder
    return makerandom(notrandom)

BigTree 2009-04-14 12:50:06

I don't like this question but I like the idea of generating random DNA... what was your use for this?Thanks!

Manuel 2010-01-21 05:53:04

ansaurus

tags:

views:

answers:

I need a Python Function that will output a random string of 4 different characters when given the desired probabilites of the characters.

related questions