tags:

views:

693

answers:

8

I've got a Perl function which takes a timestamp and returns either the unchanged timestamp (if it's never seen it before) or otherwise, it appends some letters to make it unique:

sub uniqify($) {
  my $timestamp = shift;

  state $last_ts = -1;
  state $next_letter = 'A';

  if ($timestamp == $last_ts) {
    $timestamp .= $next_letter++;
  } else {
    $last_ts = $timestamp;
    $next_letter = 'A';
  }

  return $timestamp;
}

So if you call it four times, with the values 1, 1, 1, and 2, it will return 1, then 1A, then 1B, then 2.

Note: It only ever gets called with ever-increasing timestamps, so it doesn't need to recall every one it's ever seen, just the last one.

Now I need to translate this function to Python. I've learned that I can replace the "state" variables with globals (yuck!) or perhaps attach them to the function as attributes, but neither of those is particularly elegant.

Also, Python doesn't have something like Perl's magic autoincrement, where if you "++" a variable whose value is "A", it becomes "B" -- or if it's "Z", it becomes "AA". So that's a curveball too.

I'm hacking together a solution, but it's really ugly and hard to read. Translating from Perl to Python is supposed to have the opposite effect, right? :) So I'm offering this as a challenge to SO users. Can you make it an elegant Python function?

+5  A: 

Well, sorry to say, but you can't just do a direct translation from Perl to Python (including bit-for-bit Perlisms) and expect the outcome to be prettier. It won't be, it will be considerably uglier.

If you want the prettiness of Python you will need to use Python idioms instead.

Now for the question at hand:

from string import uppercase

class Uniquifier(object):

    def __init__(self):
        self.last_timestamp = None
        self.last_suffix = 0

    def uniquify(self, timestamp):
        if timestamp == self.last_timestamp:
            timestamp = '%s%s' % (timestamp,
                                  uppercase[self.last_suffix])
            self.last_suffix += 1
        else:
            self.last_suffix = 0
            self.timestamp = timestamp
        return timestamp

uniquifier = Uniquifier()
uniquifier.uniquify(a_timestamp)

Prettier? Maybe. More readable? Probably.

Edit (re comments): Yes this fails after Z, and I am altogether unhappy with this solution. So I won't fix it, but might offer something better, like using a number instead:

timestamp = '%s%s' % (timestamp,
                      self.last_suffix)

If it were me, I would do this:

import uuid

def uniquify(timestamp):
    return '%s-%s' % (timestamp, uuid.uuid4())

And just be happy.

Ali A
I don't know python; does that string.uppercase[] produce 'A', 'B', etc? If so, what happens after 'Z'?
ysth
Judging by my test of it, it crashes with a string index out of range. Also, be careful to `from string import uppercase` since `import string.uppercase` won't work. And in the else block, `self.last_timestamp` instead of `self.timestamp`. I do lament the SO rush to first post. Fall victim myself:-)
Jarret Hardie
A brief googling says that ascii_uppercase may be a better choice.
ysth
I know it's tradition for Python militants to bash Perl's readability, but seriously, you can't call your Python more or less readable than his Perl! They're both fairly simple syntactically and are both quite easily readable. Lay off on the Pyvangelism, please.
Chris Lutz
Can't you make classes funcallable in Python? That would clean thisup the interface a bit.
jrockway
You can, but it doesn't save you from having to instantiate it. Plus, It's a class. Treating it like a function when it isn't could cause confusion. After all, there is no functional dependence on arguments...
TokenMacGuy
The class is just an implementation detail and it makes sense toabstract that away. (You could implement it like this in Perl, aswell, and get the same ability to reset the function's state easily.)
jrockway
I think he's on to something using UUID rather than reinventing the unique ID wheel.
Schwern
+1  A: 

Quite similar to Ali A, but I'll post mine anyway:

class unique_timestamp:
    suffixes = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    def __init__(self):
        self.previous_timestamps = {}
        pass
    def uniquify(self, timestamp):
        times_seen_before = self.previous_timestamps.get(timestamp, 0)
        self.previous_timestamps[timestamp] = times_seen_before + 1
        if times_seen_before > 0:
            return str(timestamp) + self.suffixes[times_seen_before]
        else:
            return str(timestamp)

Usage:

>>> u = unique_timestamp()
>>> u.uniquify(1)
'1'
>>> u.uniquify(1)
'1A'
>>> u.uniquify(1)
'1B'
>>> u.uniquify(2)
'2'
kquinn
+1  A: 

Does the suffix have to be letters like that?

from itertools import count
def unique(timestamp): 
  if timestamp in unique.ts.keys():
    return timestamp + '.' + str(unique.ts[timestamp].next())
  else:
    unique.ts[timestamp] = count()
    return timestamp
unique.ts = {}

You can define a different count if you want the letters back.

This isn't the same as your perl code, though.

  • It keeps a dict around so if you have lots of unique timestamps then you'll use lots of memory.
  • It handles out of order calls, which the original doesn't (i.e. u(1), u(2), u(1)).
Justus
+6  A: 

Look at this answer for a robust method to convert a number to an alphanumeric id

The code I present doesn't go from 'Z' to 'AA', instead goes to 'BA', but I suppose that doesn't matter, it still produces a unique id

from string import uppercase as up
import itertools

def to_base(q, alphabet):
    if q < 0: raise ValueError( "must supply a positive integer" )
    l = len(alphabet)
    converted = []
    while q != 0:
        q, r = divmod(q, l)
        converted.insert(0, alphabet[r])
    return "".join(converted) or alphabet[0]

class TimestampUniqifier( object ):
    def __init__(self):
        self.last = ''
        self.counter = itertools.count()
    def __call__( self, str ):
        if str == self.last:
            suf = self.counter.next()
            return str + to_base( suf, up )
        else:
            self.last = str
            self.counter = itertools.count()
            return str            

timestamp_uniqify = TimestampUniqifier()

usage:

timestamp_uniqify('1')
'1'
timestamp_uniqify('1')
'1A'
timestamp_uniqify('1')
'1B'
timestamp_uniqify('1')
'1C'
timestamp_uniqify('2')
'2'
timestamp_uniqify('3')
'3'
timestamp_uniqify('3')
'3A'
timestamp_uniqify('3')
'3B'

You can call it maaaany times and it will still produce good results:

for i in range(100): print timestamp_uniqify('4')

4
4A
4B
4C
4D
4E
4F
4G
4H
4I
4J
4K
4L
4M
4N
4O
4P
4Q
4R
4S
4T
4U
4V
4W
4X
4Y
4Z
4BA
4BB
4BC
4BD
4BE
4BF
4BG
4BH
4BI
4BJ
4BK
4BL
4BM
4BN
4BO
4BP
4BQ
4BR
4BS
4BT
4BU
4BV
4BW
4BX
4BY
4BZ
4CA
4CB
4CC
4CD
4CE
4CF
4CG
4CH
4CI
4CJ
4CK
4CL
4CM
4CN
4CO
4CP
4CQ
4CR
4CS
4CT
4CU
4CV
4CW
4CX
4CY
4CZ
4DA
4DB
4DC
4DD
4DE
4DF
4DG
4DH
4DI
4DJ
4DK
4DL
4DM
4DN
4DO
4DP
4DQ
4DR
4DS
4DT
4DU
hasen j
I think you need to reset the counter when you get a new timestamp to match the original perl version.
zdan
Very nice solution. Sadly it is marred by your incorrect and dismissive comment about Mike's "little" suffixes. Perl's string autoincrement makes 'Z'++ into 'AA'--a point Mike makes in his post.See http://perldoc.perl.org/perlop.html#Auto-increment-and-Auto-decrement for more info.
daotoad
yea, I found out my mistake later .. hehe
hasen j
+2  A: 

I just tested this up to 1000 against the original perl implementation and diff returns the same results for both. The suffix code is tricky -- this is not a base 36 counter. Hasen J's solution - though it produces a unique timestamp - isn't quite the same since it goes from 'Z' to 'BA', when it should instead go to 'AA' to match the perl ++ operator.

#!/usr/bin/python

class uniqify:
    def __init__(self):
        self.last_timestamp = -1
        self.next_suffix = 'A'
        return

    def suffix(self):
        s = self.next_suffix
        letters = [l for l in self.next_suffix]
        if letters[-1] == 'Z':
            letters.reverse()
            nonz = None
            for i in range(len(letters)):
                if letters[i] != 'Z':
                    nonz = i
                    break
            if nonz is not None:
                letters[nonz] = chr(ord(letters[nonz]) + 1)
                for i in range(0, nonz):
                    letters[i] = 'A'
            else:
                letters = ['A'] * (len(letters) + 1)
            letters.reverse()
        else:
            letters[-1] = chr(ord(letters[-1]) + 1)

        self.next_suffix = ''.join(letters)
        return s

    def reset(self):
        self.next_suffix = 'A'
        return

    def __call__(self, timestamp):
        if timestamp == self.last_timestamp:
            timestamp_str = '%s%s' % (timestamp, self.suffix())
        else:
            self.last_timestamp = timestamp
            self.reset()
            timestamp_str = '%s' % timestamp

        return timestamp_str

uniqify = uniqify()

if __name__ == '__main__':
    for n in range(1000):
        print uniqify(1)
    for n in range(1000):
        print uniqify(2)
bstpierre
for the record, I didn't come up with that method :) also it does a nice job of producing "unique" timestamps, the fact that it's not *exactly* like perl doesn't matter! While your answer produces same output as perl, it's no where near short .. it's actually quite complex
hasen j
+2  A: 

The class is generic and boring, but This is my very first recursive generator. <3

def stamptag():
    yield ''
    prefixtag = stamptag()
    prefix = prefixtag.next()
    while True:
        for i in range(ord('A'),ord('Z')+1):
            yield prefix+chr(i)
        prefix = prefixtag.next()

tagger = stamptag()
for i in range(3000):
    tagger.next()
print tagger.next()

class uniquestamp:
    def __init__(self):
        self.timestamp = -1
        self.tagger = stamptag()

    def format(self,newstamp):
        if self.timestamp < newstamp:
            self.tagger = stamptag()
            self.timestamp = newstamp
        return str(newstamp)+self.tagger.next()

stamper = uniquestamp()
print map(stamper.format, [1,1,1,2,2,3,4,4])

output:

DKJ
['1', '1A', '1B', '2', '2A', '3', '4', '4A']
TokenMacGuy
A: 

This is my first time answering, and I used globals, but it seemed the simplest way to me.

from string import uppercase

last_ts = None
letters = None

def increment(letters):
    if not letters:
        return "A"
    last_letter = letters[-1]
    if last_letter == "Z":
        return increment(letters[:-1])  + "A" 
    return letters[:-1] + uppercase[uppercase.index(last_letter) + 1]

def uniquify(timestamp):
    global last_ts, letters
    if timestamp == last_ts:
        letters = increment(letters)
        return timestamp + letters
    last_ts = timestamp
    letters = None
    return timestamp

print uniquify("1")
print uniquify('1')
print uniquify("1")
print uniquify("2")
for each in range(100): print uniquify("2")


1
1A
1B
2
2A
2B
2C
2D
2E
2F
2G
2H
2I
2J
2K
2L
2M
2N
2O
2P
2Q
2R
2S
2T
2U
2V
2W
2X
2Y
2Z
2AA
2AB
2AC
2AD
2AE
2AF
2AG
2AH
2AI
2AJ
2AK
2AL
2AM
2AN
2AO
2AP
2AQ
2AR
2AS
2AT
2AU
2AV
2AW
2AX
2AY
2AZ
2BA
2BB
2BC
2BD
2BE
2BF
2BG
2BH
2BI
2BJ
2BK
2BL
2BM
2BN
2BO
2BP
2BQ
2BR
2BS
2BT
2BU
2BV
2BW
2BX
2BY
2BZ
2CA
2CB
2CC
2CD
2CE
2CF
2CG
2CH
2CI
2CJ
2CK
2CL
2CM
2CN
2CO
2CP
2CQ
2CR
2CS
2CT
2CU
2CV
A: 

Looking at the problem it seems like a good fit for a coroutine (Python 2.5 or higher). Here's some code that will roughly produce the same result:

def uniqify():
    seen = {}
    val = (yield None)
    while True:
        if val in seen:
            idxa, idxb = seen[val]
            idxb += 1
        else:
            idxa, idxb = (len(seen)+1, ord('a'))
        seen[val] = (idxa, idxb)
        uniq = "%s%s" % (idxa, chr(idxb))
        val = (yield uniq)

And here's how you use it:

>>> u = send.uniqify()
>>> u.next() #need this to start the generator
>>> u.send(1)
'1a'
>>> u.send(1)
'1b'
>>> u.send(1)
'1c'
>>> u.send(2)
'2a'
>>> u.send(2)
'2b'
>>> u.send(1) #you can go back to previous values
'1d'
>>> u.send('stringy') #you can send it anything that can be used as a dict key
'3a'
lost-theory