ansaurus

Question

python: finding a missing letter in the alphabet from a list - least lines of code

Answer 1

A:

Here's one way of doing it, assuming your "alphabets" is integers, and that the list has at least two items:

for i in xrange(1, len(a)):
  if a[i] != a[i - 1] + 1:
    print a[i - 1] + 1, "is missing"

unwind 2009-04-01 08:20:13

Answer 2

A:

With sorted lists a binary search is usually the fastest alghorythm. Could you please provide an example list and an example "missing alphabet"?

Manrico Corazzi 2009-04-01 08:21:18

well, it's just a list of alphabets... like this: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'] and missing one would be J here.

bLee 2009-04-01 08:32:10

Answer 3

+10 A:

Some questions:

Are all the letters upper or lower case? (a/A)
Is this the only alphabet you'll want to check?
Why are you doing this so often?

Least lines of code:

# do this once, outside the loop
alphabet=set(string.ascii_lowercase)
# inside the loop, just 1 line:
missingletter=(alphabet-set(yourlist)).pop()

The advantage of the above is that you can do it without having to sort the list first. If, however, the list is always sorted, you can use bisection to get there faster. On a simple 26-letter alphabet though, is there much point?

Bisection (done in ~4 lookups):

frompos, topos = 0, len(str)
for i in range(1,100):  #never say forever with bisection...
    trypos = (frompos+topos+1)/2
    print "try:",frompos,trypos,topos
    if alphabet[trypos] != str[trypos]:
        topos = trypos
    else:
        frompos = trypos
    if topos-frompos==1:
        if alphabet[topos] != str[topos]:
            print alphabet[frompos]
        else:
            print alphabet[topos]
        break

This code requires fewer lookups, so is by far the better scaling version O(log n), but may still be slower when executed via a python interpreter because it goes via python ifs instead of set operations written in C.

(Thanks to J.F.Sebastian and Kylotan for their comments)

Phil H 2009-04-01 08:25:40

I like this way, I always forget about set() :P

monkut 2009-04-01 08:33:52

Well.. Why am I doing this so often? Just curious. I was doing it on my way and wanted to know if there is a better way of doing this. I was going to apply this to some other situations. As far as alphabets go, I think your first solution is great :)

bLee 2009-04-01 09:09:58

I think this way would be best, if the alphabet was populated from string.ascii_letters. After all, if they change ASCII some time soon, you don't want your code to break. ;)

Kylotan 2009-04-01 09:10:06

I think there is a bug in your bisection approach when the missing letter is "a". topos will never be 0, so you'll return "b" instead.

Brian 2009-04-01 13:13:27

set(list(a_string)) == set(a_string) There is no need to use `list` here.

J.F. Sebastian 2009-04-01 15:02:49

Brian, you are absolutely right. I was thinking it was all looking a bit too clean for bisection...

Phil H 2009-04-01 15:03:24

J.F.Sebastian, you are also right.

Phil H 2009-04-01 15:04:07

Use string.ascii_lowercase instead of "abc...". It better communicates the code intent

J.F. Sebastian 2009-04-01 15:05:41

It is hard to write binary_search correctly the first time therefore use bisect.bisect or similar

J.F. Sebastian 2009-04-01 15:09:21

JFS, I've changed it to use string.ascii_lowercase, although for the general case you'd just use some source list or set there.

Phil H 2009-04-02 10:26:37

Answer 4

+7 A:

Using a list comprehension:

>>> import string
>>> sourcelist = 'abcdefghijklmnopqrstuvwx'
>>> [letter for letter in string.ascii_lowercase if letter not in sourcelist]
['y', 'z']
>>>

The string module has some predefined constants that are useful.

>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

>>> string.letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'

>>> string.hexdigits
'0123456789abcdefABCDEF'

>>> string.octdigits
'01234567'

>>> string.digits
'0123456789'

>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
>>>

monkut 2009-04-01 08:32:13

string.ascii_letters[:26]?? That just gives a list of the alphabet? whoa. +1!

bLee 2009-04-01 09:10:50

yup, string.ascii_letters gives you the whole alphabet both lowercase and caps. :P I just realized string.ascii_lowercase gives you the same result as string.ascii_letters[:26]. Updated.

monkut 2009-04-01 09:18:42

Answer 5

A:

if you're talking about alphabet as letters:

letterSet = set()
for word in wordList:
  letterSet.update(set(word.lower()))

import string
alphabet = set(string.lowercase)
missingLetters = alphabet.difference(letterSet)

vartec 2009-04-01 08:33:25

string.lowercase depends on locale. Use string.ascii_lowercase instead of.

J.F. Sebastian 2009-04-01 15:01:10

Well, at least it should depend on locale (actually it doesn't). Definition of alphabet depends on locale. string.ascii_lowercase is valid *only* for English.

vartec 2009-04-01 15:07:14

It *does* depend on locale e.g. if your system locale is not English: `python -c"import locale; locale.setlocale(locale.LC_ALL,''); import string; print string.lowercase"` then it prints your local alphabet

J.F. Sebastian 2009-04-01 15:31:28

s/local alphabet/lowercases letters in your locale encoding/

J.F. Sebastian 2009-04-01 15:35:57

It should, but in my case it doesn't>>> import locale, string>>> locale.setlocale(locale.LC_ALL,'pl_PL.UTF-8')'pl_PL.UTF-8'>>> string.lowercase'abcdefghijklmnopqrstuvwxyz'>>> locale.setlocale(locale.LC_ALL,'es_ES.UTF-8')'es_ES.UTF-8'>>> string.lowercase'abcdefghijklmnopqrstuvwxyz'

vartec 2009-04-01 15:54:53

Answer 6

+7 A:

In the too clever for it's own good category, and assuming there is exactly one missing letter in a lowercase alphabet:

print chr(2847 - sum(map(ord, theString)))

[Edit] I've run some timings on the various solutions to see which is faster. Mine turned out to be fairly slow in practice (slightly faster if I use itertools.imap instead).

Surprisingly, the listcomp solution by monkut turned out to be fastest - I'd have expected the set solutions to do better, as this must scan the list each time to find the missing letter. I tried first converting the test list to a set in advance of membership checking, expecting this to speed it up but in fact it made it slower. It looks like the constant factor delay in creating the set dwarfs the cost of using an O(n**2) algorithm for such a short string.

That suggested than an even more basic approach, taking advantage of early exiting, could perform even better. The below is what I think currently performs best:

def missing_letter_basic(s):
    for letter in string.ascii_lowercase:
        if letter not in s: return letter
    raise Exception("No missing letter")

The bisection method is probably best when working with larger strings however. It is only just edged out by the listcomp here, and has much better asymptotic complexity, so for strings larger than an alphabet, it will clearly win.

[Edit2]

Actually, cheating a bit, I can get even better than that, abusing the fact that there are only 26 strings to check, behold the ultimate O(1) missing letter finder!

find_missing_letter = dict((string.ascii_lowercase[:i]+string.ascii_lowercase[i+1:],
                            string.ascii_lowercase[i]) for i in range(26)).get

>>> find_missing_letter('abcdefghijklmnoprstuvwxyz')
'q'

Here are my timings (500000 runs, tested with letters missing near the start, middle and end of the string (b, m and y)

                         "b"    "m"     "y"
               bisect : 2.762   2.872   2.922  (Phil H)
             find_gap : 3.388   4.533   5.642  (unwind)
             listcomp : 2.832   2.858   2.822  (monkut)
         listcomp_set : 4.770   4.746   4.700  As above, with sourcelist=set(sourcelist) first
       set_difference : 2.924   2.945   2.880  (Phil H)
                  sum : 3.815   3.806   3.868
             sum_imap : 3.284   3.280   3.260
                basic : 0.544   1.379   2.359 
          dict_lookup : 0.135   0.133   0.134

Brian 2009-04-01 09:06:28

Was your complete set only built once or did you build it on every iteration ?

Christian Witts 2009-04-01 14:10:14

The set of all alphabetic chars (in Phil H's set_difference method) was built once. For listcomp_set, it was every iteration (we need to, as we're checking for membership in the string being tested)

Brian 2009-04-01 14:12:37

+1: for dict approach

J.F. Sebastian 2009-04-01 15:33:49

ansaurus

tags:

views:

answers:

python: finding a missing letter in the alphabet from a list - least lines of code

related questions