Hello,

I have a Python list of strings, e.g. initialized as follows:

l = ['aardvark', 'cat', 'dog', 'fish', 'tiger', 'zebra']

I would like to test an input string against this list, and find the "closest string below it" and the "closest string above it", alphabetically and case-insensitively (i.e. no phonetics, just a<b etc). If the input exists in the list, both the "below" and "above" should return the input.

Several examples:

Input  | Below    |  Above   
-------------------------------
bat    | aardvark | cat      
aaa    | None     | aardvark 
ferret | dog      | fish     
dog    | dog      | dog

What's the neatest way to achieve this in Python? (currently I'm iterating over a sorted list using a for loop)

To further clarify: I'm interested in simple dictionary alphabetical comparison, not anything fancy like Levenshtein or phonetics.

Thanks

+2  A: 

You can rephrase the problem to this:

Given a sorted list of strings l and an input string s, find the index in l where s should be inserted so that l remains sorted after insertion.

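Before insertion, the elements of l at index-1 and index (if they exist) are the ones you are looking for; after insertion they sit at index-1 and index+1. If l[index] already equals s, then s itself is both the "below" and the "above". To find the index, you can use binary search.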
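To make the idea concrete, here is a minimal sketch of such a binary search. The function name insertion_index is made up for illustration, and it assumes the list is already sorted and that both the list and the input are in the same case.

```python
def insertion_index(sorted_items, s):
    """Return the leftmost index where s can be inserted to keep the list sorted."""
    lo, hi = 0, len(sorted_items)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] < s:
            lo = mid + 1   # s belongs to the right of mid
        else:
            hi = mid       # s belongs at mid or to its left
    return lo

l = ['aardvark', 'cat', 'dog', 'fish', 'tiger', 'zebra']
i = insertion_index(l, 'bat')
below = l[i - 1] if i > 0 else None
above = l[i] if i < len(l) else None
```

For an exact match, l[i] equals the input, so you would return it for both "below" and "above" instead of using l[i - 1].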

Bojan Resnik
A: 

A very naive implementation, good only for short lists: you can pretty easily iterate through the sorted list, compare each item against your choice, and break the first time an item is 'greater' than your choice.

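s = 'bat'  # example input from the question
below = above = None
for item in l:
    if item.lower() > s.lower():
        above = item
        break
    below = item
if below is not None and below.lower() == s.lower():
    above = below  # exact match: both neighbours are the match itself
print('below: %s, above: %s' % (below, above))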
Daniel Roseman
This is what I'm doing right now, editing my answer...
Roee Adler
+12  A: 

This is exactly what the bisect module is for. It will be much faster than just iterating through large lists.

import bisect

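def closest(haystack, needle):
    # haystack must be sorted and in the same case as needle
    if len(haystack) == 0: return None, None

    index = bisect.bisect_left(haystack, needle)
    if index == len(haystack):
        # needle sorts after every item
        return haystack[index-1], None
    if haystack[index] == needle:
        # exact match, including a match on the first item
        return haystack[index], haystack[index]
    if index == 0:
        # needle sorts before every item
        return None, haystack[0]
    return haystack[index-1], haystack[index]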

The above code assumes you've sanitized the input and list to be all upper or lower case. Also, I wrote this on my iPhone, so please do check for typos.
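If you would rather not sanitize the list by hand, one possible way to get the case-insensitive behaviour from the question is to bisect over lowercased keys while keeping the original spellings alongside. This is only a sketch: closest_ci is a hypothetical name, and it re-sorts on every call for brevity, so for a large, fixed list you would build the sorted pairs once and reuse them.

```python
import bisect

def closest_ci(words, needle):
    """Case-insensitive (below, above) neighbours of needle in words."""
    # Sort by lowercase key, keeping the original spelling next to each key.
    pairs = sorted((w.lower(), w) for w in words)
    keys = [k for k, _ in pairs]
    key = needle.lower()

    left = bisect.bisect_left(keys, key)
    if left < len(keys) and keys[left] == key:
        # Exact match: it is both the "below" and the "above".
        return pairs[left][1], pairs[left][1]
    below = pairs[left - 1][1] if left > 0 else None
    above = pairs[left][1] if left < len(keys) else None
    return below, above
```

Called as closest_ci(['Zebra', 'cat', 'Dog'], 'DOG'), this returns the stored spelling 'Dog' for both neighbours.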

Triptych
+1 for the clean solution, but also the name choosing :)
Roee Adler
You need to take care of the case where the list is empty: if index == 0: left = None else: left = haystack[index-1] if index == len(haystack): right = None else: right = haystack[index] return left, right
tonfa
Sorry, I thought it was possible to put code inside comments.
tonfa
+1 This works exactly as described.
hughdbrown
There's a bug in the cases where the needle is larger than all the items in the haystack. It should be return haystack[index-1],None
Daphna Shezaf
A: 

Are these relatively short lists, and do the contents change or are they fairly static?

If you've got a large number of strings, and they're relatively fixed, you might want to look into storing your data in a trie. Once it is built, it's quick and easy to search through it and find your nearest neighbors the way you'd like.
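As a rough illustration of the trie idea, here is a small sketch. Every name in it (TrieNode, insert, floor, ceiling, neighbours) is invented for this example, keys are lowercased on insertion to get case-insensitive ordering, and if two words differ only by case the last one inserted wins.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.word = None     # original spelling of the word ending here, if any

def insert(root, word):
    node = root
    for ch in word.lower():
        node = node.children.setdefault(ch, TrieNode())
    node.word = word

def _smallest(node):
    # A node that ends a word sorts before everything below it.
    while node.word is None:
        node = node.children[min(node.children)]
    return node.word

def _largest(node):
    # Longer words sort after their prefixes, so descend as far as possible.
    while node.children:
        node = node.children[max(node.children)]
    return node.word

def floor(node, key, depth=0):
    """Largest stored word <= key (key already lowercased), or None."""
    if depth == len(key):
        return node.word
    ch = key[depth]
    if ch in node.children:
        found = floor(node.children[ch], key, depth + 1)
        if found is not None:
            return found
    smaller = [c for c in node.children if c < ch]
    if smaller:
        return _largest(node.children[max(smaller)])
    return node.word  # a stored prefix of key is still below key

def ceiling(node, key, depth=0):
    """Smallest stored word >= key (key already lowercased), or None."""
    if depth == len(key):
        has_words = node.word is not None or node.children
        return _smallest(node) if has_words else None
    ch = key[depth]
    if ch in node.children:
        found = ceiling(node.children[ch], key, depth + 1)
        if found is not None:
            return found
    larger = [c for c in node.children if c > ch]
    if larger:
        return _smallest(node.children[min(larger)])
    return None

def neighbours(root, s):
    key = s.lower()
    return floor(root, key), ceiling(root, key)
```

Each lookup walks at most the length of the input plus the depth of one subtree, so the cost depends on word length and not on the number of stored words. Whether that beats a sorted list with bisect in practice would need measuring.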

khedron