views:

202

answers:

7

Hello SO!

I am working on a circular problem. In this problem, we have objects that are put on a ring of size MAX, and are assigned IDs from (0 to MAX-1).

I have three simple functions to test for range inclusions. inRange(i,j,k) tests if i is in the circular interval [j,k[ (Mnemonic is i inRange(j,k)). And I have the same for ranges ]j,k[ and ]j,k].

Code in those three methods look duplicated from one method to another:

def inRange(i,j,k):
    """
    Returns True if i in [j, k[
    * 0 <= i, j, k < MAX
    * no order is assumed between j and k: we can have k < j
    """
    if j <= k:
        return j <= i < k
    # j > k :
    return j <= i or i < k

def inStrictRange(i,j,k):
    """
    Returns True if i in ]j, k[
    * 0 <= i, j, k < MAX
    * no order is assumed between j and k: we can have k < j
    """
    if j <= k:
        return j < i < k
    # j > k :
    return j < i or i < k

def inRange2(i,j,k):
    """
    Returns True if i in ]j, k]
    * 0 <= i, j, k < MAX
    * no order is assumed between j and k: we can have k < j
    """
    if j <= k:
        return j < i <= k
    # j > k :
    return j < i or i <= k

Do you know any cleaner way to implement those three methods? After all, only the operators are changing?!

After thinking of a better solution, I came up with:

from operator import lt, le
def _compare(i,j,k, op1, op2):
    if j <= k:
        return op1(j,i) and op2(i,k)
    return op1(j,i) or op2(i,k)

def inRange(i,j,k):
    return _compare(i,j,k, le, lt)
def inStrictRange(i,j,k):
    return _compare(i,j,k, lt, lt)
def inRange2(i,j,k):
    return _compare(i,j,k, lt, le)

Is it any better? Can you come up with something more intuitive? In short, what would be the Pythonic way to write these three operators?

Also, I hate the inRange, inStrictRange, inRange2 names, but I can't think of crystal-clear names. Any ideas?

Thanks.

+2  A: 

Now I am thinking of something such as:

def comparator(lop, rop):
    def comp(i, j, k):
        if j <= k:
            return lop(j, i) and rop(i,k)
        return lop(j, i) or rop(i,k)

    return comp

from operator import le, lt

inRange = comparator(le, lt)
inStrictRange = comparator(lt, lt)
inRange2 = comparator(lt, le)

Which looks better indeed.

NicDumZ
That's a nice way indeed. I'd recommend the name "comparer" or some such though; "comparator" kind of indicates that the function is a device which compares, rather than just something that compares. Small matter, I know =)
Blixt
Pretty classy indeed. Unfortunately, le and lt don't jump out at you as well as operator literals do.
Nikhil Chelliah
+5  A: 

The Pythonic way to do it is to choose readability, and therefor keep the 3 methods as they were at the beginning.

It's not like they are HUGE methods, or there are thousand of them, or you would have to dynamically generate them.

e-satis
+1: Particularly since the docstrings are most of the text
James Hopkin
maybe. But my question sort of begs for another approach, or at least suggestions :)
NicDumZ
+4  A: 

No higher-order functions, but it's less code, even with the extraneous else.

def exclusive(i, j, k):
    if j <= k:
        return j < i < k
    else:
        return j < i or i < k

def inclusive_left(i, j, k):
    return i==j or exclusive(i, j, k)

def inclusive_right(i, j, k):
    return i==k or exclusive(i, j, k)

I actually tried switching the identifiers to n, a, b, but the code began to look less cohesive. (My point: perfecting this code may not be a productive use of time.)

Nikhil Chelliah
I like this I guess: concentrate on the Mathematical meaning rather than on code factoring. And /yes/*3 for the productivity remark :)
NicDumZ
+7  A: 

Two Zen of Python principles leap to mind:

  • Simple is better than complex.
  • There should be one—and preferably only one—obvious way to do it.

range

The Python built-in function range(start, end) generates a list from start to end.1 The first element of that list is start, and the last element is end - 1.

There is no range_strict function or inclusive_range function. This was very awkward to me when I started in Python. ("I just want a list from a to b inclusive! How hard is that, Guido?") However, the convention used in calling the range function was simple and easy to remember, and the lack of multiple functions made it easy to remember exactly how to generate a range every time.

Recommendation

As you've probably guessed, my recommendation is to only create a function to test whether i is in the range [j, k). In fact, my recommendation is to keep only your existing inRange function.

(Since your question specifically mentions Pythonicity, I would recommend you name the function as in_range to better fit with the Python Style Guide.)

Justification

Why is this a good idea?

  • The single function is easy to understand. It is very easy to learn how to use it.

    Of course, the same could be said for each of your three starting functions. So far so good.

  • There is only one function to learn. There are not three functions with necessarily similar names.

    Given the similar names and behaviours of your three functions, it is somewhat possible that you will, at some point, use the wrong function. This is compounded by the fact that the functions return the same value except for edge cases, which could lead to a hard-to-find off-by-one bug. By only making one function available, you know you will not make such a mistake.

  • The function is easy to edit.

    It is unlikely that you'll need to ever debug or edit such an easy piece of code. However, should you need to do so, you need only edit this one function. With your original three functions, you have to make the same edit in three places. With your revised code in your self-answer, the code is made slightly less intuitive by the operator obfuscation.

  • The "size" of the range is obvious.

    For a given ring where you would use inRange(i, j, k), it is obvious how many elements would be covered by the range [j, k). Here it is in code.

    if j <= k:
        size = k - j
    if j > k:
        size = k - j + MAX
    

    So therefore

    size = (k - j) % MAX
    

Caveats

I'm approaching this problem from a completely generic point of view, such as that of a person writing a function for a publicly-released library. Since I don't know your problem domain, I can't say whether this is a practical solution.

Using this solution may mean a fair bit of refactoring of the code that calls these functions. Look through this code to see if editing it is prohibitively difficult or tedious.


1: Actually, it is range([start], end, [step]). I trust you get what I mean though.

Wesley
Wow. Thanks for taking the time to write that answer. But afaik, range() generates a list. And I don't really want to generate a list for each generic comparison: we are talking of generating a list of size O(n), and doing a O(n) operation (_contains__) instead of my three arithmetic operations. It's a very important overhead :)
NicDumZ
I don't mean to say that you should generate a range for your comparison. I'm saying that you should consider why there is only one `range` function, and then consider only writing one `inRange` function for the same reasons. Your `inRange` function is obviously O(1). However, you should consider omitting the other two functions entirely. If you suddenly need an inclusive `inRange` for *x* between *a* and *b*, you have `inRange(x, a, b+1)`, which is also O(1).
Wesley
Right, sorry for reading too fast through your answer then. I certainly agree with your suggestion then. in_range2(x, lo, hi) can be obtained through `not in_range(x, hi, lo)`
NicDumZ
For clarity's sake, I would rewrite a call to `in_range2(x, lo, hi)` as `in_range(x, lo + 1, hi + 1)`. After all, the goal of being pythonic is being easy to understand. Admittedly, adding one to the bounds is not terribly clear either, but it's a little easier to work out with time and a pencil.
Wesley
If you don't want to generate a list, use xrange() instead. It uses yield http://stackoverflow.com/questions/231767/can-somebody-explain-me-the-python-yield-statement/231855#231855) and generate the values on the fly.
e-satis
Just to be clear, I'll respond again: the solution I've recommended does not use the `range` function. I merely point to the `range` function as a model for designing the `inRange` function.
Wesley
+1  A: 

To make it more familiar to your users, I would have one main in_range function with the same bounds as range(). This makes it much easier to remember, and has other nice properties as Wesley mentioned.

def in_range(i, j, k):
    return (j <= i < k) if j <= k else (j <= i or i < k)

You can certainly use this one alone for all your use cases by adding 1 to j and/or k. If you find that you're using a specific form frequently, then you can define it in terms of the main one:

def exclusive(i, j, k):
    """Excludes both endpoints."""
    return in_range(i, j + 1, k)

def inclusive(i, j, k):
    """Includes both endpoints."""
    return in_range(i, j, k + 1)

def weird(i, j, k):
    """Excludes the left endpoint but includes the right endpoint."""
    return in_range(i, j + 1, k + 1)

This is shorter than mucking around with operators, and is also much less confusing to understand. Also, note that you should use underscores instead of camelCase for function names in Python.

Kiv
+1 for the CamelCase remark, and for reminding me that for naturals "i in [lo, hi[" <=> 'i in [lo, hi+1]". It's however a bit unperfect: you need to include MAX modulos, as my ranges are cyclic :)
NicDumZ
+1  A: 

I'd go one step further than Wesley in aping the normal python 'in range' idiom; i'd write a cyclic_range class:

import itertools

MAX = 10 # or whatever

class cyclic_range(object):
 def __init__(self, start, stop):
  # mod so you can be a bit sloppy with indices, plus -1 means the last element, as with list indices
  self.start = start % MAX
  self.stop = stop % MAX
 def __len__(self):
  return (self.stop - self.start) % MAX
 def __getitem__(self, i):
  return (self.start + i) % MAX
 def __contains__(self, x):
  if (self.start < self.stop):
   return (x >= self.start) and (x < self.stop)
  else:
   return (x >= self.start) or (x < self.stop)
 def __iter__(self):
  for i in xrange(len(self)):
   yield self[i]
 def __eq__(self, other):
  if (len(self) != len(other)): return False
  for a, b in itertools.izip(self, other):
   if (a != b): return False
  return True
 def __hash__(self):
  return (self.start << 1) + self.stop
 def __str__(self):
  return str(list(self))
 def __repr__(self):
  return "cyclic_range(" + str(self.start) + ", " + str(self.stop) + ")"
 # and whatever other list-like methods you fancy

You can then write code like:

if (myIndex in cyclic_range(firstNode, stopNode)):
 blah

To do the equivalent of inRange. To do inStrictRange, write:

if (myIndex in cyclic_range(firstNode + 1, stopNode)):

And to do inRange2:

if (myIndex in cyclic_range(firstNode + 1, stopNode + 1)):

If you don't like doing the additions by hand, how about adding these methods:

 def strict(self):
  return cyclic_range(self.start + 1, self.stop)
 def right_closed(self):
  return cyclic_range(self.start + 1, self.stop + 1)

And then doing:

if (myIndex in cyclic_range(firstNode, stopNode).strict()): # inStrictRange
if (myIndex in cyclic_range(firstNode, stopNode).closed_right()): # inRange2

Whilst this approach is, IMHO, more readable, it does involve doing an allocation, rather than just a function call, which is more expensive - although still O(1). But then if you really cared about performance, you wouldn't be using python!

Tom Anderson
On a philosophic aspect, I disagree with "if you really cared about performance, you wouldn't be using python!". I like Python for its readability. It's great to prototype implementations and improve algorithms (algorithms, not implementations). Then, if you need something fast, porting it to another language is not so hard.
NicDumZ
Interesting approach too, thanks :) A small remark: __hash__ is fast, but very collusive: ~70% of collisions. That might be unwanted :)
NicDumZ
+2  A: 

I certainly agree that you need only one function, and that the function should use a (Pythonic) half-open range.

Two suggestions:

  1. Use meaningful names for the args: in_range(x, lo, hi) is a big improvement relative to the 2-keystroke cost.

  2. Document the fact that the constraint hi < MAX means that it is not possible to express a range that includes all MAX elements. As Wesley remarked, size = (k - j) % MAX i.e. size = (hi - lo) % MAX and thus 0 <= size < MAX.

John Machin
Two very useful points, thanks. For most of the answers, there is also a problem with a range of size 0. But fortunately enough, this is not in my scope today :)
NicDumZ