I'm trying to get the Body Mass Index (BMI) classification for a BMI value that falls within a standard BMI range - for instance, if someone's BMI were 26.2, they'd be in the "Overweight" range.

I made a list of tuples of the values (see below), although of course I'm open to any other data structure. This would be easy to do with SQL's BETWEEN but I'd like to do it in pure Python, mostly because it means one fewer DB connection but also as an exercise in doing more in "pure" Python.

bmi_ranges = []
bmi_ranges.append((u'Underweight', u'Severe Thinness', 0, 15.99))
bmi_ranges.append((u'Underweight', u'Moderate Thinness', 16.00, 16.99))
bmi_ranges.append((u'Underweight', u'Mild Thinness', 17.00, 18.49))
bmi_ranges.append((u'Normal Range', u'Normal Range', 18.50, 24.99))
bmi_ranges.append((u'Overweight', u'Overweight', 25.00, 29.99))
bmi_ranges.append((u'Obese', u'Obese Class I', 30.00, 34.99))
bmi_ranges.append((u'Obese', u'Obese Class II', 35.00, 39.99))
bmi_ranges.append((u'Obese', u'Obese Class III', 40.00, 1000.00))

If a value is exactly in the list of tuples it's easy enough to find it by iterating through with a listcomp, but how do I find which range a value falls within when it lies between the listed values?

+2  A: 
# bmi = <whatever>
found_bmi_range = [bmi_range for bmi_range
                   in bmi_ranges
                   if bmi_range[2] <= bmi <= bmi_range[3]
                  ][0]

You can add if clauses to list comprehensions that filter what items are included in the result.

Note: you may want to adjust your range specifications to use a non-inclusive upper bound (i.e. [a,b) + [b,c) + [c,d) et cetera), and then change the conditional to lower <= bmi < upper, that way you don't have issues with edge cases.
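
A minimal sketch of the half-open idea from the note, assuming the table is rewritten so that each upper bound equals the next lower bound. The function name find_bmi_range and the use of float('inf') for the last upper bound are illustrative choices, not part of the question:

```python
# Hypothetical half-open variant of the question's table: each row is
# (classification, description, lower, upper) and covers lower <= bmi < upper.
bmi_ranges = [
    (u'Underweight', u'Severe Thinness', 0.0, 16.0),
    (u'Underweight', u'Moderate Thinness', 16.0, 17.0),
    (u'Underweight', u'Mild Thinness', 17.0, 18.5),
    (u'Normal Range', u'Normal Range', 18.5, 25.0),
    (u'Overweight', u'Overweight', 25.0, 30.0),
    (u'Obese', u'Obese Class I', 30.0, 35.0),
    (u'Obese', u'Obese Class II', 35.0, 40.0),
    (u'Obese', u'Obese Class III', 40.0, float('inf')),
]


def find_bmi_range(bmi):
    """Return the matching row, or None if bmi is below the first lower bound."""
    matches = [r for r in bmi_ranges if r[2] <= bmi < r[3]]
    return matches[0] if matches else None
```

With bounds laid out this way a value such as 29.995 lands in the Overweight row instead of falling into the gap between 29.99 and 30.00, and a negative value returns None rather than raising an IndexError.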

Amber
And if you really care about performance you may use a binary search tree to reduce the number of comparisons. But since the OP has a SQL DB, it would do the same thing with proper indexes.
Andrew
bmi = 29.9950000 ?
eumiro
@eumiro - flaw in the original data; could easily be adapted to `bmi_range[2] <= bmi < bmi_range[3]` if the original data were specified as a `[x,y)` type of range.
Amber
@Amber, the OP is open to any other data structure, so this might be a good hint not to use those .99 limit values. My answer uses only one value to limit the ranges. Your list comprehension would have to be a little bit more complicated to take the minValue from the next range.
eumiro
Thanks - yes, my ranges would not allow more decimal places, but BMI standards usually use only 1-2 decimal places anyway so I could round in the assignment of BMI. I would be interested in seeing how this would work with only upper or lower ranges, though (the bisect solution is much, much slower than the list comprehension, @eumiro).
Jough Dempsey
+1  A: 

You can do this with a list comprehension:

>>> result = [r for r in bmi_ranges if r[2] <= 32 <= r[3]]
>>> print result
[(u'Obese', u'Obese Class I', 30.0, 34.99)]

However it would probably be faster to request the database to do this for you as otherwise you are requesting more data than you need. I don't understand how using a BETWEEN requires using one more data connection. If you could expand on that it would be useful. Are you talking about the pros and cons of caching data versus always asking for live data?

You may also want to create a class for your data so that you don't have to refer to fields as x[2], but instead can use more meaningful names. You could also look at namedtuples.
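
As a rough sketch of the namedtuple suggestion, assuming the question's inclusive bounds are kept as they are: the type name BmiRange, its field names, and the helper classify are made up for illustration.

```python
from collections import namedtuple

# BmiRange and its field names are illustrative, not taken from the question.
BmiRange = namedtuple('BmiRange', ['classification', 'description', 'low', 'high'])

bmi_ranges = [
    BmiRange(u'Underweight', u'Severe Thinness', 0, 15.99),
    BmiRange(u'Underweight', u'Moderate Thinness', 16.00, 16.99),
    BmiRange(u'Underweight', u'Mild Thinness', 17.00, 18.49),
    BmiRange(u'Normal Range', u'Normal Range', 18.50, 24.99),
    BmiRange(u'Overweight', u'Overweight', 25.00, 29.99),
    BmiRange(u'Obese', u'Obese Class I', 30.00, 34.99),
    BmiRange(u'Obese', u'Obese Class II', 35.00, 39.99),
    BmiRange(u'Obese', u'Obese Class III', 40.00, 1000.00),
]


def classify(bmi):
    """Return the first range containing bmi (inclusive bounds), or None."""
    for r in bmi_ranges:
        if r.low <= bmi <= r.high:
            return r
    return None
```

A namedtuple still behaves like a plain tuple, so index-based code such as r[2] keeps working, while new code can use r.low and r.description.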

Mark Byers
Probably not faster to do a trip to the database to search through only 8 ranges...
Amber
The roundtrip might be the most expensive part.
Mark Byers
...which is all the more reason to eliminate the roundtrip entirely.
Amber
@Amber: If you're fetching the data from the database anyway you should use BETWEEN, if you're not then you are talking about caching rather than the relative speed of each query. Caching has pros but also cons.
Mark Byers
@Mark: The list of ranges might very well be constant, in which case it's not caching at all, but whether you're talking to a DB or not, period, if the BMI info is coming from the user. (It may not be, but it's a perfectly imaginable scenario.)
Amber
bmi = 29.9950000 ?
eumiro
@Amber: The OP says he is using a database and he says the reason he is not using BETWEEN is because that would require an extra connection.
Mark Byers
@Mark: The OP doesn't actually state what they're using a DB for currently - only that they *could* use a DB.
Amber
@Amber: Oh, I see. Thanks. OK but now assuming that you are right, I don't at all see the purpose of the unnamed tuple instead of using a class...
Mark Byers
@Mark: Probably just the first thing that came to the OP's mind. Dicts, named tuples, or a class would work as well.
Amber
A: 

I'm not sure I understand why you can't do this just by iterating over the list (I know there are more efficient data structures, but this is very short and iteration would be more understandable). What's wrong with

def check_bmi(bmi, bmi_range):
    for cls, name, a, b in bmi_range:
        if a <= bmi <= b:
            return cls # or name or whatever you need.
wxs
Er, did you mean `a <= bmi <= b` ?
Amber
bmi = 29.9950000 ?
eumiro
I was iterating, but it seemed like a naive way of getting there and I thought I was closer to the "right" way to do it with the listcomp. This solution would be far less attractive were the dataset larger, but BMI ranges are a standard and there aren't that many values, which is why I wanted to avoid DB overhead to begin with.
Jough Dempsey
Ah right amber. And eumiro, if the bmi is not in one of the given ranges it will return None.
wxs
A: 
zchtodd
bmi = 29.9950000 ?
eumiro
A: 

If you like a lighter original data structure and one import from standard library:

import bisect

bmi_ranges = []
bmi_ranges.append((u'Underweight', u'Severe Thinness', 0, 15.99))
bmi_ranges.append((u'Underweight', u'Moderate Thinness', 16.00, 16.99))
bmi_ranges.append((u'Underweight', u'Mild Thinness', 17.00, 18.49))
bmi_ranges.append((u'Normal Range', u'Normal Range', 18.50, 24.99))
bmi_ranges.append((u'Overweight', u'Overweight', 25.00, 29.99))
bmi_ranges.append((u'Obese', u'Obese Class I', 30.00, 34.99))
bmi_ranges.append((u'Obese', u'Obese Class II', 35.00, 39.99))
bmi_ranges.append((u'Obese', u'Obese Class III', 40.00, 1000.00))

# we take just the minimal value for BMI for each class
# to find the limit values between ranges:

limitValues = [line[2] for line in bmi_ranges][1:]
# limitValues = [16.0, 17.0, 18.5, 25.0, 30.0, 35.0, 40.0]

# bisect.bisect(list, value) returns the insertion index for value,
# which here is the index of the range in which value belongs
# bmi = <whatever>
bmi_range = bmi_ranges[bisect.bisect(limitValues, bmi)]

More information: bisect

eumiro
This seems overly complex (especially compared with the list comprehension solutions above) and less Pythonic, but it's interesting and may be effective with a larger dataset.
Jough Dempsey
A: 

The builtin filter function exists for this purpose:

bmi = 26.2
answer = filter(lambda t: t[2] <= bmi <= t[3], bmi_ranges)[0]
print answer
>>> (u'Overweight', u'Overweight', 25.0, 29.989999999999998)

Hope this helps

inspectorG4dget
bmi = 29.9950000 ?
eumiro
Using the `if` clause in a list comprehension is the preferred way of doing this now; filter remains available but isn't the preferred method.
Amber
@eumiro: 29.995 will not fall in any range, because of the way @JoughDempsey made the range brackets. 29.995 > 29.99
inspectorG4dget
@Amber: Can you please explain why the list comprehension's if statement is preferred to filter?
inspectorG4dget
@inspector: It's considered more Pythonic and easier to read. It can also create a generator instead of a list for lazy evaluation, if so desired.
Amber
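
To illustrate the lazy-evaluation point from the comment above, here is a small sketch that swaps the list comprehension's square brackets for a generator expression and pulls only the first match with next(). The helper name first_match is hypothetical, and the table is the question's original inclusive one:

```python
bmi_ranges = [
    (u'Underweight', u'Severe Thinness', 0, 15.99),
    (u'Underweight', u'Moderate Thinness', 16.00, 16.99),
    (u'Underweight', u'Mild Thinness', 17.00, 18.49),
    (u'Normal Range', u'Normal Range', 18.50, 24.99),
    (u'Overweight', u'Overweight', 25.00, 29.99),
    (u'Obese', u'Obese Class I', 30.00, 34.99),
    (u'Obese', u'Obese Class II', 35.00, 39.99),
    (u'Obese', u'Obese Class III', 40.00, 1000.00),
]


def first_match(bmi):
    # The generator expression is evaluated lazily: next() consumes rows
    # only until the first match and returns None if no range contains bmi.
    return next((r for r in bmi_ranges if r[2] <= bmi <= r[3]), None)
```

Unlike indexing a filtered list with [0], this stops scanning at the first hit and does not raise when nothing matches, which is what happens for 29.995 with these bounds.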
A: 

This is how I would deal with it:

import random

bmi_ranges = [(u'Underweight', u'Severe Thinness', 16.0),
               (u'Underweight', u'Moderate Thinness', 17.0),
               (u'Underweight', u'Mild Thinness', 18.5),
               (u'Normal Range', u'Normal Range', 25.0),
               (u'Overweight', u'Overweight', 30.0),
               (u'Obese', u'Obese Class I', 35.0),
               (u'Obese', u'Obese Class II', 40.0),
               (u'Obese', u'Obese Class III', 1000.0)]

def bmi_lookup(bmi_value):
    return next((classification, description, lessthan)
         for classification, description, lessthan in bmi_ranges
         if bmi_value < lessthan)

for bmi in range(20):
    random_bmi = random.random()*50
    print random_bmi, bmi_lookup(random_bmi)
Tony Veijalainen