ansaurus

Question

How can I improve this long-winded Python code?

Answer 1

A:

You could use a list comprehension, as follows:

items = [a for a in items if a[1] != '' and a[2] != '']

Or, if an empty string in any position denotes an aggregate item:

items = [a for a in items if '' not in a]

Of course, you don't necessarily need to assign the reduced list to items - you can use it however you like.

2010-08-16 09:28:07

Neither of your list comprehensions will produce the desired output. See the originally posted pseudocode for the requirements.

Andrew 2010-08-16 09:38:29

-1 If only this worked...

katrielalex 2010-08-16 10:17:07

Sorry - misunderstood the requirements. That'll teach me to try and bang out an answer in a spare couple of minutes.

2010-08-17 07:39:08

Answer 2

+1 A:

list_keys = [ "".join(x[:-1]) for x in items ]
for i in range(len(list_keys)-1):
  if not list_keys[i+1].startswith(list_keys[i]):
     print items[i]
print items[-1]

Here I find the "key" of each item, which is all entries in an item, concatenated, except the last value.

An aggregate item's key is always a prefix of succeeding items' keys, so we can use this test to detect aggregate items and dismiss them.

This alg. prints (on your input):

['Schools', 'Primary schools', '', '16']
['Schools', 'Secondary schools', 'Special ed', '4']
['Schools', 'Secondary schools', 'Non-special ed', '4'],

Note:
This assumes all items are ordered neatly in a tree structure (as your original data). If it's not, it'll be (slightly) more complicated as you'll have to sort the keys before the loop (and keep track of which key belongs to which item).

adamk 2010-08-16 09:29:27

Answer 3

A:

How about making your items objects?

class School (object):
    __init__(self, is_aggregate=false):
        self.is_aggregate = is_aggregate

Ubersoldat 2010-08-16 09:37:38

Answer 4

A:

I'm using a helper function that

makes a set of the entries with more fine-grained data
returns a function suitable for filter that removes aggregate items

Run your example data through this function, and it will give the desired output :)

def remove_aggregates(items):

    def mk_pred(index_i, blank_i, items):
        posts = set(x[index_i] for x in items if x[blank_i] != '')    
        def pred(item):
            return not (item[blank_i] == '' and item[index_i] in posts)
        return pred    

    items = filter(mk_pred(0,1,items), items)
    items = filter(mk_pred(1,2,items), items)
    return items

gnud 2010-08-16 09:46:32

Answer 5

+2 A:

Firstly, I suggest your data structure should look more like this:

items = [
    ['Schools', None, None, 32],
    ['Schools', 'Primary schools', None, 16],
    ['Schools', 'Secondary schools', None, 8],
    ['Schools', 'Secondary schools', 'Special ed', 4],
    ['Schools', 'Secondary schools', 'Non-special ed', 4],
]

We can sort them into a dictionary like this:

result = {}

for item in items:
    if not item[0] in result or not isinstance(result[item[0]], dict): result[item[0]] = {}
    if not item[1] in result[item[0]] or not isinstance(result[item[0]][item[1]], dict): result[item[0]][item[1]] = {}
    if not item[2] in result[item[0]][item[1]] or not isinstance(result[item[0]][item[1]][item[2]], dict): result[item[0]][item[1]][item[2]] = {}

    if not item[0]:
        result = item[3]
    elif not item[1]:
        result[item[0]] = item[3]
    elif not item[2]:
        result[item[0]][item[1]] = item[3]
    else:
        result[item[0]][item[1]][item[2]] = item[3]

And you should end up with a dictionary like:

result = {
    'Schools': {
        'Secondary schools': {
            'Non-special ed': '4',
            'Special ed': '4'
        },
        'Primary schools': '16'
    }
}

My routine could probably be optimized and made recursive.

Also, the numbers total to 24 -- is this an error on your part?

rfw 2010-08-16 09:50:08

Yes, that was an error :((( stupid of me because it ruins the explanation. well spotted, I've fixed.

AP257 2010-08-16 14:27:44

Answer 6

A:

You've asked how to do so elegantly, and further, how to do so better. Your nota bene suggests that the structure you're working with is still malleable. If you want to be able to do so more elegantly, I would suggest changing the way the data is stored. Some options are:

Include an additional field in each list, which indicates whether it is an aggregate value or not:

items = [
    ['Schools', '', '', '32', True],
    ['Schools', 'Primary schools', '', '16', False],
    ['Schools', 'Secondary schools', '', '8', True],
    ['Schools', 'Secondary schools', 'Special ed', '4', False],
    ['Schools', 'Secondary schools', 'Non-special ed', '4', False],
]

Split your data into two lists:

items = [
    [
        ['Schools', '', '', '32'],
        ['Schools', 'Secondary schools', '', '8'],
    ],
    [
        ['Schools', 'Primary schools', '', '16'],
        ['Schools', 'Secondary schools', 'Special ed', '4'],
        ['Schools', 'Secondary schools', 'Non-special ed', '4'],
    ],
]

Make the aggregate values contain a list of their children (although this still wouldn't be much fun to reduce):

items = [
    ['Schools', '', '', '32', [
        ['Schools', 'Primary schools', '', '16', []],
        ['Schools', 'Secondary schools', '', '8', [
            ['Schools', 'Secondary schools', 'Special ed', '4'],
            ['Schools', 'Secondary schools', 'Non-special ed', '4'],
        ],
    ],
]

I would say that the current structure of your data does not allow you to do anything elegant with it. You require logic along the lines of "if this index is blank but it isn't for another entry that has the same value at another of my indexes", and this has to be done twice per list entry, because that logic can occur at two separate pairs of index locations. Fix the way you store your information, and you will be able to write an elegant method of reducing that data.

For example, if you went with the first option I listed (using booleans to indicate whether the entry is aggregate), you could reduce the list with:

reduced = [item for item in items where item[4] == False]

Andrew 2010-08-16 09:55:50

You are right that the data structure is bad, but I don't think your suggestion is that good either. A better way to do this is with nested `dict`s.

katrielalex 2010-08-16 10:14:38

Answer 7

A:

This attemp tries not to be dependend of sorting of the input:

items = [
    ['Schools', '', '', '32'],
    ['Schools', 'Primary schools', '', '16'],
    ['Schools', 'Secondary schools', '', '16'],
    ['Schools', 'Secondary schools', 'Special ed', '8'],
    ['Schools', 'Secondary schools', 'Non-special ed', '8'],
]

def path(item,upto=None):
    return ".".join([p for p in item[:-1] if p][:upto])    

from collections import defaultdict
children_counter = defaultdict(int)
for i in items:
    children_counter[path(i,-1)] += 1

for i in items:
   if children_counter[path(i)] == 0:
        print i

toka 2010-08-16 15:52:33

ansaurus

tags:

views:

answers:

How can I improve this long-winded Python code?

related questions