ansaurus

Question

Python - a clean approach to this problem?

Answer 1

+4 A:

You can read the file into a dictionary (string=>int), then use a list comprehension to get the highest identity code from each sublist.

d = {}
with open("data", 'rb') as data:
  for line in data:
    key, val = line.split(' ')
    d[key] = float(val)

ids = [max(sublist, key=lambda k: d[k]) for sublist in li]

For Python 2.4, use:

ids = []
for sublist in li:
  subnums = map(lambda x: d[x], sublist)
  ids.append(sublist[subnums.index(max(subnums))])

As noted, this is O(n).

Matthew Flaschen 2010-06-02 14:51:34

O(n) sounds good to me.

Justin Peel 2010-06-02 14:59:59

You are not currently casting val to a number, so this code will be comparing *strings*, not floats, i.e. '9.1' > '82.3'. Presumably @Seafoid wants values compared as numbers.

Jeffrey Harris 2010-06-02 17:23:25

Thanks, @Jeffrey.

Matthew Flaschen 2010-06-02 18:02:19

Yeah - I do indeed wish to compare the numerical values. Also, this code throws an error. max() doesn't want to take a keyword argument.

Seafoid 2010-06-03 08:13:04

Just figures that max() has support for keyword arguments from v2.5 onwards. Unfortunately, I have to cope with v2.4.3. Any alternatives to max()?

Seafoid 2010-06-03 08:36:13

This 2.4 version works nicely, it seems! Thanks!

Seafoid 2010-06-03 10:22:16

Answer 2

+2 A:

My solution assumes that you only want the highest number and not the id that is associated with it.

I'd read the identity codes and the numbers in a dictionary as suggested by Matthew

NEW_LIST = []
ID2NUM = {}
with file('codes') as codes:
  for line in codes:
    id, num = line.rstrip().split()
    ID2NUM[id] = num

I added some numbers so every id has a value. My ID2NUM looks like this:

{'abc': 2.9300000000000002,
 'ghi': 3.8700000000000001,
 'ghy': 1.2,
 'hgi': 0.40000000000000002,
 'kop': 4.3499999999999996,
 'lmn': 5.96}

Then process the list li:

for l in li:
  NEW_LIST.append(max([d[x] for x in l]))

>>> NEW_LIST
[5.96, 4.3499999999999996, 1.2]

To write the new list to a file, one number per line:

with file('new_list', 'w') as new_list:
  new_list.write('\n'.join(NEW_LIST))

mmmasterluke 2010-06-02 15:17:40

Sorting is O(n log n), so it's preferable to use max, which is O(n).

Matthew Flaschen 2010-06-02 15:37:35

Good point, thanks.

mmmasterluke 2010-06-02 15:41:15

Answer 3

A:

How about storing each sublist as a binary search tree? You'd get O(log n) search performance on average.

Another option would be to use max-heaps and you'd get O(1) to get the maximum value.

Pin 2010-06-03 20:01:12

ansaurus

tags:

views:

answers:

Python - a clean approach to this problem?

related questions