I have a Python script that outputs a lot of data; a sample is below. The first of the four fields always consists of two letters, one digit, a slash, and one or two digits.

Gi3/2 --.--.--.-- 0024.e89b.c10e Dell Inc.  
Gi5/4 --.--.--.-- 0030.c1cd.f038 HEWLETTPACKARD   
Gi4/3 --.--.--.-- 0020.ac00.6703 INTERFLEX DATENSYSTEME GMBH  
Gi3/7 --.--.--.-- 0009.4392.34f2 Cisco Systems  
Gi6/6 --.--.--.-- 001c.2333.bd5a Dell Inc  
Gi3/16 --.--.--.-- 0009.7c92.7af2 Cisco Systems  
Gi5/12 --.--.--.-- 0020.ac00.3fb0 INTERFLEX DATENSYSTEME GMBH  
Gi4/5 --.--.--.-- 0009.4392.6db2 Cisco Systems  
Gi4/6 --.--.--.-- 000b.cd39.c7c8 Hewlett Packard  
Gi6/4 --.--.--.-- 0021.70d7.8d33 Dell Inc  
Gi6/14 --.--.--.-- 0009.7c91.fa71 Cisco Systems

What would be the best way to sort this correctly on the first field, so that this sample would read

Gi3/2   --.--.--.-- 0024.e89b.c10e Dell Inc.  
Gi3/7   --.--.--.-- 0009.4392.34f2 Cisco Systems  
Gi3/16  --.--.--.-- 0009.7c92.7af2 Cisco Systems  
Gi4/3   --.--.--.-- 0020.ac00.6703 INTERFLEX DATENSYSTEME GMBH  
Gi4/5   --.--.--.-- 0009.4392.6db2 Cisco Systems  
Gi4/6   --.--.--.-- 000b.cd39.c7c8 Hewlett Packard  
Gi5/4   --.--.--.-- 0030.c1cd.f038 HEWLETT PACKARD  
Gi5/12  --.--.--.-- 0020.ac00.3fb0 INTERFLEX DATENSYSTEME GMBH  
Gi6/4   --.--.--.-- 0021.70d7.8d33 Dell Inc  
Gi6/6   --.--.--.-- 001c.2333.bd5a Dell Inc  
Gi6/14  --.--.--.-- 0009.7c91.fa71 Cisco Systems

My efforts have been very messy, and resulted in numbers such as 12 coming before 5!

As ever, many thanks for your patience.

+4  A: 
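def lineKey(line):
    # The first space-separated field is the interface name, e.g. 'Gi3/16'.
    keyStr, rest = line.split(' ', 1)
    # Compare the part after the slash as a number so that 5 sorts before 12.
    a, b = keyStr.split('/', 1)
    return (a, int(b))

sortedLines = sorted(lines, key=lineKey)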
yairchu
+1  A: 

You can define a cmp() comparison function for .sort([cmp[, key[, reverse]]]) calls. The Python 2 documentation says:

The sort() method takes optional arguments for controlling the comparisons.

cmp specifies a custom comparison function of two arguments (list items) which should return a negative, zero or positive number depending on whether the first argument is considered smaller than, equal to, or larger than the second argument: cmp=lambda x,y: cmp(x.lower(), y.lower()). The default value is None.

In the cmp() function, retrieve the numeric key and use int(field) to ensure numeric (not textual) comparison.
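As a rough sketch of the comparison-function approach: Python 3 removed the cmp argument from sort(), so this version assumes Python 3 and wraps the comparison with functools.cmp_to_key. The names port_key and compare_lines are made up for illustration, and the sample lines are shortened stand-ins for the real output.

```python
from functools import cmp_to_key

def port_key(line):
    # Hypothetical helper: 'Gi3/16 ...' -> ('Gi3', 16)
    field = line.split(None, 1)[0]
    prefix, port = field.split("/", 1)
    return prefix, int(port)

def compare_lines(x, y):
    # Return a negative, zero, or positive number, as a cmp function must.
    kx, ky = port_key(x), port_key(y)
    return (kx > ky) - (kx < ky)

sample = ["Gi5/12 a", "Gi3/16 b", "Gi5/4 c", "Gi3/2 d"]
sample.sort(key=cmp_to_key(compare_lines))
print(sample)  # ['Gi3/2 d', 'Gi3/16 b', 'Gi5/4 c', 'Gi5/12 a']
```

The int(port) conversion is what makes 4 sort before 12; comparing the raw strings would put '12' first.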

Alternatively, a key() function can be defined (thanks, @Anurag Uniyal):

key specifies a function of one argument that is used to extract a comparison key from each list element: (e.g. key=str.lower). The default value is None.

gimel
key would be better than cmp
Anurag Uniyal
+4  A: 

To sort, split each line into a two-element tuple: the part before the / and the integer after it. Each line is then sorted on a key like ('Gi6', 12). See the example below.

s="""Gi3/2 --.--.--.-- 0024.e89b.c10e Dell Inc.  
Gi5/4 --.--.--.-- 0030.c1cd.f038 HEWLETTPACKARD   
Gi4/3 --.--.--.-- 0020.ac00.6703 INTERFLEX DATENSYSTEME GMBH  
Gi3/7 --.--.--.-- 0009.4392.34f2 Cisco Systems  
Gi6/6 --.--.--.-- 001c.2333.bd5a Dell Inc  
Gi3/16 --.--.--.-- 0009.7c92.7af2 Cisco Systems  
Gi5/12 --.--.--.-- 0020.ac00.3fb0 INTERFLEX DATENSYSTEME GMBH  
Gi4/5 --.--.--.-- 0009.4392.6db2 Cisco Systems  
Gi4/6 --.--.--.-- 000b.cd39.c7c8 Hewlett Packard  
Gi6/4 --.--.--.-- 0021.70d7.8d33 Dell Inc  
Gi6/14 --.--.--.-- 0009.7c91.fa71 Cisco Systems"""

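lines = s.split("\n")

def sortKey(l):
    # Split on the first slash only, in case the rest of the line contains one.
    a, b = l.split("/", 1)
    # The port number is the first whitespace-separated token after the slash.
    b = int(b.split()[0])
    return (a, b)

lines.sort(key=sortKey)

for l in lines:
    print(l)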
Anurag Uniyal
A: 

If you are working in a Unix environment, you can use the "sort" command to sort such lists.

Another possibility is to use some kind of bucket sort in your Python script, which may be faster for large inputs.
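One way the bucket idea might look, as a sketch only: lines are grouped by the part before the slash and then by the integer port number, and the buckets are read back in order. The function name bucket_sort_lines is invented here, the bucket keys themselves are still ordered with sorted(), and the code assumes every line is non-empty and starts with a field like Gi3/16.

```python
from collections import defaultdict

def bucket_sort_lines(lines):
    # buckets[prefix][port] holds every line for that interface.
    buckets = defaultdict(lambda: defaultdict(list))
    for line in lines:
        field = line.split(None, 1)[0]
        prefix, port = field.split("/", 1)
        buckets[prefix][int(port)].append(line)

    # Read the buckets back in order: prefix first, then numeric port.
    result = []
    for prefix in sorted(buckets):
        for port in sorted(buckets[prefix]):
            result.extend(buckets[prefix][port])
    return result

lines = ["Gi5/12 x", "Gi3/16 y", "Gi3/2 z", "Gi5/4 w"]
print(bucket_sort_lines(lines))
```

Whether this beats a plain sort with a key function depends on the data, so it is worth timing both on real output.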

swegi