I'm completely new to Python and while trying various random bits and pieces I've struck upon a problem that I believe I've "solved", but the code doesn't feel right - I strongly suspect there is going to be a better way to get the desired result.
FYI - I'm using whatever the latest version of Python 3 is, on Windows.
Problem definition
Briefly, what I'm doing is sorting a list of pairs, such that the pairs containing the elements that appears in the fewest pairs are sorted to the front.
The pairs are in the form [i,j]
with 0 <= i <= j < n
, where n
is a known maximum value for the elements. There are no duplicate pairs within the list.
The count of an element i
is a simple count of the number of pairs (not pair elements) in the forms [i,j]
,[j,i]
and [i,i]
where j
is any value that results in a valid pair.
In the sorted result, a pair [i,j]
should appear before a pair [k,l]
if count(i) < count(k)
or count(i) == count(k)
and count(j) < count(l)
(If count(j) == count(l)
the two can be in either order - I'm not bothered about the sort being stable, would be a bonus though).
In the sorted result, a pair [i,j]
should appear before a pair [k,l]
if
min(count(i),count(j)) < min(count(k),count(l))
or
min(count(i),count(j)) == min(count(k),count(l))
and max(count(i),count(j)) < max(count(k),count(l))
.
In otherwords, if the pair is [0,1]
and 1
has a count of one, but 0
has a count of four hundred, the pair should still be at (or at least very near) the front of the list - they need sorting by the least frequent element in the pair.
Here's a contrived example I've built:
input [[0,0],[1,2],[1,4],[2,2],[2,3],[3,3],[3,4]]
Here's the individual element counts and the source pairs they come from:
0: 1 [0,0]
1: 2 [1,2],[1,4]
2: 3 [1,2],[2,2],[2,3]
3: 3 [2,3],[3,3],[3,4]
4: 2 [1,4],[3,4]
And here's the result, along with the pair scores:
output: [[0,0],[1,4],[1,2],[3,4],[2,2],[2,3],[3,3]]
scores: 1 1-2 1-3 2-3 3 3 3
Here, 0
has a count of one (it appears in one pair, albeit twice) so comes first. 1
has a count of two, so appears second - with [1,4]
before [1,2]
because 4
has a count of two and 2
has a count of three, et cetera.
My current solution
As said, I believe this implimentation works accurately, but it just feels that there must be a better way to go about doing this. Anyway, here's what I've got so far:
#my implementation uncommented to reduce post size, see history for comments
def sortPairList( data , n ):
count = []
for i in range(0,n):
count.append( 0 )
#count up the data
for p in data:
count[p[0]] += 1
if p[1] != p[0]:
count[p[1]] += 1
maxcount = 0
for i in range(0,n):
if count[i] > maxcount:
maxcount = count[i]
def elementFrequency(p):
if count[ p[0] ] < count[ p[1] ]:
return count[ p[0] ] + float(count[ p[1] ]) / (maxcount+1)
else:
return count[ p[1] ] + float(count[ p[0] ]) / (maxcount+1)
data.sort( key=elementFrequency )
Any suggestions on a more "Python" way of doing this?
Or anything that's wrong with my current attempt?
New Test Case (see answer's comments)
input: [[0,0],[0,3],[0,5],[0,7],[1,1],[1,2],[1,8],[2,4],[2,5],[3,4],[3,5],[3,9],[4,4],[4,7],[4,8],[6,8],[7,7],[7,9],[8,9]]
expected: [[6,8],[1,1],[1,2],[2,5],[0,5],[1,8],[3,5],[3,9],[7,9],[8,9],[2,4],[0,0],[0,3],[0,7],[7,7],[3,4],[4,7],[4,8],[4,4]]