Your problem is under-specified.
The issue is that you are trying to optimize two different properties of the resulting data, and these properties may be in opposition to one another. For a given set of data, it may be that the most even distribution has many clusters, and that the smallest number of clusters has a very uneven distribution.
For example, consider: [(a,1),(b,1),(c,1),(d,1),(e,1)], N=2
The most even distribution is [([a],1),([b],1),([c],1),([d],1),([e],1)]
But the smallest number of clusters is [([a,b],2),([c,d],2),([e],1)]
How is an algorithm supposed to know which of these (or which clustering in between them) you want? You need find some way to quantify the tradeoff that you are willing to accept between number of clusters and evenness of distribution.
You can create an example with an arbitrarily large discrepancy between the two possibilities by creating any set with 2k + 1 elements, and assigning them all the value N/2. This will lead to the smallest number of clusters being k+1 clusters (k of 2 elements and 1 of 1) with a weight difference of N/2 between the largest and smallest clusters. And then the most even distribution for this set will be 2k + 1 clusters of 1 element each, with no weight difference.
Edit: Also, "evenness" itself is not a well-defined idea. Are you looking to minimize the largest absolute difference in weights between clusters, or the mean difference in weights, or the median difference in weights, or the standard deviation in weights?