I'd like to combine a few metrics of nodes in a social network graph into a single value for rank ordering the nodes:
in_degree + betweenness_centrality = informal_power_index
The problem is that in_degree and betweenness_centrality are measured on different scales, say 0-15 vs. 0-35000, and both follow a power law distribution (or at least definitely not the normal distribution).
Is there a good way to rescale the variables so that one won't dominate the other in determining the informal_power_index?
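To make the concern concrete, here is a minimal sketch of the unscaled sum. The five nodes and their values are made up for illustration (they are not real data), and `rank_order` is a hypothetical helper name:

```python
# Hypothetical values for five nodes, chosen only to mimic the two scales
# mentioned above (roughly 0-15 and 0-35000). Not real data.
in_degree = [1, 2, 15, 3, 0]
betweenness_centrality = [35000.0, 120.0, 40.0, 900.0, 0.0]


def rank_order(scores):
    """Return node indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Naive index: add the raw metrics without any rescaling.
naive_index = [d + b for d, b in zip(in_degree, betweenness_centrality)]

print(rank_order(naive_index))
print(rank_order(betweenness_centrality))
```

With these numbers the two rankings come out identical: node 2 has the highest in_degree but still ranks fourth, because the raw betweenness_centrality values swamp the in_degree values.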
Three obvious approaches are:
- Standardizing the variables (subtract mean and divide by stddev). This seems like it would squash the distribution too much, hiding the massive difference between a value in the long tail and one near the peak.
- Re-scaling the variables to the range [0,1] by subtracting min(variable) and dividing by max(variable) - min(variable). This seems closer to fixing the problem since it won't change the shape of the distribution, but maybe it won't really address the issue? In particular, the means will be different.
- Equalizing the means by dividing each value by mean(variable). This won't address the difference in scales, but perhaps the mean values are more important for the comparison?
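For comparison, the three approaches could be sketched as follows. This is only an illustration under stated assumptions: the node values are made up, the function names are hypothetical, the population standard deviation is used for standardizing, and min-max scaling is written with the range (max minus min) in the denominator, since that is what maps the values onto [0,1].

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score: subtract the mean, divide by the population std deviation.

    Assumes the values are not all identical (std deviation > 0).
    """
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def min_max(values):
    """Rescale to [0, 1]: subtract the minimum, divide by the range.

    Assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def mean_scale(values):
    """Divide by the mean so the rescaled variable has mean 1.

    Assumes a non-zero mean.
    """
    m = mean(values)
    return [v / m for v in values]


def informal_power_index(in_degree, betweenness_centrality, rescale):
    """Sum the two metrics after applying the same rescaling to each."""
    a = rescale(in_degree)
    b = rescale(betweenness_centrality)
    return [x + y for x, y in zip(a, b)]


# Hypothetical values for five nodes (not real data).
in_degree = [1, 2, 15, 3, 0]
betweenness_centrality = [35000.0, 120.0, 40.0, 900.0, 0.0]

for rescale in (standardize, min_max, mean_scale):
    index = informal_power_index(in_degree, betweenness_centrality, rescale)
    print(rescale.__name__, [round(v, 3) for v in index])
```

Running the loop on real data and comparing the resulting rank orders side by side is one way to see how much the choice of rescaling actually matters for the final ordering.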
Any other ideas?