A: 

On a log scale, you just plot the log of the numbers linearly (in other words, pretend you're plotting linearly, but take the log of the numbers to be plotted first).

The zero problem can't be solved analytically--you have to pick a minimum order of magnitude for your scale, and no matter what you can't ever reach zero. If you want to plot something at zero, your choices are to arbitrarily give it the minimum order of magnitude of the scale, or to omit it.
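
Here is a minimal sketch of both points in Python. The helper name `log_position` and the parameters `floor` and `top` are made up for illustration and are not from the question. Values are logged before the linear mapping, and anything at or below the floor (including zero) is pinned to the bottom of the scale.

```python
from math import log10

def log_position(value, floor, top):
    """Map value to a 0..1 position on a log scale running from floor to top.

    floor is the arbitrarily chosen minimum order of magnitude; it must be > 0.
    Values at or below floor (including zero) are pinned to position 0.
    """
    if floor <= 0 or top <= floor:
        raise ValueError("need 0 < floor < top")
    clamped = max(value, floor)
    return (log10(clamped) - log10(floor)) / (log10(top) - log10(floor))

print(log_position(0, 1, 1000))     # pinned to the bottom of the scale
print(log_position(1000, 1, 1000))  # top of the scale
```

Omitting zero values instead of pinning them is the other option mentioned above; which one is better depends on what the plot is for.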

Drew Hall
If I'm understanding you correctly, I think that's what I'm doing already. I'm taking the log of the counts and plugging that into the linear equation. I'm not sure you're understanding the special case problem. I'm not trying to find the value at zero; the problem is that the value at the mincount is 0.
dburke
A: 

I don't have the exact answer, but I think you want to look up Linearizing Exponential Data. Start by calculating the equation of the line passing through the points, then take the log of both sides of that equation.
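
As a rough illustration of that idea, assume the curve has the form y = A * exp(B * x), which is my assumption and not something stated in the question. Taking the log of both sides gives ln(y) = ln(A) + B * x, which is a straight line in (x, ln y), so two points are enough to pin down A and B. The function name `fit_exponential` is hypothetical.

```python
from math import exp, log

def fit_exponential(x1, y1, x2, y2):
    """Return (A, B) such that y = A * exp(B * x) passes through both points.

    Taking logs gives ln(y) = ln(A) + B * x, a straight line in (x, ln y).
    Requires y1 > 0, y2 > 0 and x1 != x2.
    """
    B = (log(y2) - log(y1)) / (x2 - x1)  # slope of the line in log space
    A = exp(log(y1) - B * x1)            # intercept, converted back from log space
    return A, B

A, B = fit_exponential(1, 0.75, 10, 1.75)
print(A * exp(B * 1), A * exp(B * 10))  # approximately 0.75 and 1.75
```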

John Ellinwood
+1  A: 

Let's begin with your mapping from the logged count to the size. That's the linear mapping you mentioned:

   size
    |
max |_____
    |   /
    |  /|
    | / |
min |/  |
    |   |
   /|   |
0 /_|___|____
    0   a

where min and max are the min and max sizes, and a=log(maxcount)-b. The line is of the form y=mx+c, where x=log(count)-b and c=minsize.

From the graph, we can see that the gradient, m, is (maxsize-minsize)/a.

We need x=0 at y=minsize, so log(mincount)-b=0 -> b=log(mincount)

This leaves us with the following Python:

mincount = min(count)
maxcount = max(count)
xoffset = log(mincount)
gradient = (maxsize-minsize)/(log(maxcount)-log(mincount))
for c in count:
    x = log(c)-xoffset
    size = gradient * x + minsize

If you want to make sure that the minimum count used for the scale is never greater than 1, replace the first line with:

mincount = min(count+[1])

which appends 1 to the count list before doing the min. The same goes for making sure the maxcount is always at least 1. Thus your final code per above is:

from math import log
count = [1, 3, 5, 4, 7, 5, 10, 6]
def logdist(count, maxsize=1.75, minsize=.75):
    countdist = []
    mincount = min(count+[1])
    maxcount = max(count+[1])
    xoffset = log(mincount)
    gradient = (maxsize-minsize)/(log(maxcount)-log(mincount))
    for c in count:
        x = log(c)-xoffset
        size = gradient * x + minsize
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist
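
One case the function above doesn't cover: if every count is 1, then maxcount and mincount are both 1, log(maxcount)-log(mincount) is zero, and the gradient divides by zero. Below is a sketch of one way to guard against that. The name `logdist_safe` and the fallback to minsize are my own choices for illustration, and it still assumes every count is positive, since log(0) is undefined.

```python
from math import log

def logdist_safe(count, maxsize=1.75, minsize=.75):
    """Variant of logdist that tolerates a zero-width count range.

    Assumes every count is a positive number. Falling back to minsize
    when there is no spread is an arbitrary choice for this sketch.
    """
    mincount = min(count + [1])
    maxcount = max(count + [1])
    span = log(maxcount) - log(mincount)
    countdist = []
    for c in count:
        if span == 0:
            size = minsize  # no spread to scale over
        else:
            size = (maxsize - minsize) * (log(c) - log(mincount)) / span + minsize
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist

print(logdist_safe([1, 1, 1]))
print(logdist_safe([1, 3, 10]))
```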
Phil H
+1  A: 

Hi,

You have tags whose counts range from MIN to MAX. The threshold issue can be ignored here, because it amounts to setting every count below the threshold to the threshold value and only then taking the minimum and maximum.

You want to map the tag counts to "weights" but in a "logarithmic fashion", which basically means (as I understand it) the following. First, the tags with count MAX get max_weight weight (in your example, 1.75):

weight(MAX) = max_weight

Secondly, the tags with the count MIN get min_weight weight (in your example, 0.75):

weight(MIN) = min_weight

Finally, whenever the count decreases by 1, the weight is multiplied by a constant K < 1, which indicates the steepness of the curve:

weight(x) = weight(x + 1) * K

Solving this, we get:

weight(x) = max_weight * (K ^ (MAX - x))

Note that with x = MAX, the exponent is zero and the factor on the right becomes 1.

Now we have the extra requirement that weight(MIN) = min_weight, and we can solve:

min_weight = max_weight * (K ^ (MAX - MIN))

from which we get

K ^ (MAX - MIN) = min_weight / max_weight

and taking the logarithm of both sides

(MAX - MIN) ln K = ln min_weight - ln max_weight

i.e.

ln K = (ln min_weight - ln max_weight) / (MAX - MIN)

The right-hand side is negative because min_weight < max_weight, which gives K < 1 as desired. Then

K = exp((ln min_weight - ln max_weight) / (MAX - MIN))

So now you have the formula to calculate K. After this you just apply for any count x between MIN and MAX:

weight(x) = max_weight * (K ^ (MAX - x))

And you are done.
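
The derivation above can be sketched in Python roughly as follows. The function name `exp_weights` is made up for illustration. It assumes the counts are not all equal (so MAX - MIN is nonzero) and that both weights are positive.

```python
from math import exp, log

def exp_weights(counts, min_weight=0.75, max_weight=1.75):
    """Weight each count so that one step down in count multiplies the weight by K.

    Assumes the counts are not all equal (MAX > MIN) and both weights are positive.
    """
    lo, hi = min(counts), max(counts)
    # K = exp((ln min_weight - ln max_weight) / (MAX - MIN)), so K < 1
    K = exp((log(min_weight) - log(max_weight)) / (hi - lo))
    # weight(x) = max_weight * K ^ (MAX - x)
    return [max_weight * K ** (hi - c) for c in counts]

weights = exp_weights([1, 3, 5, 4, 7, 5, 10, 6])
print([round(w, 3) for w in weights])
```

The smallest count comes out at min_weight and the largest at max_weight, up to floating-point error.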

antti.huima
This is very very close to what I want. The only problem is that the curve is on the wrong side of the linear slope. You assume that K should be less than 1. I would like it to be slightly greater than 1. How can this be accomplished?
dburke
Ah yes, sorry, you are right. In the last equation, change MAX - x to x - MIN and the leading max_weight to min_weight, and in the previous one swap ln max_weight and ln min_weight.
antti.huima
When graphed, the curve that your algorithm creates lies below the line between the two points. I've been trying to juggle the numbers around but still can't seem to come up with a way to flip that curve to the other side of the line. I'm guessing that if the function was changed to some form of logarithm instead of an exponent it would do exactly what I'd need. Is that correct?
dburke
Hi, if you just want to "flip" the curve around, you don't need to change from exp to log or vice versa. You can do the "flipping" with just subtraction and addition. If you want to rotate a function f(x) by 180 degrees, it becomes C - f(D - x), where D and C are constants that describe the origin of the rotation; to flip it along the Y-axis you do f(D - x), and to flip it along the X-axis, C - f(x).
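
A small sketch of the rotation, using example values borrowed from this thread (counts from 1 to 10, weights from 0.75 to 1.75). The helper `rotate_180` and the particular choice of C and D are assumptions for illustration. Taking C and D as the sums of the endpoint values rotates about the midpoint of the line, so both endpoints stay where they are.

```python
def rotate_180(f, C, D):
    """Return g(x) = C - f(D - x), i.e. f rotated by 180 degrees."""
    return lambda x: C - f(D - x)

# Hypothetical example values, borrowed from the weights and counts in this thread.
MIN, MAX = 1, 10
min_weight, max_weight = 0.75, 1.75

def f(x):
    # Exponential curve from (MIN, min_weight) to (MAX, max_weight); it sags below the chord.
    return min_weight * (max_weight / min_weight) ** ((x - MIN) / (MAX - MIN))

# Choosing C and D as the sums of the endpoints keeps both endpoints fixed.
g = rotate_180(f, C=min_weight + max_weight, D=MIN + MAX)

print(g(MIN), g(MAX))  # approximately 0.75 and 1.75
print(f(5.5), g(5.5))  # f is below the straight line's midpoint (1.25), g is above it
```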
antti.huima
Yeah, I know, I was just hoping maybe there would be some more elegant solution. The actual problem I'm solving here isn't technically even related to graphs. My question is about finding the most elegant solution to what is apparently a pretty tough mathematical problem.
dburke
By the way, I just wanted to thank you for being so helpful. I hope my inquisitive nature isn't coming across as demanding or anything.
dburke
+1  A: 

Thanks to antti.huima's help, I rethought what I was trying to do.

Following his method of solving the problem, I want a logarithmic equation that gives the same value at the mincount as the straight line between the two points.

weight(MIN) = ln(MIN-(MIN-1)) + min_weight
min_weight = ln(1) + min_weight

While this gives me a good starting point, I need to make it pass through the point (MAX, max_weight). It's going to need a constant:

weight(x) = ln(x-(MIN-1))/K + min_weight

Solving for K we get:

K = ln(MAX-(MIN-1))/(max_weight - min_weight)

So, to put this all back into some Python code:

from math import log
count = [1, 3, 5, 4, 7, 5, 10, 6]
def logdist(count, threshold=0, maxsize=1.75, minsize=.75):
    countdist = []
    # mincount is the threshold, or the smallest count if that is larger
    mincount = max(threshold, min(count))
    maxcount = max(count)
    constant = log(maxcount - (mincount - 1)) / (maxsize - minsize)
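    for c in count:
        # counts below the threshold are treated as the threshold, so log() always gets a value >= 1
        size = log(max(c, mincount) - (mincount - 1)) / constant + minsize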
        countdist.append({'count': c, 'size': round(size, 3)})
    return countdist
dburke