I have a set of points like: pointA(3302.34,9392.32), pointB(34322.32,11102.03), etc.

I need to scale these so that each x- and y-coordinate is in the range 0.0 to 1.0. I tried doing this by first finding the smallest and largest x values in the data set (minimum_x_value, maximum_x_value) and the smallest and largest y values (minimum_y_value, maximum_y_value). I then did the following:

pointA.x = (pointA.x - minimum_x_value) / (maximum_x_value - minimum_x_value)
pointA.y = (pointA.y - minimum_y_value) / (maximum_y_value - minimum_y_value)

This changes the relative distances(?), and therefore makes the data useless for my purposes. Is there a way to scale these coordinates while keeping their relative distances intact?
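A quick numeric check can confirm the suspicion. The sketch below applies the per-axis scaling described above and compares one ratio of distances before and after. The first two points come from the question; the third point and the helper name `distance_ratio` are made up for illustration. It assumes Python 3.8+ for `math.dist`.

```python
import math

# First two points are from the question; the third is a made-up example.
points = [(3302.34, 9392.32), (34322.32, 11102.03), (20000.0, 10000.0)]

xs = [x for x, _ in points]
ys = [y for _, y in points]
min_x, max_x = min(xs), max(xs)
min_y, max_y = min(ys), max(ys)

# Per-axis min-max scaling: x and y are divided by different ranges.
scaled = [((x - min_x) / (max_x - min_x), (y - min_y) / (max_y - min_y))
          for x, y in points]

def distance_ratio(pts):
    """Return distance(p0, p1) divided by distance(p0, p2)."""
    return math.dist(pts[0], pts[1]) / math.dist(pts[0], pts[2])

print(distance_ratio(points))  # ratio in the original data
print(distance_ratio(scaled))  # ratio after per-axis scaling (differs)
```

Because the x range here is much wider than the y range, the two axes are squeezed by very different amounts, and the ratio changes.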

+2  A: 

If you mean that you are not keeping the aspect ratio: scale to the minimum bounding square instead of the minimum bounding rectangle. Use max(dx, dy) as the scale factor along both axes, where dx and dy are the x and y ranges of the data.

disown
+2  A: 

You have to scale them by the same factor to keep the relative distances the same.

I'd forget about subtracting the minimum (Note: this part is only true if the points are always positive, which is my usual use case), and just divide by the maximum of the two maxes:

maxval = max(max(A.x), max(A.y)) #or however you find these
A.x = A.x/maxval
A.y = A.y/maxval
Vicki Laidler
This won't work if the input values can be negative.
tzaman
The input values are never negative. I've used the approach you mention as well, but since my points don't span 0.0 - 20000.0 but more typically 19000.0 - 20000.0, if I don't subtract the minimum value, all the points end up in one corner of the 0-1 square. I am dividing all the points by the same value though (maximum_value - minimum_value is always the same). If I understand you correctly, that means that they keep their relative distances?
eiaxlid
My approach subtracts the center-point from each range, so you won't have the 'corner' problem. Subtracting the minimum works too, but then the data isn't centered on `(0.5, 0.5)`. Relative distances should be maintained as long as you keep the same scale factor, yes.
tzaman
+6  A: 

You need to scale the x values and the y values by the same amount! I would suggest scaling by the larger of the two ranges (either x or y). In pseudocode, you'd have something like

scale = max(maximum_x_value - minimum_x_value,
            maximum_y_value - minimum_y_value)

If you divide every coordinate by scale, all the distances between points will be scaled by the same factor, 1/scale, which is what I presume you're asking for: if point p_1 was twice as far from point p_2 as from p_3 before rescaling, it will be twice as far after rescaling as well. You should be able to prove this to yourself pretty easily using the Pythagorean theorem.
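The Pythagorean argument can be written out in a few lines. This is a sketch that assumes every coordinate is divided by the same factor s = scale, with s > 0; the symbols d_{12}, d_{13} and their primed versions are notation introduced here for the distances before and after rescaling.

```latex
d'_{12}
  = \sqrt{\left(\frac{x_1 - x_2}{s}\right)^2 + \left(\frac{y_1 - y_2}{s}\right)^2}
  = \frac{1}{s}\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}
  = \frac{d_{12}}{s},
\qquad\text{so}\qquad
\frac{d'_{12}}{d'_{13}} = \frac{d_{12}/s}{d_{13}/s} = \frac{d_{12}}{d_{13}}.
```

Subtracting a constant offset (such as the minimum) from every point before dividing does not affect the result, because the offset cancels in the differences x_1 - x_2 and y_1 - y_2.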

Pillsy
+1  A: 

Assuming you want your entire data set to be centered on (0.5, 0.5) with a range of (0,1) in both axes, it's easiest to think of the total transformation needed in three steps:

  1. Center the data on the origin:
    P.x -= (maxX + minX) / 2
    P.y -= (maxY + minY) / 2
  2. Scale it down by the same amount in both dimensions, such that the larger of the two ranges becomes (-0.5, 0.5):
    scale = max(maxX - minX, maxY - minY)
    P.x /= scale
    P.y /= scale
  3. Translate the points by (0.5, 0.5) to bring everything where you want it:
    P.x += 0.5
    P.y += 0.5

This approach has the advantage of working for any input data with a nonzero range, including negative coordinates, and also filling as much of the unit square as possible while maintaining aspect ratio (and hence relative distances).
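The three steps can be sketched in Python roughly as follows. The function name `fit_to_unit_square`, the tuple-based point representation, and the sample data are illustrative assumptions. The sketch centers the data by subtracting the midpoint of the bounding box, (minX + maxX) / 2 and (minY + maxY) / 2, and it maps a degenerate data set (all points identical) to the center, which is a choice made here rather than part of the answer.

```python
def fit_to_unit_square(points):
    """Center points on (0.5, 0.5) and scale both axes by one shared factor."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)

    # Step 1: the bounding-box midpoint; subtracting it centers the data on the origin.
    mid_x = (min_x + max_x) / 2
    mid_y = (min_y + max_y) / 2

    # Step 2: one scale factor for both axes, the larger of the two ranges.
    scale = max(max_x - min_x, max_y - min_y)
    if scale == 0:
        # Assumption: if every point is identical, place them all at the center.
        return [(0.5, 0.5) for _ in points]

    # Step 3: shift by (0.5, 0.5) so the result lies inside the unit square.
    return [((x - mid_x) / scale + 0.5, (y - mid_y) / scale + 0.5)
            for x, y in points]

# Made-up sample in the 19000-20000 range mentioned in the comments.
sample = [(19000.0, 19500.0), (20000.0, 19600.0), (19500.0, 19700.0)]
result = fit_to_unit_square(sample)
print(result)
```

For this sample the x range (1000) is the larger one, so x fills the whole interval from 0 to 1 while y occupies a narrow band around 0.5.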

tzaman
+1  A: 

Step 1: Re-Locate the origin
Let your new "origin" be (minimum_x_value, minimum_y_value). Shift all your data points by subtracting minimum_x_value from all x-coordinates and by subtracting minimum_y_value from all y-coordinates.

Step 2: Normalize the remaining data
Scale the shifted data down to fit within the 0.0-1.0 window. Find max_coord as the larger of the maximum x-value and the maximum y-value of the shifted data. Divide all x- and y-coordinates by max_coord.
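A minimal Python sketch of these two steps, assuming points are stored as (x, y) tuples; the function name `shift_and_normalize` and the sample data are made up for illustration. Here max_coord is taken from the shifted coordinates, and the guard for max_coord == 0 is an added assumption for the case where all points coincide.

```python
def shift_and_normalize(points):
    """Move the minimum corner to (0, 0), then divide by the largest shifted coordinate."""
    min_x = min(x for x, _ in points)
    min_y = min(y for _, y in points)

    # Step 1: relocate the origin to (min_x, min_y).
    shifted = [(x - min_x, y - min_y) for x, y in points]

    # Step 2: the largest shifted coordinate on either axis is the shared divisor.
    max_coord = max(max(x, y) for x, y in shifted)
    if max_coord == 0:
        # Assumption: all points coincide, so there is nothing to scale.
        return shifted

    return [(x / max_coord, y / max_coord) for x, y in shifted]

# Made-up sample data.
sample = [(19000.0, 19500.0), (20000.0, 19600.0), (19500.0, 19700.0)]
normalized = shift_and_normalize(sample)
print(normalized)
```

Unlike a centered variant, this places the data against the lower-left corner of the unit square: the axis with the larger range spans 0 to 1, and the other axis starts at 0 and stops short of 1.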

bta