+2  A: 

If your points are close to each other, you can use normal "straight" lines between them, together with the usual smoothing algorithms. At that scale you can treat the world as flat.

If they are far apart, you need to compensate for the curvature of the earth by using great circles to navigate from point to point. Otherwise your "straight" lines will take a longer route.

It is your call at what distance points become too far apart for straight lines.
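
To decide whether two points count as "far apart", the great-circle distance is the relevant measure. A minimal sketch (the standard haversine formula, assuming coordinates in degrees and a spherical Earth):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance in km between two (lat, lon) points in degrees,
    # assuming a spherical Earth of mean radius 6371 km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))
```

If this distance is small for all consecutive points, flat-earth straight lines are fine.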

Further, you have to know whether you need to "visit" each point or just pass near it, and how near that "near" is.

If you need to send the course(s) to a plane, ship or other traveller, you probably need to visit each point. If you get the GPS data from an object, you probably just want to plot a course on a screen, and remove the noise.


After seeing your edits: if this is an object moving along some trajectory that you want to plot, you might want to smooth the direction and speed instead of the x/y values. (Resampling your measurements to a fixed, increasing time interval makes smoothing a lot easier.)
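
A minimal sketch of smoothing speed and heading rather than raw coordinates (assumes samples at a fixed interval `dt`; the moving-average `window` is an assumed parameter, and note that naively averaging headings misbehaves near the ±180° wrap):

```python
import math

def smooth_track(xs, ys, dt=1.0, window=3):
    # derive per-step speed and heading, assuming a fixed sample interval dt
    speeds, headings = [], []
    for i in range(1, len(xs)):
        dx, dy = xs[i] - xs[i - 1], ys[i] - ys[i - 1]
        speeds.append(math.hypot(dx, dy) / dt)
        headings.append(math.atan2(dy, dx))

    def avg(vals, i):
        # centred moving average, clipped at the ends
        lo, hi = max(0, i - window // 2), min(len(vals), i + window // 2 + 1)
        return sum(vals[lo:hi]) / (hi - lo)

    # re-integrate the smoothed speed/heading into a path
    sx, sy = [xs[0]], [ys[0]]
    for i in range(len(speeds)):
        v, h = avg(speeds, i), avg(headings, i)
        sx.append(sx[-1] + v * dt * math.cos(h))
        sy.append(sy[-1] + v * dt * math.sin(h))
    return sx, sy
```

A track that is already smooth passes through unchanged; noisy wiggles in speed and heading get averaged out instead of being smoothed in x/y space.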

GvS
+4  A: 
jamesh
The problem is hard, and this solution is not ideal, but it should provide a decent approximation. Remember to do the work parametrically (x against t _and_ y against t, not x against y). Upvote.
dmckee
Bezier fitting works if you know the total ordering of the list of points - but that problem is still unsolved.
Frank Krueger
Bezier will be way off the sampled points; probably more than is helpful. Notice that the blue curve misses all the peaks in the gray one? Those misses are quite extreme even in this trivial case, and they're not a good way to filter out noise, methinks.
Jasper Bekkers
+4  A: 

With an unsorted list, you won't really know which points to include in each segment, so I guess you could just go with the closest point.

One way could be to pick a start point at random, and pick the closest point as the next point in each step. Add the first two points to a set S.

Fit a line to the points in S until the RMS exceeds some value, then clear S and start a new line.

The intersection of consecutive lines would be the end-points of the segments.
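
A rough Python sketch of this scheme (the RMS here is the perpendicular scatter about a total-least-squares line, computed from the 2x2 scatter matrix; `max_rms` is an assumed tuning parameter):

```python
import math

def fit_rms(points):
    # RMS perpendicular distance to the total-least-squares line,
    # via the smaller eigenvalue of the 2x2 scatter matrix
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = (tr - math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2
    return math.sqrt(lam / n)

def segment(points, max_rms=0.1):
    # greedy walk to the nearest unused point; break the line whenever
    # the running RMS of the current segment exceeds max_rms
    rest, cur, segments = list(points[1:]), [points[0]], []
    while rest:
        nxt = min(rest, key=lambda p: math.hypot(p[0] - cur[-1][0],
                                                 p[1] - cur[-1][1]))
        rest.remove(nxt)
        cur.append(nxt)
        if len(cur) > 2 and fit_rms(cur) > max_rms:
            segments.append(cur[:-1])
            cur = [cur[-2], nxt]  # the new segment shares the break point
    segments.append(cur)
    return segments
```

As written it just shares break points between segments; intersecting the fitted lines, as suggested above, would give cleaner end-points.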

jakber
This will work well if the distance between samples is large compared to the scatter, and the curve neither self-crosses, nor comes nearer itself than the sampling scale...
dmckee
I can foresee a few problems with this. In a system with any reasonable amount of noise, a smooth curve would be hardly distinguishable from a noisy section of data if one only uses RMS as a measure of fitness. Also it would be very easy to "miss" some points that are outliers (points that are farther away from their neighbors than any of their neighbors are to each other) until late in the algorithm when you reach the end of the true path and you begin finding that the nearest points are way off.
Clueless
+2  A: 

Here is a heuristic hack that might address the ordering problem for the data, if

  • you have enough points
  • the mean distance between points is small compared to the smallest radius of curvature expected of the path
  • the mean distance between points is not large compared to the std. dev. of the noise
  • the path is not self-crossing (you might get lucky, but no guarantees)

Proceed like this:

  1. Pick (hopefully by a meaningful rather than random means) a starting point, p1.
  2. Find all the points that lie within some clustering distance, r_c of p1. Choose r_c small compared to the expected turning radius, but large compared to the scatter.
  3. Call this cluster C1.
  4. Find point q1 the mean of positions in C1.
  5. Fit a line to the points in C1 and project to (or just beyond) the edge of the cluster, and find the nearest point in your original data. Label that point p2.
  6. Iterate steps 2-5 until you run out of data.

Now you have a new list of points q1..qn that are ordered.

Off the top of my head, very rough, and only works under pretty good conditions...


Self-crossing behavior can probably be improved by requiring in step (5) that the new projected line lie within some maximum angle of the previous one.
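
A loose Python sketch of steps 1-6, with the line-fit-and-project of step 5 simplified to stepping a distance r_c ahead along the direction between successive centroids (a crude stand-in for the least-squares fit; all parameter choices here are assumptions):

```python
import math

def order_by_clusters(points, r_c):
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    remaining = set(range(len(points)))
    p = points[0]            # step 1: assumed starting point
    prev_q = None
    centroids = []
    while remaining:
        # steps 2-3: cluster everything within r_c of the current point
        cluster = [i for i in remaining if dist(points[i], p) <= r_c]
        if not cluster:
            break            # no nearby points left
        # step 4: the cluster centroid becomes the next ordered point
        q = (sum(points[i][0] for i in cluster) / len(cluster),
             sum(points[i][1] for i in cluster) / len(cluster))
        centroids.append(q)
        remaining -= set(cluster)
        if not remaining:
            break
        # step 5 (simplified): project r_c ahead along the direction of
        # travel and pick the nearest remaining point
        if prev_q is None:
            target = min(remaining, key=lambda i: dist(points[i], q))
        else:
            dx, dy = q[0] - prev_q[0], q[1] - prev_q[1]
            n = math.hypot(dx, dy) or 1.0
            proj = (q[0] + dx / n * r_c, q[1] + dy / n * r_c)
            target = min(remaining, key=lambda i: dist(points[i], proj))
        prev_q, p = q, points[target]
    return centroids
```

The returned centroids q1..qn are the ordered, noise-averaged path.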

dmckee
A: 

It seems that you know the 'golden curve' from your answers to the comments. I would suggest fitting a Bezier curve to that 'golden curve', as suggested by @jamesh, and drawing that.

kenny
+1  A: 

My approach would be to first sort your list of points, then use a bezier curve.

The trick is of course the sorting. Start with one random point and find the nearest point. Assume these two are connected. With those two endpoints, find the nearest points to them. Assume that the one with the smaller distance to its endpoint is connected to that point. Repeat until all points are connected.

I assume that there are still some problems with this approach, but maybe you can use it as a starting point (pun intended).

Edit: You can do it several times with different starting points, and then see where the results differ. That at least gives you some confidence about which points are connected to each other.
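
A sketch of that sorting in Python (grow the chain from both ends, always attaching the nearest free point to whichever endpoint is closer to its candidate; the `start` index is an assumed parameter):

```python
import math

def chain_points(points, start=0):
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # seed the chain with the start point and its nearest neighbour
    nearest = min((i for i in range(len(points)) if i != start),
                  key=lambda i: dist(points[i], points[start]))
    chain, used = [start, nearest], {start, nearest}
    while len(used) < len(points):
        free = [i for i in range(len(points)) if i not in used]
        # best candidate for each endpoint of the chain
        head = min(free, key=lambda i: dist(points[i], points[chain[0]]))
        tail = min(free, key=lambda i: dist(points[i], points[chain[-1]]))
        # attach to whichever endpoint is closer to its candidate
        if dist(points[head], points[chain[0]]) < dist(points[tail], points[chain[-1]]):
            chain.insert(0, head)
            used.add(head)
        else:
            chain.append(tail)
            used.add(tail)
    return [points[i] for i in chain]
```

Running it from several different `start` indices and comparing the results implements the confidence check from the edit above.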

Treb
A: 

How many points do you have?
A Bezier curve, as mentioned, is a good idea if you have comparatively few points. If you have many points, building clusters as suggested by dmckee would work better.

However, you also need another constraint to define the order of the points. There have been many good suggestions for how to choose the points, but unless you introduce another constraint, any ordering is a possible solution.

Possible constraints I can think of:

  • shortest path
  • most straight segments
  • least total absolute rotation
  • directional preference (i.e. horizontal / vertical is more likely than crisscrossing)

In all cases, meeting the constraint probably requires testing all permutations of the sequence. If you start with a "good guess", you can terminate the others quickly.
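
For the "shortest path" constraint, the brute-force permutation test looks like this (feasible only for a handful of points; a real implementation would prune partial orderings that already exceed the best length found so far):

```python
import math
from itertools import permutations

def shortest_ordering(points):
    # total polyline length of a candidate ordering
    def length(order):
        return sum(math.dist(order[i], order[i + 1])
                   for i in range(len(order) - 1))
    # try every permutation and keep the cheapest - O(n!), toy sizes only
    return min(permutations(points), key=length)
```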

peterchen
+1  A: 

A completely different approach, one that does not require another constraint; details may depend on your application. It should work best if you have a "dense cloud of points" around the path.

Use a "cost" function that measures the difference between the curve and the cloud of points. Then either use a parametrized curve and a standard optimization algorithm, or start with a straight curve from start to end and use a genetic algorithm to modify it.

The typical cost function would be to take the smallest distance between each point and the curve, and sum the squares.

I have not enough experience to suggest an optimization or genetic algorithm but I am sure it can be done :)

I could imagine a genetic algorithm as follows: The path will be built from waypoints. Start by putting N waypoints in a straight line from start to end (N can be chosen depending on the problem). Mutations could be:

  1. For each segment, if rnd() < x, a new waypoint is introduced in the middle.
  2. For each waypoint, the X and Y coordinate are varied slightly.

You will need to include the total length in the cost function. Splitting might not be needed, or maybe x (the "split chance") might need to decrease as more waypoints are introduced. You may or may not want to apply (2) to the start- and endpoint.

Would be fun to try that...
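
A sketch of that idea in Python, simplified to a hill climber over a single path rather than a full genetic population (the mutation rates, the length weight, and the waypoint cap are all assumed parameters; the cost is the sum of squared smallest point-to-curve distances plus the length term mentioned above):

```python
import math
import random

def point_seg_d2(p, a, b):
    # squared distance from point p to segment a-b
    dx, dy = b[0] - a[0], b[1] - a[1]
    l2 = dx * dx + dy * dy
    t = 0.0 if l2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx
                                               + (p[1] - a[1]) * dy) / l2))
    cx, cy = a[0] + t * dx, a[1] + t * dy
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2

def cost(path, cloud, length_weight=0.1):
    # squared smallest distance of each cloud point to the path,
    # plus a penalty on total path length
    fit = sum(min(point_seg_d2(p, path[i], path[i + 1])
                  for i in range(len(path) - 1)) for p in cloud)
    length = sum(math.dist(path[i], path[i + 1])
                 for i in range(len(path) - 1))
    return fit + length_weight * length

def evolve(cloud, start, end, iters=2000, split_chance=0.05, jitter=0.1, seed=0):
    rng = random.Random(seed)
    path, best = [start, end], cost([start, end], cloud)
    for _ in range(iters):
        cand = list(path)
        if rng.random() < split_chance and len(cand) < 20:
            # mutation 1: split a random segment at its midpoint
            i = rng.randrange(len(cand) - 1)
            cand.insert(i + 1, ((cand[i][0] + cand[i + 1][0]) / 2,
                                (cand[i][1] + cand[i + 1][1]) / 2))
        elif len(cand) > 2:
            # mutation 2: jitter a random interior waypoint
            i = rng.randrange(1, len(cand) - 1)
            cand[i] = (cand[i][0] + rng.gauss(0, jitter),
                       cand[i][1] + rng.gauss(0, jitter))
        c = cost(cand, cloud)
        if c <= best:  # keep mutations that don't worsen the cost
            path, best = cand, c
    return path
```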

peterchen
+2  A: 

The problem with the Bezier curve is that it doesn't actually go through the points you have sampled, and even though the sampled points are only distorted a little, the Bezier curve might actually be miles off.

A better approximation, and a solution that seems to resemble the original image far better, is a Catmull-Rom spline, because it does run through all the points on the curve.
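
For reference, a uniform Catmull-Rom segment between p1 and p2 (with p0 and p3 as the neighbouring control points) evaluates as:

```python
def catmull_rom(p0, p1, p2, p3, t):
    # uniform Catmull-Rom spline segment at t in [0, 1];
    # the curve passes through p1 at t=0 and p2 at t=1
    t2, t3 = t * t, t * t * t
    return tuple(
        0.5 * (2 * b
               + (-a + c) * t
               + (2 * a - 5 * b + 4 * c - d) * t2
               + (-a + 3 * b - 3 * c + d) * t3)
        for a, b, c, d in zip(p0, p1, p2, p3)
    )
```

Evaluating each consecutive quadruple of (ordered) sample points over t in [0, 1] yields a curve that interpolates every sample.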

Jasper Bekkers
+1  A: 

I take it that "unsorted list" means that while your set of points is complete, you don't know what order they were travelled through?

The Gaussian noise basically has to be ignored. We're given absolutely no information that would allow us to reconstruct the original, un-noisy path, so I think the best we can do is assume the points are correct.

At this point, the task consists of "find the best path through a set of points", with "best" left vague. I whipped up some code that attempts to order a set of points in euclidean space, preferring orderings that result in straighter lines. While the metric was easy to implement, I couldn't think of a good way to improve the ordering based on that, so I just randomly swap points looking for a better arrangement.

So, here is some PLT Scheme code that does that.

#lang scheme

(require (only-in srfi/1 iota))

; a bunch of trig
(define (deg->rad d)
  (* pi (/ d 180)))

(define (rad->deg r)
  (* 180 (/ r pi)))

(define (euclidean-length v)
  (sqrt (apply + (map (lambda (x) (expt x 2)) v))))

(define (dot a b)
  (apply + (map * a b)))

(define (angle-ratio a b)
  (/ (dot a b)
     (* (euclidean-length a) (euclidean-length b))))

; given a list of 3 points, calculate the likelihood of the
; angle they represent. straight is better.
(define (probability-triple a b c)
  (let ([av (map - a b)]
        [bv (map - c b)])
    (cos (/ (- pi (abs (acos (angle-ratio av bv)))) 2))))

; makes a random 2d point. uncomment the bit for a 3d point
(define (random-point . x)
  (list (/ (random 1000) 100)
        (/ (random 1000) 100)
        #;(/ (random 1000) 100)))

; calculate the likelihood of an entire list of points
(define (point-order-likelihood lst)
  (if (null? (cddr lst)) ; stop when fewer than three points remain
      1
      (* (probability-triple (car lst)
                             (cadr lst)
                             (caddr lst))
         (point-order-likelihood (cdr lst)))))

; just print a list of points
(define (print-points lst)
  (for ([p (in-list lst)])
    (printf "~a~n"
            (string-join (map number->string
                              (map exact->inexact p))
                         " "))))

; attempts to improve upon a list
(define (find-better-arrangement start
                                 ; by default, try only 10 times to find something better
                                 [tries 10]
                                 ; if we find an arrangement that is as good as one where
                                 ; every segment bends by 22.5 degrees (which would be
                                 ; reasonably gentle) then call it good enough. higher
                                 ; cut offs are more demanding.
                                 [cut-off (expt (cos (/ pi 8))
                                                (- (length start) 2))])
  (let ([vec (list->vector start)]
        ; evaluate what we've started with
        [eval (point-order-likelihood start)])
    (let/ec done
      ; if the current list exceeds the cut off, we're done
      (when (> eval cut-off)
        (done start))
      ; otherwise, try no more than 'tries' times...
      (for ([x (in-range tries)])
        ; pick two random points in the list
        (let ([ai (random (vector-length vec))]
              [bi (random (vector-length vec))])
          ; if they're the same...
          (when (= ai bi)
            ; increment the second by 1, wrapping around the list if necessary
            (set! bi (modulo (add1 bi) (vector-length vec))))
          ; take the values from the two positions...
          (let ([a  (vector-ref vec ai)]
                [b  (vector-ref vec bi)])
            ; swap them
            (vector-set! vec bi a)
            (vector-set! vec ai b)
            ; make a list out of the vector
            (let ([new (vector->list vec)])
              ; if it evaluates to better
              (when (> (point-order-likelihood new) eval)
                ; start over with it
                (done (find-better-arrangement new tries cut-off)))))))
      ; we fell out the bottom of the search. just give back what we started with
      start)))

; evaluate, display, and improve a list of points, five times
(define points (map random-point (iota 10)))
(define tmp points)
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 10))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 100))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 1000))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
(set! tmp (find-better-arrangement tmp 10000))
(printf "~a~n" (point-order-likelihood tmp))
(print-points tmp)
Jay Kominek
Thank you for your excellent contribution. I am attempting to fit your code into my framework. I like the approach you took. I do have a bit of excess data (that's why I called it noisy), but I believe that post-processing your approach with some kind of wavelet reduction will give me what I want.
Frank Krueger