views:

168

answers:

4

I have data of this form:

  • for x=1, y is one of {1,4,6,7,9,18,16,19}
  • for x=2, y is one of {1,5,7,4}
  • for x=3, y is one of {2,6,4,8,2}
  • ....
  • for x=100, y is one of {2,7,89,4,5}

Only one of the values in each set is the correct value, the rest is random noise.

I know that the correct values describe a sinusoid function whose parameters are unknown. How can I find the correct combination of values, one from each set? i am looking something like "travelling salesman"combinatorial optimization algorithm

+2  A: 

You're trying to do curve fitting, for which there are several algorithms depending on the type of curve you want to fit your curve to (linear, polynomial, etc.). I have no idea whether there is a specific algorithm for sinusoidal curves (Fourier approximations), but my first idea would be to use a polynomial fitting algorithm with a polynomial approximation of the sine.

I wonder whether you need to do this in the course of another larger program, or whether you are trying to do this task on its own. If so, then you'd be much better off using a statistical package, my preferred one being R. It allows you to import your data and fit curves and draw graphs in just a few lines, and you could also use R in batch-mode to call it from a script or even a program (this is what I tend to do).

Jérémie
curve fitting will find equation of given data but here my problem is to choose data such as it will fit "exactly" in curve. also interpolation will not work for me becoz both interpolation and curve fitting require specific values e.g. here one value of y for each x , they will not choose any one value from the set
Honestly, with so many constraints (exact fit, mostly) and so little information (sinusoidal without its parameters), I don't really see what you could short of exhaustively trying combinations until they perfectly describe a sine curve. But I admit to holding no authority on this domain!
Jérémie
+2  A: 

It depends on what you mean by "exactly", and what you know beforehand. If you know the frequency w, and that the sinusoid is unbiased, you have an equation

a cos(w * x) + b sin(w * x)

with two (x,y) points at different x values you can find a and b, and then check the generated curve against all the other points. Choose the two x values with the smallest number of y observations and try it for all the y's. If there is a bias, i.e. your equation is

a cos(w * x) + b sin(w * x) + c

You need to look at three x values.

If you do not know the frequency, you can try the same technique, unfortunately the solutions may not be unique, there may be more than one w that fits.

Edit As I understand your problem, you have a real y value for each x and a bunch of incorrect ones. You want to find the real values. The best way to do this is to fit curves through a small number of points and check to see if the curve fits some y value in the other sets.

If not all the x values have valid y values then the same technique applies, but you need to look at a much larger set of pairs, triples or quadruples (essentially every pair, triple, or quad of points with different y values)

If your problem is something else, and I suspect it is, please specify it.

  1. Define sinusoid. Most people take that to mean a function of the form a cos(w * x) + b sin(w * x) + c. If you mean something different, specify it.

2 Specify exactly what success looks like. An example with say 10 points instead of 100 would be nice.

It is extremely unclear what this has to do with combinatorial optimization.

deinst
curve fitting will give me sinusoidal equation for ANY DATA i will enter and which will be nearest fit to my entered data. i require combination of data who have some pattern in them(finding parameters of equation is not my goal)and my goal is to find out this values among lots of other values
Once you have the parameters of the equation finding the other points is easy. Finding points without first figuring out the parameters is not going to work.
deinst
thnx ,yes thats something i want but checking every possibility will be exhaustive is there any efficient way to this if not i have to use above method ,its fine if solutions are not unique(but there is very little probability that random noise values will fit exactly in sin curves) i do not know frequency it would have easier if i had
Is there a point for every x? If so you only need to fit a curve for all combinations of the four x's with the smallest number of y values. The uniqueness problem happens because we can increase the frequency without bound, and if your x's are evenly spaced or at integer multiples of some length, there are an infinite number of w values that work. This is essentially the Nyquist theorem.
deinst
A: 

Sinusoidal equations are so general that if you take any random value of all y's these values can be fitted in sinusoidal function unless you give conditions eg. Frequency<100 or all parameters are integers,its not possible to diffrentiate noise and data theorotically so work on finding such conditions from your data source/experiment first.

A: 

By sinusoidal, do you mean a function that is increasing for n steps, then decreasing for n steps, etc.? If so, you you can model your data as a sequence of nodes connected by up-links and down-links. For each node (possible value of y), record the length and end-value of chains of only ascending or descending links (there will be multiple chain per node). Then you scan for consecutive runs of equal length and opposite direction, modulo some initial offset.

xan