tags:

views:

198

answers:

2

I've been struggling with the following code. It's an F# implementation of the Forward-Euler algorithm used for modelling stars moving in a gravitational field.

let force (b1:Body) (b2:Body) = 
    let r = (b2.Position - b1.Position)
    let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
    if (b1 = b2) then
        VectorFloat.Zero
    else
        r * (b1.Mass * b2.Mass) / (Math.Sqrt((float)rm) * (float)rm)    

member this.Integrate(dT, (bodies:Body[])) = 

    for i = 0 to bodies.Length - 1 do
        for j = (i + 1) to bodies.Length - 1 do
            let f = force bodies.[i] bodies.[j]
            bodies.[i].Acceleration <- bodies.[i].Acceleration + (f / bodies.[i].Mass)
            bodies.[j].Acceleration <- bodies.[j].Acceleration - (f / bodies.[j].Mass)
        bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
        bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT   

While this works it isn't exactly "functional". It also suffers from horrible performance, it's 2.5 times slower than the equivalent c# code. bodies is an array of structs of type Body.

The thing I'm struggling with is that force() is an expensive function so usually you calculate it once for each pair and rely on the fact that Fij = -Fji. But this really messes up any loop unfolding etc.

Suggestions gratefully received! No this isn't homework...

Thanks,

Ade

UPDATED: To clarify Body and VectorFloat are defined as C# structs. This is because the program interops between F#/C# and C++/CLI. Eventually I'm going to get the code up on BitBucket but it's a work in progress I have some issues to sort out before I can put it up.

[StructLayout(LayoutKind.Sequential)]
public struct Body
{
    public VectorFloat Position;
    public float Size;
    public uint Color;

    public VectorFloat Velocity;
    public VectorFloat Acceleration;
              '''
}   

[StructLayout(LayoutKind.Sequential)]
public partial struct VectorFloat
{
    public System.Single X { get; set; }
    public System.Single Y { get; set; }
    public System.Single Z { get; set; }
}

The vector defines the sort of operators you'd expect for a standard Vector class. You could probably use the Vector3D class from the .NET framework for this case (I'm actually investigating cutting over to it).

UPDATE 2: Improved code based on the first two replies below:

    for i = 0 to bodies.Length - 1 do
    for j = (i + 1) to bodies.Length - 1 do
        let r = ( bodies.[j].Position -  bodies.[i].Position)
        let rm = (float32)r.MagnitudeSquared + softeningLengthSquared
        let f = r / (Math.Sqrt((float)rm) * (float)rm)    
        bodies.[i].Acceleration <- bodies.[i].Acceleration + (f * bodies.[j].Mass)
        bodies.[j].Acceleration <- bodies.[j].Acceleration - (f * bodies.[i].Mass)
    bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
    bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT  
  • The branch in the force function to cover the b1 == b2 case is the worst offender. You do't need this if softeningLength is always non-zero, even if it's very small (Epsilon). This optimization was in the C# code but not the F# version (doh!).

  • Math.Pow(x, -1.5) seems to be a lot slower than 1/ (Math.Sqrt(x) * x). Essentially this algorithm is slightly odd in that it's perfromance is dictated by the cost of this one step.

  • Moving the force calculation inline and getting rid of some divides also gives some improvement, but the performance was really being killed by the branching and is dominated by the cost of Sqrt.

WRT using classes over structs: There are cases (CUDA and native C++ implementations of this code and a DX9 renderer) where I need to get the array of bodies into unmanaged code or onto a GPU. In these scenarios being able to memcpy a contiguous block of memory seems like the way to go. Not something I'd get from an array of class Body.

+1  A: 

I'd like to play arround with your code, but it's difficult since the definition of Body and FloatVector is missing and they also seem to be missing from the orginal blog post you point to.

I'd hazard a guess that you could improve your performance and rewrite in a more functional style using F#'s lazy computations: http://msdn.microsoft.com/en-us/library/dd233247(VS.100).aspx

The idea is fairly simple you wrap any expensive computation that could be repeatedly calculated in a lazy ( ... ) expression then you can force the computation as many times as you like and it will only ever be calculated once.

Robert
Thanks Robert. I updated the question with more information although what's above may put me on the right track. Liked your book BTW!
Ade Miller
Cool. Always good to hear people are enjoying the book.
Robert
I've not had a chance to play with this yet but I'm concerned it will impact the memory footprint. The number of bodies is usually quite large of the order 10^4 or 10^5. This is one of the reasons for using a mutable array and modifying in place. Using lazy computations would effectively store an (N^2)/2 array of results I think.
Ade Miller
If you're still interested you can find the source to the F# integrators here http://bitbucket.org/ademiller/nbody/src/
Ade Miller
Thanks, will definitely have a look at this.
Robert
+2  A: 

I'm not sure if it's wise to rewrite this code in a functional style. I've seen some attempts to write pair interaction calculations in a functional manner and each one of them was harder to follow than two nested loops.

Before looking at structs vs. classes (I'm sure someone else has something smart to say about this), maybe you can try optimizing the calculation itself?

You're calculating two acceleration deltas, let's call them dAi and dAj:

dAi = r*m1*m2/(rm*sqrt(rm)) / m1

dAj = r*m1*m2/(rm*sqrt(rm)) / m2

[note: m1 = bodies.[i].mass, m2=bodies.[j].mass]]

The division by mass cancels out like this:

dAi = r*m2 / (rm*sqrt(rm))

dAj = r*m1 / (rm*sqrt(rm))

Now you only have to calculate r/(rm*sqrt(rm)) for each pair (i,j). This can be optimized further, because 1/(rm*sqrt(rm)) = 1/(rm^1.5) = rm^-1.5, so if you let r' = r * (rm ** -1.5), then Edit: no it can't, that's premature optimization talking right there (see comment). Calculating r' = 1.0 / (r * sqrt r) is fastest.

dAi = m2 * r'

dAj = m1 * r'

Your code would then become something like

member this.Integrate(dT, (bodies:Body[])) = 
    for i = 0 to bodies.Length - 1 do
        for j = (i + 1) to bodies.Length - 1 do
            let r = (b2.Position - b1.Position)
            let rm = (float32)r.MagnitudeSquared + softeningLengthSquared            
            let r' = r * (rm ** -1.5)
            bodies.[i].Acceleration <- bodies.[i].Acceleration + r' * bodies.[j].Mass
            bodies.[j].Acceleration <- bodies.[j].Acceleration - r' * bodies.[i].Mass
        bodies.[i].Position <- bodies.[i].Position + bodies.[i].Velocity * dT
        bodies.[i].Velocity <- bodies.[i].Velocity + bodies.[i].Acceleration * dT

Look, ma, no more divisions!

Warning: untested code. Try at your own risk.

cfern
Thanks! I now have an F# integrator which runs at a comparable performance to the C# one. However (rm ** -1.5) represents a significant performance hit. ((float32)(Math.Sqrt((float)rm)) * rm) seems to be faster.
Ade Miller