Simulating Gravitational Interactions

Background

One application that’s fascinated me ever since I started my software development journey is visualizing complex gravitational interactions in real time. When I began dabbling in three.js roughly five years ago, one of the very first projects I undertook was a 2D three-body simulation. The application was crude, with initial state vectors hard-coded into the program and only two dimensions (much easier to visualize than three), but watching the different scenarios play out was endlessly fascinating and motivated me to learn more about both simulations and 3D visualization with WebGL.

The fundamentals of gravitational simulations go back to high school physics. We start with Newton’s famous three laws of motion (paraphrased):
  1. An object at rest remains at rest, and an object in motion remains in constant linear motion unless acted upon by an unbalanced force.
  2. \(\vec{F} = m\vec{a}\)
  3. Every action has an equal and opposite reaction.
As we continue on this educational trajectory, we learn that gravity can be treated as a force that acts at a distance, and that every object exerts this gravitational force on every other object. Sir Isaac Newton, the luminary thinker he was, also devised an expression to describe this gravitational force between any two bodies with his law of universal gravitation: $$\vec{F} = \frac{G m_1 m_2}{r^2} \hat{r}$$ With these four physical laws in mind, we’re now ready to start turning abstract theory into practical models of motion.

For the duration of this post, all my calculations and discussion will assume a Newtonian model. As Einstein elucidated in the early 20th century, gravity isn’t really a force in the traditional sense but rather a consequence of the curvature of spacetime. It also doesn’t travel instantaneously, as Newton assumed, but propagates at the speed limit of the universe, \(c\), and relativistic velocities can change the magnitude of its effect as well. All that said, these additional nuances will be ignored from here on out. For many practical purposes, the Newtonian model works just fine and can create rich, complex simulations while providing a captivating and enlightening visual experience.
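As a quick sanity check on the formula, here is a small sketch that plugs approximate Sun and Earth values into the law of universal gravitation. The function name and the constants are illustrative choices for this example, not part of any simulation code:

```javascript
// Plugging approximate Sun/Earth values into Newton's law of
// universal gravitation.
const G = 6.674e-11; // gravitational constant, m^3 kg^-1 s^-2

function gravForce( m1, m2, r ) {
   return ( G * m1 * m2 ) / ( r * r );
}

const mSun = 1.989e30;   // kg (approx.)
const mEarth = 5.972e24; // kg (approx.)
const au = 1.496e11;     // m, mean Earth-Sun distance (approx.)

console.log( gravForce( mSun, mEarth, au ) ); // on the order of 3.5e22 N
```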

An Instant of Gravity

As one begins simulating gravitational interactions, the first thing to note is that any motion problem implies an important numerical concept: vectors. Every velocity, force, and acceleration is accompanied by a direction with 1, 2, or 3 dimensions, depending on the sophistication of the model. With two or more objects in a scene, the instantaneous gravitational forces can be calculated from each body to every other body, with each force pointing from the center of one body to the center of the other. Since this is a motion model, forces only matter insofar as they produce an acceleration, so we can simplify the calculations by dividing the mass of the body we’re evaluating, \(m_1\), out of the equation to obtain the instantaneous acceleration due to the other body: $$\vec{a} = \frac{G m_2}{d^2} \hat{r}_{1\to2}$$ Where \(d\) is the distance between the two bodies and \(\hat{r}_{1\to2}\) is the unit vector pointing from the center of body 1 to the center of body 2. Since we’re dealing with three dimensions, the simplest coordinate system to work in is Cartesian coordinates. We can first find the magnitude of the acceleration, which is everything on the right-hand side of the equation except the directional unit vector. Substituting the distance formula for \(d\), we are left with the following equation for the magnitude of the acceleration: $$a = \frac{G m_2}{\Delta x^2 + \Delta y^2 + \Delta z^2}$$ Note that \(\Delta x\), \(\Delta y\), and \(\Delta z\) are shorthand for \((x_2 - x_1)\), and so on. Now, in order to properly convey how this acceleration translates into 3D space, it must be broken down into its x, y, and z components. This is as simple as taking the ratio of the x, y, or z displacement to the total distance, and then multiplying that ratio by the magnitude of the acceleration.
For example: $$a_x = \frac{x_2 - x_1}{d} a$$ The same principle applies to the other two coordinate axes, and notice that each acceleration component can come out positive or negative, indicating its direction. With the instantaneous acceleration between two of the bodies modeled, this procedure can be repeated for every other body with respect to body 1 and the results added as vector sums to obtain the total resultant acceleration on body 1. The procedure then generalizes to every body in the system according to the following expression: $$\vec{a}_i = \sum_{j = 0, j \neq i}^{n - 1} \frac{G m_j}{d_{ij}^2} \hat{r}_{i \to j}$$ With the total resultant acceleration calculated for each body, we can now translate this acceleration into motion. But how can we have motion in an instant? Motion is a function of distance and time, so finding the instantaneous acceleration, while interesting, isn’t going to solve this puzzle on its own.
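The summation above can be sketched as a small helper. The `{ m, s: { x, y, z } }` body shape here is an assumption made for this illustration, not the structure used later in this post:

```javascript
// Total acceleration on body i from every other body, per the
// summation above.
const G = 6.674e-11;

function accelerationOn( i, bodies ) {
   const a = { x: 0, y: 0, z: 0 };
   for ( let j = 0; j < bodies.length; j ++ ) {
      if ( j === i ) continue;
      const dX = bodies[j].s.x - bodies[i].s.x;
      const dY = bodies[j].s.y - bodies[i].s.y;
      const dZ = bodies[j].s.z - bodies[i].s.z;
      const d = Math.sqrt( dX * dX + dY * dY + dZ * dZ );
      const mag = G * bodies[j].m / ( d * d ); // acceleration magnitude
      // Project the magnitude onto each axis via the component ratios
      a.x += mag * dX / d;
      a.y += mag * dY / d;
      a.z += mag * dZ / d;
   }
   return a;
}
```

For an Earth-like body placed 1 AU from a Sun-like mass along the x-axis, this returns an x-acceleration of roughly \(-5.9 \times 10^{-3}\) m/s², pointing back toward the central mass.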

Stepping Into Time

The simplest way for me to think about this problem is to go back to Newton’s first law of motion and consider an object traveling in space, undisturbed by any outside influence. The change in position of the particle, \(\vec{s}\), over a given period of time, \(\Delta t\), is described by the following equation: $$\vec{s} = \vec{v} \Delta t$$ Now, we know that the presence of a gravitational field (i.e. another body) will produce an acceleration that causes the original velocity vector to change. If we assume a constant acceleration over the same time period as before, we are left with the following equation for the updated velocity: $$\vec{v}_{i, n + 1} = \vec{v}_{i, n} + \vec{a}_{i, n} \Delta t$$ To avoid confusion, note that the \(n\) in this equation refers to the time step, not the number of bodies as in the previous summation. It makes more sense, to me at least, for the updated velocity vector to describe the motion of the body over the given time interval than for the velocity at the beginning of the interval to do so (which would shoot the body off at a tangent to its actual trajectory), so we find the updated position of the body at the end of the time step using the following equation: $$\vec{s}_{i, n + 1} = \vec{s}_{i, n} + \vec{v}_{i, n + 1} \Delta t$$ This process is then repeated at each time step, with \(\vec{s}_{i, n + 1}\) becoming the new \(\vec{s}_{i, n}\) at the next step in the sequence. Without any research into numerical methods, this is the first algorithm I stumbled onto when I created my very first gravity simulation; it just seemed like the most straightforward way to think about the problem. I later learned this method is called semi-implicit Euler or symplectic Euler. While far from the most accurate or sophisticated method, I could have done worse for an educated guess.
As opposed to explicit Euler, which would have used \(\vec{v}_{i, n}\) to update \(\vec{s}_{i, n}\), symplectic Euler conserves energy much better: its energy error stays bounded rather than growing, so it can be used over longer timescales while retaining a higher level of accuracy.
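To see why this ordering matters, here is a minimal sketch (all names illustrative) comparing the two on a unit harmonic oscillator, where \(a = -s\) and the exact total energy \(\frac{1}{2}(s^2 + v^2)\) should stay constant:

```javascript
// Explicit Euler: both updates use the state at the START of the step.
function explicitEuler( s, v, dt, steps ) {
   for ( let n = 0; n < steps; n ++ ) {
      const sNew = s + v * dt;
      const vNew = v + ( -s ) * dt; // a = -s for this oscillator
      s = sNew;
      v = vNew;
   }
   return 0.5 * ( s * s + v * v ); // total energy
}

// Symplectic Euler: velocity first, then position with the NEW velocity.
function symplecticEuler( s, v, dt, steps ) {
   for ( let n = 0; n < steps; n ++ ) {
      v += ( -s ) * dt;
      s += v * dt;
   }
   return 0.5 * ( s * s + v * v );
}
```

Starting from \(s = 1\), \(v = 0\) (energy 0.5) and running 10,000 steps at \(\Delta t = 0.01\), the explicit version’s energy climbs well above its starting value, while the symplectic version stays within a fraction of a percent of 0.5.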

Achieving a Higher Order

While the symplectic Euler method is much more accurate than explicit Euler over an extended number of time steps or a larger step size, it is still a first-order method. Higher-order methods, such as the Runge-Kutta family, can be used to achieve even higher accuracy. Runge-Kutta methods match a Taylor series expansion to a chosen order and can theoretically be constructed for any order, though the fourth-order variant, RK4, is among the most commonly used due to its balance of numerical accuracy and computational cost. After utilizing symplectic Euler for a few early projects in this realm due to its intuitive simplicity, I ended up implementing an RK4 integrator when I began to dig deeper into numerical integration for more accurate and performant simulations. I should note, though, that while RK4 is more accurate than symplectic Euler over short time intervals, its error is not necessarily bounded and can therefore produce worse results over long durations as the energy in the system drifts. Nevertheless, it was an enlightening exploration into higher-order numerical methods. The general form of the fourth-order Runge-Kutta method is as follows: consider a differential equation of the form \(\dot{y} = f(y)\). Given a point, \(y_n\), and time step, \(h\), the next point, \(y_{n+1}\), is calculated as follows: $$k_1 = f(y_n)$$ $$k_2 = f(y_n + \frac{h}{2} k_1)$$ $$k_3 = f(y_n + \frac{h}{2} k_2)$$ $$k_4 = f(y_n + h k_3)$$ $$y_{n+1} = y_n + \frac{h}{6} (k_1 + 2k_2 + 2k_3 + k_4)$$ It can be difficult to see how this applies to the gravity simulation problem because, while \(\dot{y} = f(y)\) is a first-order problem, we are ultimately trying to find the updated position from the gravitational acceleration, which is a second-order differential equation. To deal with this, we solve for both position and velocity simultaneously as we find the various \(k\) terms used in the final formula.
For instance: \(k_{1,s} = v_n\) and \(k_{1,v} = a(s_n)\). Then, for the second term: \(k_{2,s} = v_n + \frac{h}{2} k_{1,v}\) and \(k_{2,v} = a(s_n + \frac{h}{2} k_{1,s})\). So, as we solve for velocity, we solve for position, which in turn lets us solve for acceleration, which lets us solve for velocity, and so on, until we reach the final parameters, \(k_{4,s}\) and \(k_{4,v}\), and can perform the final summation for both the position and velocity at the next time step. As before, this process is repeated at each subsequent time step. Comparing Euler’s method to Runge-Kutta, it can be difficult to see how the accuracy gain justifies the cost, since Runge-Kutta involves roughly four times the calculations per time step: where Euler only updates the velocity with the acceleration at the start of the time step and then uses this updated velocity to update the position, RK4 repeats that kind of evaluation three more times to find the \(k_2\), \(k_3\), and \(k_4\) terms and then performs the final summation. With RK4 being a fourth-order method, however, the local truncation error is on the order of \(O(h^5)\), whereas the first-order Euler method has a local truncation error of \(O(h^2)\). In other words, as the step size gets smaller, the error per time step shrinks at a much faster rate with RK4 than with Euler. It does get more complicated, though, when you consider that symplectic Euler’s global energy error is bounded, whereas RK4’s can grow continuously over time.
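The cascade above is easier to see in code. This sketch applies it to the same unit harmonic oscillator (\(a(s) = -s\)), where the exact answer is known, rather than to the full gravity problem; all names are illustrative:

```javascript
// One RK4 step for the coupled position/velocity system, with accel(s)
// standing in for the gravitational acceleration function.
function rk4Step( s, v, h, accel ) {
   const k1s = v;
   const k1v = accel( s );
   const k2s = v + 0.5 * h * k1v;
   const k2v = accel( s + 0.5 * h * k1s );
   const k3s = v + 0.5 * h * k2v;
   const k3v = accel( s + 0.5 * h * k2s );
   const k4s = v + h * k3v;
   const k4v = accel( s + h * k3s );
   return [
      s + ( h / 6 ) * ( k1s + 2 * k2s + 2 * k3s + k4s ),
      v + ( h / 6 ) * ( k1v + 2 * k2v + 2 * k3v + k4v ),
   ];
}

// One full oscillator period in 100 steps; s should land back very
// close to 1 and v very close to 0.
let s = 1;
let v = 0;
const h = ( 2 * Math.PI ) / 100;
for ( let n = 0; n < 100; n ++ ) {
   [ s, v ] = rk4Step( s, v, h, ( x ) => -x );
}
```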

CPU Implementation

With the formal mathematical procedure outlined, it’s time to move on to the implementation in software. I will look at implementations on both the CPU and GPU, but for now I will focus on the simpler of the two. The easiest way to accomplish this task is with a simple nested loop. The outer loop references the body to sum accelerations for, and the inner loop references the bodies producing those accelerations. Rather than looping over every body a second time in the inner loop, we can start the inner loop at an index one higher than the outer loop’s and, thanks to Newton’s third law, update both bodies of the pair at once. The time complexity is still \(O(n^2)\), but the number of pairwise evaluations is halved, since each pair is visited once instead of twice. The JavaScript code for this algorithm is shown below. Assume that the acceleration for each body is set to zero for all three components prior to the start of the loop.
// Symplectic Euler
for ( let i = 0; i < bodies.length; i ++ ) {

   for ( let j = i + 1; j < bodies.length; j ++ ) {

      // Distance formula
      let dX = bodies[j].sInit.x - bodies[i].sInit.x;
      let dY = bodies[j].sInit.y - bodies[i].sInit.y;
      let dZ = bodies[j].sInit.z - bodies[i].sInit.z;
      let d = Math.sqrt( dX**2 + dY**2 + dZ**2 );

      // Newton's law of universal gravitation
      let a_ij = G * bodies[j].m / d**2;
      let a_ji = G * bodies[i].m / d**2;

      // Acceleration of j on i
      bodies[i].a.x += a_ij * dX / d;
      bodies[i].a.y += a_ij * dY / d;
      bodies[i].a.z += a_ij * dZ / d;

      // Equal and opposite acceleration of i on j
      bodies[j].a.x -= a_ji * dX / d;
      bodies[j].a.y -= a_ji * dY / d;
      bodies[j].a.z -= a_ji * dZ / d;
   }

   // Update velocity
   bodies[i].vFinal.x = bodies[i].vInit.x + bodies[i].a.x * dt;
   bodies[i].vFinal.y = bodies[i].vInit.y + bodies[i].a.y * dt;
   bodies[i].vFinal.z = bodies[i].vInit.z + bodies[i].a.z * dt;

   // Update position
   bodies[i].sFinal.x = bodies[i].sInit.x + bodies[i].vFinal.x * dt;
   bodies[i].sFinal.y = bodies[i].sInit.y + bodies[i].vFinal.y * dt;
   bodies[i].sFinal.z = bodies[i].sInit.z + bodies[i].vFinal.z * dt;
}
As you can see from the code, the algorithm for symplectic Euler is very straightforward. The code for the fourth-order Runge-Kutta method is a bit more complex, however, because it must calculate the intermediate states within each time step, essentially repeating the process four times for the \(k_1\), \(k_2\), \(k_3\), and \(k_4\) values. Example code is shown in JavaScript below:
// Fourth-order Runge-Kutta
const c_k = [1, 0.5, 0.5, 1];

for ( let k = 0; k < 4; k ++ ) {

   for ( let i = 0; i < bodies.length; i ++ ) {

      for ( let j = i + 1; j < bodies.length; j ++ ) {

         // Distance formula
         let dX_k = bodies[j].sK[k].x - bodies[i].sK[k].x;
         let dY_k = bodies[j].sK[k].y - bodies[i].sK[k].y;
         let dZ_k = bodies[j].sK[k].z - bodies[i].sK[k].z;
         let d_k = Math.sqrt( dX_k**2 + dY_k**2 + dZ_k**2 );

         // Newton's Law of Universal Gravitation
         let a_ij_k = G * bodies[j].m / d_k**2;
         let a_ji_k = G * bodies[i].m / d_k**2;

         // Acceleration of j on i
         bodies[i].kV[k].x += a_ij_k * dX_k / d_k;
         bodies[i].kV[k].y += a_ij_k * dY_k / d_k;
         bodies[i].kV[k].z += a_ij_k * dZ_k / d_k;

         // Acceleration of i on j
         bodies[j].kV[k].x -= a_ji_k * dX_k / d_k;
         bodies[j].kV[k].y -= a_ji_k * dY_k / d_k;
         bodies[j].kV[k].z -= a_ji_k * dZ_k / d_k;
      }

      let v_0 = bodies[i].vInit;
      let s_0 = bodies[i].sInit;

      // Find intermediate k values
      if ( k < 3 ) {

         // Update velocity
         bodies[i].kS[k+1].x = v_0.x + bodies[i].kV[k].x * c_k[k+1]*dt;
         bodies[i].kS[k+1].y = v_0.y + bodies[i].kV[k].y * c_k[k+1]*dt;
         bodies[i].kS[k+1].z = v_0.z + bodies[i].kV[k].z * c_k[k+1]*dt;

         // Update position
         bodies[i].sK[k+1].x = s_0.x + bodies[i].kS[k].x * c_k[k+1]*dt;
         bodies[i].sK[k+1].y = s_0.y + bodies[i].kS[k].y * c_k[k+1]*dt;
         bodies[i].sK[k+1].z = s_0.z + bodies[i].kS[k].z * c_k[k+1]*dt;
      }

      // Perform summation with k values for final state
      else {

         let k1V = bodies[i].kV[0];
         let k2V = bodies[i].kV[1];
         let k3V = bodies[i].kV[2];
         let k4V = bodies[i].kV[3];

         let k1S = bodies[i].kS[0];
         let k2S = bodies[i].kS[1];
         let k3S = bodies[i].kS[2];
         let k4S = bodies[i].kS[3];
            
         bodies[i].vFinal.x = v_0.x + dt*(k1V.x + 2*k2V.x + 2*k3V.x + k4V.x)/6;
         bodies[i].vFinal.y = v_0.y + dt*(k1V.y + 2*k2V.y + 2*k3V.y + k4V.y)/6;
         bodies[i].vFinal.z = v_0.z + dt*(k1V.z + 2*k2V.z + 2*k3V.z + k4V.z)/6;
            
         bodies[i].sFinal.x = s_0.x + dt*(k1S.x + 2*k2S.x + 2*k3S.x + k4S.x)/6;
         bodies[i].sFinal.y = s_0.y + dt*(k1S.y + 2*k2S.y + 2*k3S.y + k4S.y)/6;
         bodies[i].sFinal.z = s_0.z + dt*(k1S.z + 2*k2S.z + 2*k3S.z + k4S.z)/6;
      }
   }
}
Note that the “k” used for the iteration counter in the outer loop is one less than the \(k\) used in the formal RK4 definition. It should also be mentioned that, before the start of the outer loop, the components of each k value for each body are initialized to zero, \(k_{1,s}\) (kS[0]) is set to the initial velocity, and the first intermediate position (sK[0]) is set to the initial position. As you can see, RK4 is a bit more complex, with an additional outer loop that iterates four times to capture all of the intermediate values, but its power lies in its accuracy: compared with a lower-order method like explicit Euler (which is really just first-order Runge-Kutta), the error per step is greatly reduced, which means larger time steps can be used. As stated previously, the comparison with symplectic Euler is more complicated, however, since the global energy error is bounded with symplectic methods but not with non-symplectic methods like RK4.

GPU Implementation

For a relatively small number of bodies, the simplicity of the code and the speed of the CPU make it the best choice for implementing these algorithms. With a time complexity on the order of \(O(n^2)\), however, parallelizing these computations starts to look more attractive as \(n\) gets larger. At some point, the parallel processing power of the GPU starts to outweigh the additional complexity involved in coordinating between the CPU and GPU. Due to this additional complexity, I will avoid giving specific implementation details and speak more generally about the process. Perhaps the biggest difference between the CPU and GPU approaches is the nested loop, which isn’t used in the GPU implementation. Instead, the bodies and all of their parameters, such as \(\vec{s}_n\), \(\vec{v}_n\), \(k_{2,s}\), \(k_{2,v}\), and so on, are stored in a two-dimensional texture. Calling the texture two-dimensional is a little inaccurate, though, since each texel has an additional dimension with four values, the R, G, B, and A color channels, which happen to fit nicely with storing vector values for position, velocity, and acceleration. The calculations for each body are then done by referencing this texture within a fragment shader, where each fragment corresponds to an individual body and samples the system-state texture in parallel with all the others. While the algorithm still fundamentally has a time complexity on the order of \(O(n^2)\), performing the calculations simultaneously massively reduces the time it takes to compute a frame, especially on powerful hardware. The results of these calculations are stored in a framebuffer, and a technique called ping-pong buffering is used, wherein two framebuffers are swapped so that the output from one time step becomes the input for the next in a continuous fashion. After each time step, these state textures can then be referenced to update the positions of the visual objects on the screen.
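The ping-pong idea itself is simple enough to sketch in plain JavaScript, with arrays standing in for the framebuffers and a callback standing in for the fragment-shader pass; none of this is WebGL API code, just an illustration of the swap:

```javascript
// Ping-pong buffering sketch: two buffers alternate between the "read"
// and "write" roles on every step.
function makePingPong( initialState, renderStep ) {
   let read = initialState.slice();
   let write = new Array( initialState.length );

   return {
      step() {
         renderStep( read, write );         // "shader pass": read -> write
         [ read, write ] = [ write, read ]; // swap roles for the next frame
      },
      state() {
         return read; // latest results, e.g. to drive on-screen objects
      },
   };
}

// A trivial stand-in pass: each step adds 1 to every element.
const sim = makePingPong( [ 0, 0, 0 ], ( read, write ) => {
   for ( let i = 0; i < read.length; i ++ ) write[i] = read[i] + 1;
} );
sim.step();
sim.step(); // sim.state() is now [2, 2, 2]
```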

Future Explorations

While parallelizing the calculations on the GPU can yield massive performance gains as the number of bodies increases, the basic algorithm implemented here is still fundamentally limited by its \(O(n^2)\) time complexity, which requires that every body calculate acceleration from every other body. More sophisticated methods such as the Barnes-Hut algorithm and the Fast Multipole Method use approximations for groups of distant bodies, effectively “lumping” them together to reduce the number of calculations performed for each body, bringing the complexity down to \(O(n \log n)\) or even \(O(n)\). For now, I have not ventured into this level of sophistication with any of the simulations I have done. In the future, though, I would like to explore these methods and others to push the number of bodies and the accuracy as far as possible. There’s also the matter of general relativity, which would be cool to include both for simulation purposes and to visualize gravitational lensing around massive objects.