Every time I see anything on the N-body problem I am reminded of my final high school project... I had 2-3 weeks to write an n-body simulation. Back then I used C++ and my hardware was really bad (2 GHz single core CPU or so…). The result was not really impressive because it did not really work. :D But I was able to show that my code correctly predicted that the moon and the earth would eventually collide with no initial velocity given to either body. I went into this totally naive and blind, but it was a ton of fun.
> my hardware was really bad (2 GHz single core CPU or so…)
laughs in 32kb of RAM
Sounds like a great project, though. There's a lot of fundamental concepts involved in something like this, so it's a great learning exercise (and fun as well!) :)
Once you have the matrix implementation in Step 2 (Implementation 3) it's rather straightforward to extend your N-body simulator to run on a GPU with JAX --- you can just add `import jax.numpy as jnp` and replace all the `np.`s with `jnp.`s.
For a few-body system (e.g., the Solar System) this probably won't provide any speedup. But once you get to ~100 bodies you should start to see substantial speedups by running the simulator on a GPU.
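For anyone who wants to try the swap, here's a minimal sketch (my own illustration, not the article's code) of what the vectorized, matrix-style acceleration looks like; the whole port is essentially changing the numpy import to `jax.numpy`:

```python
# Sketch only: vectorized pairwise gravity in the "matrix" style.
# The same function body works with `import numpy as np`; using jax.numpy
# plus jax.jit is what lets it run on a GPU.
import jax
import jax.numpy as jnp

@jax.jit
def acceleration(x, m, G=1.0):
    """x: (N, 3) positions, m: (N,) masses -> (N, 3) accelerations."""
    # Pairwise displacements r_ij = x_j - x_i, shape (N, N, 3).
    r = x[jnp.newaxis, :, :] - x[:, jnp.newaxis, :]
    dist = jnp.linalg.norm(r, axis=-1)
    # Keep the diagonal harmless so each body's self-term contributes zero.
    dist = jnp.where(dist == 0.0, 1.0, dist)
    # a_i = G * sum_j m_j * r_ij / |r_ij|^3 (the 1/r^3 form discussed further down the thread).
    return G * jnp.einsum("ij,ijk->ik", m[jnp.newaxis, :] / dist**3, r)
```

The numpy and JAX versions give the same answer on a CPU; as noted above, the GPU only starts to pay off once N is large enough to keep it busy.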
> rather straightforward
For the programmer, yes it is easy enough.
But there is a lot of complexity hidden behind that change. If you care about how your tools work, that might be a problem (I'm not judging).
Oh for sure there is a ton that is going on behind the scenes when you substitute `np` --> `jnp`. But as someone who worked on gravitational dynamics a decade ago, it's really incredible that so much work has been done to make running numerical calculations on a GPU so straightforward for the programmer.
Obviously if you want maximum performance you'll probably still have to roll up your sleeves and write CUDA yourself, but there are a lot of situations where you can still get most of the benefit of using a GPU and never have to worry yourself over how to write CUDA.
People think high-level libraries are magic fairy dust - "abstractions! abstractions! abstractions!". But there's no such thing as an abstraction in real life, only heuristics.
Not true for regular GPUs like the RTX 5090. They have atrocious float64 performance compared to a CPU. You need a special GPU designed for scientific computation (with many float64 cores).
This is confusing: GPUs seem great for scientific computations. You often want f64 for scientific computation. GPUs aren't good with f64.
I'm trying to evaluate whether I can get away with f32 for GPU use in my molecular docking software. It might be OK, but I've hit cases, broadly, where f64 is fine but f32 is not.
I suppose this is because the dominant uses, games and AI/ML, use f32, or for AI even lower precision‽
You need cards like the A100 / H100 / H200 / B200. GPUs aren’t fundamentally worse at f64. Nvidia just makes it worse in many cards for market segmentation.
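One practical footnote for the JAX route: JAX defaults to float32 even on CPU, so double precision has to be switched on explicitly (this is the standard `jax_enable_x64` flag, nothing specific to the article), and on consumer cards you then run straight into the throttled f64 throughput discussed above:

```python
import jax
# Must run before any arrays are created; otherwise JAX silently stays in float32.
jax.config.update("jax_enable_x64", True)

import jax.numpy as jnp

x = jnp.arange(3.0)
print(x.dtype)  # float64 with the flag enabled, float32 without it
```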
Next logical optimization: Barnes Hut? Groups source bodies using a recursive tree of cubes. Gives huge speedups with high body counts. FMM is a step after, which also groups target bodies. Much more complicated to implement.
That's mentioned on the "Conclusions" page of TFA:
> Large-scale simulation: So far, we have only focused on systems with a few objects. What about large-scale systems with thousands or millions of objects? Turns out it is not so easy because the computation of gravity scales as O(N^2). Have a look at Barnes-Hut algorithm to see how to speed up the simulation. In fact, we have documentations about it on this website as well. You may try to implement it in some low-level language like C or C++.
C or C++? Ha! I've implemented FMM in Fortran 77. (Not my choice; it was a summer internship and the "boss" wanted it that way.) It was a little painful.
Oh wow! I implemented BH in Rust not long ago. Was straightforward. Set it up with graphics so I could see the cubes etc. Then looked into FMM... I couldn't figure out where to start! Looked very formidable. Multipole seems to be coming up in everything scientific I look at these days...
Supercomputers will simulate trillions of masses. The HACC code, commonly used to verify the performance of these machines, uses a uniform grid (interpolation and a 3D FFT) and local corrections to compute the motion of ~8 trillion bodies.
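For anyone wondering what the Barnes-Hut grouping mentioned above looks like in code, here is a rough sketch of just the tree walk with the usual opening-angle test. Tree construction (and excluding a body from its own leaf) is omitted, and the `Node` layout is my own toy illustration, not taken from the article or from HACC:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mass: float       # total mass of all bodies inside this cubic cell
    com: tuple        # centre of mass of the cell, (x, y, z)
    size: float       # side length s of the cell
    children: list = field(default_factory=list)  # empty for a leaf

def accel(node, p, G=1.0, theta=0.5, eps=1e-9):
    """Acceleration at point p from everything under `node` (Barnes-Hut)."""
    dx = [c - q for c, q in zip(node.com, p)]
    d = sum(v * v for v in dx) ** 0.5
    if node.children and node.size / (d + eps) >= theta:
        # The cell looks too big from here: open it and recurse into the children.
        parts = [accel(child, p, G, theta, eps) for child in node.children]
        return [sum(axis) for axis in zip(*parts)]
    # A leaf, or a cell far enough away: treat it as one point mass at its centre of mass.
    return [G * node.mass * v / (d + eps) ** 3 for v in dx]
```

The payoff is that each body interacts with roughly O(log N) cells instead of N - 1 other bodies, which is where the speedup at high body counts comes from.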
For comparison, the famous Debian language benchmark competition:
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
And for those who have access
2022 "N-Body Performance With a kD-Tree: Comparing Rust to Other Languages"
https://ieeexplore.ieee.org/document/10216574
2025 "Parallel N-Body Performance Comparison: Julia, Rust, and More"
https://link.springer.com/chapter/10.1007/978-3-031-85638-9_...
In Step 2 (Gravity), why are we summing over the cube of the distance between the bodies in the denominator?
Edit: To answer myself, I think this is because one of the factors is to normalize the vector between the two bodies to length 1, and the other two factors are the standard inverse square relationship.
You got it. The familiar inverse square formula uses a unit vector:
Oops, pretty big mistake on my part. I gave the formula for force and labeled it acceleration. Way too late to edit, but the correctly labeled formulas should have been:
F = G * m1 * m2 / |r|^2 * r_unit
F = G * m1 * m2 / |r|^3 * r
Or as acceleration, by dividing out an `m` using `F=ma`:
a = G * m2 / |r|^2 * r_unit
a = G * m2 / |r|^3 * r
The force itself is `G * m1 * m2 / (r^2)`. That's a pure magnitude. The direction of the force is just the unit vector going from m1 to m2. You need it to be a unit vector or else you're multiplying up to something higher than that force. However, I don't get why you'd ever cube the 'r'. Never seen that. I don't think it's right, tbh.
> I don't get why you'd ever cube the 'r'.
It's pulled out of the unit vector. Might be more clear if I notated the vector bits a bit:
r_unit = r_vec / r_mag
As you know, a vector is a magnitude and direction. So the formulas from before become (also correctly labeled as `F` per my other comment):
F = G * m1 * m2 / r_mag^2 * (r_vec / r_mag) = G * m1 * m2 / r_mag^3 * r_vec
Ok, I see what you're doing. You're multiplying the force vector by a non-unit vector, and then dividing back out the linear amount to correct for it. You never see this in a physics book because it's a computational hack, probably because it saves you the CPU cost of not having to do the 3 division operations it takes to get each component (X, Y, Z) of the unit vector.
This makes sense to do in computer code also because if you were going to raise r_mag to a power, you might as well raise it to 3 instead of 2, since it's no extra cost, and you avoid the three divisions by never calculating a unit vector. Back when I was doing this work, which was decades ago, I had no idea about the cost of floating point operations. Thanks for explaining!
Glad I could help!
Also fun is that taking the magnitude involves a square root that can sometimes be avoided, but that doesn't really help us here because of the power of three. If the denominator were squared we could just use `r_mag^2 = r_x^2 + r_y^2`, but we still need the root to get the direction. It is kinda interesting though that in 2d it expands to a power of `3/2`: `(r_x^2 + r_y^2)^(3/2)`.
Yeah, on paper (or in mathematical symbols) it comes down to what's clearer and what represents reality. That's why I initially said I know there are no cubic relations in the physics of this, which was correct.
But that doesn't mean there are no correct physics equations (for gravity) involving the cube of a distance, even when there are only squares in these "laws" of physics.
In both cases the power of 3, as well as the 3/2, is there merely to "cancel out" the fact that you didn't use a unit vector (in the numerator) and therefore need to divide that out in the denominator, to end up scaling the force magnitude by a unit vector.
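Putting the two notations from this sub-thread side by side as code (a toy example of my own: acceleration on a body due to a mass `m2` at displacement `r_vec`):

```python
import numpy as np

def accel_unit_vector(r_vec, m2, G=1.0):
    r_mag = np.linalg.norm(r_vec)
    r_unit = r_vec / r_mag              # explicitly form the unit vector
    return G * m2 / r_mag**2 * r_unit   # textbook inverse-square form

def accel_r_cubed(r_vec, m2, G=1.0):
    r_mag = np.linalg.norm(r_vec)
    return G * m2 / r_mag**3 * r_vec    # fold the normalization into the power

r = np.array([1.0, 2.0, 2.0])
assert np.allclose(accel_unit_vector(r, 5.0), accel_r_cubed(r, 5.0))
```

Same physics, same numbers; the cube is just the inverse-square law with the extra |r| from the normalization folded in.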
Also:
https://github.com/MarkCLewis/ParallelNBodyPerformance
My favorite thing about this kind of code is that people are constantly inventing new techniques to do time integration. It's the sort of thing you'd think was a solved problem, but when you read about time integration strategies you realize how rich the space of solutions is. And they are more like engineering problems than pure math, with tradeoffs and better fits depending on the kind of system.
Decades ago ... I think it was the Computer Recreations column in "Scientific American" ... the strategy, since computers were less capable then, was to advance all the bodies by some amount of time — call it a "tick" — and when bodies got close, the "tick" got smaller, so the calculations became more nuanced and precise. Further apart, and you could run the solar system on generalities.
The adaptive time step RKF algorithm is explained in section 5:
https://alvinng4.github.io/grav_sim/5_steps_to_n_body_simula...
One way of doing that is, for each time step delta, to perform two integrations with different accuracies, one giving you x_1 and the other x_2. The error is estimated by the difference |x_1 - x_2|, and you make the error match your tolerance by adjusting the time step.
Naturally, this difference becomes big when two objects are close together, since the acceleration will induce a large change of velocity, and a lower time step is needed to keep the error under control.
I have run into the problem where a constant time step can suddenly result in bodies getting flung out of the simulation because they pass very close to each other.
Your solution sounds interesting, but isn't it only practical when you have a small number of bodies?
The suggested method of trying two time steps and comparing the accelerations does about twice as many calculations per simulation step, which don't require twice the time thanks to coherent memory access: a reasonable price to pay to ensure that every time step is small enough.
Optimistic fixed time steps are going to seem to work well almost always and almost everywhere, while accumulating errors behind your back at every episode of close approach.
Yes, the author uses a globally-adaptive time stepper, which is only efficient for very small N. There are adaptive time step methods that are local, and those are used for large systems.
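A minimal sketch of the step-doubling version of that idea (one coarse step versus two half steps), written around a generic `step(state, dt)` integrator; the names and the power-law controller here are my own illustration, not the article's RKF embedding:

```python
def adaptive_step(step, state, dt, tol, safety=0.9, order=4):
    """Advance `state` (a flat tuple of floats) by one accepted step."""
    while True:
        coarse = step(state, dt)                   # one full step -> x_1
        fine = step(step(state, dt / 2), dt / 2)   # two half steps -> x_2
        err = max(abs(a - b) for a, b in zip(coarse, fine))
        if err <= tol:
            # Accept the finer result and suggest a (bounded) larger step next time.
            grow = safety * (tol / max(err, 1e-300)) ** (1.0 / (order + 1))
            return fine, dt * min(2.0, grow)
        # Reject: the error is too big (e.g. a close approach). Retry with a smaller dt.
        dt *= max(0.1, safety * (tol / err) ** (1.0 / (order + 1)))
```

An embedded pair like RKF45 gets the second, higher-accuracy solution almost for free from the same stage evaluations, which is why it is usually preferred over plain step doubling.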
If you see bodies flung out after close passes, three solutions are available: reduce the time step, use a higher order time integrator, and (the most common method) add regularization. Regularization (often called "softening") removes the singularity by adding a constant to the squared distance. So 1 over zero becomes one over a small-ish and finite number.
>Regularization (often called "softening") removes the singularity by adding a constant to the squared distance. So 1 over zero becomes one over a small-ish and finite number.
IIRC that is what I did in the end. It is a fudge, but it works.
It is a fudge if you really are trying to simulate true point masses. Mathematically, it's solving for the force between fuzzy blobs of mass.
You are never simulating pure anything. All computational models are wrong. Some are useful.
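In code the softening described above is one extra term in the denominator (Plummer-style; the `eps` value here is an arbitrary illustrative choice you would tune per problem):

```python
import numpy as np

def softened_accel(r_vec, m2, G=1.0, eps=1e-2):
    # eps^2 keeps the denominator finite even as |r| -> 0.
    r2 = np.dot(r_vec, r_vec) + eps**2
    return G * m2 * r_vec / r2**1.5
```

Which is exactly the "fuzzy blobs of mass" point: mathematically this is the force from a Plummer-smoothed mass distribution rather than a point mass.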
Cut the time interval shorter and shorter, and then BAM you've independently discovered calculus.
You need to invent calculus to get partial differential equations. You need to invent PDEs to get approximate solutions to PDEs. You need to invent approximate solutions to PDEs to get computational methods. You need to invent computational methods to explore time integration techniques.
So in this kind of simulation, as we look closer, uncertainty is reduced.
But in our simulation (reality, or whatever), uncertainty increases the closer we look.
Does that suggest we don’t live in a simulation? Or is “looking closer increases uncertainty” something that would emerge from a nested simulation?
The reason uncertainty goes up the closer we try to observe something in physics is because everything is ultimately waves of specific wavelengths. So if you try to "zoom in" on one of the 'humps' of a sine wave, for example, you don't see anything but a line that gets straighter and straighter, which is basically a loss of information. This is what the Heisenberg principle is about. Not the "Say my Name" one, the other one. hahaha.
And yes that's a dramatic over-simplification of uncertainty principle, but it conveys the concept perfectly.
They're more like engineering than pure math because pure math hasn't solved the n-body problem.
There are countless applications other than the n-body problem for which my point holds.
Man, I tried doing this once and my brain nearly melted, lol. Props for actually making it work.
Back in the early 1990s I was going through books in the college library and found one on numerical results of different 3-body starting configurations.
I vividly remember the Pythagorean three-body problem example, and how it required special treatment for the close interactions.
Which made me very pleased to see that configuration used as the example here.
Very well done!
I was fascinated with doing particle simulations in "C" language as a teenager in the late 1980s on my VGA monitor! I would do both gravity and charged particles.
Once particles accelerate they'll just 'jump by' each other of course rather than collide, if you have no repulsive forces. I realized you had to take smaller and smaller time-slices to get things to slow down and keep running when the singularities "got near". I had a theory even back then that this is what nature is doing with General Relativity making time slow down near masses. Slowing down to avoid singularities. It's a legitimate theory. Maybe this difficulty is why "God" (if there's a creator) decided to make everything actually "waves" as much as particles, because waves don't have the problem of singularities.
N-body problems are the gateway drug into numerical physics. Writing one from scratch is a rite of passage. Bonus points if you hit that sweet spot where performance doesn’t completely tank.
I built https://teskooano.space/ - so far I've not seen any big performance issues, and it's multiple bodies running happily at 120fps.
Now you're making me worry I'm not "physicsing" hard enough