4 Machine-learned interatomic potentials
4.1 Motivation
Classical potentials are fast and scale well, but require choosing a functional form and fitting parameters — the accuracy is bounded by these choices. DFT is transferable and does not assume a functional form, but the computational cost limits it to hundreds of atoms and short timescales.
Machine-learned interatomic potentials (MLIPs) sit between these. Like classical potentials, they are parameterised models fitted to data. Unlike classical potentials, however, they do not assume a functional form for the interactions: the model learns the shape of \(E(r)\) from training data. Because the training data typically comes from DFT, the model can approach DFT accuracy for systems similar to those it was trained on.
4.2 How MLIPs work
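The principle is supervised learning. A dataset of atomic configurations is generated, with energies and forces calculated using DFT. A model, typically a neural network, is then trained to predict \(E(r)\) from atomic positions. In most architectures the forces follow as the negative gradient of the predicted energy, \(F_i = -\partial E / \partial r_i\), so the model can be fitted to the DFT energies and forces together. Once trained, evaluating the model is much cheaper than running DFT, enabling large systems and long simulations.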
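As a minimal sketch of fitting to energies and forces together, the code below assumes a toy setting: a Lennard-Jones dimer in reduced units stands in for the DFT reference data, and a model that is linear in a few hypothetical basis functions \(r^{-n}\) stands in for the neural network. Because the force is the negative derivative of the model energy, it is also linear in the coefficients, so a single weighted least-squares solve fits both. Unlike a real MLIP, this basis happens to contain the true functional form, so the fit is essentially exact; only the structure of the loss carries over.

```python
import numpy as np

# Stand-in for DFT reference data: a Lennard-Jones dimer in reduced units
# (epsilon = sigma = 1). This is an assumption for illustration only.
def reference_energy(r):
    return 4.0 * (r**-12 - r**-6)

def reference_force(r):
    # Force along the separation coordinate: F = -dE/dr
    return 4.0 * (12.0 * r**-13 - 6.0 * r**-7)

# Toy model: E(r) = sum_k c_k * r**(-n_k). A linear model stands in for the
# neural network so the whole fit is one least-squares solve.
POWERS = np.array([12, 10, 8, 6])

def energy_features(r):
    # Rows are samples, columns are the basis functions r**(-n)
    return r[:, None] ** -POWERS[None, :]

def force_features(r):
    # -d/dr of r**(-n) is n * r**(-n - 1), so forces are linear in c_k too
    return POWERS[None, :] * r[:, None] ** (-POWERS[None, :] - 1)

def fit(r, e_ref, f_ref, w_energy=1.0, w_force=1.0):
    # Minimise w_E * sum (E - E_ref)^2 + w_F * sum (F - F_ref)^2
    # by stacking energy rows and force rows into one system
    a = np.vstack([np.sqrt(w_energy) * energy_features(r),
                   np.sqrt(w_force) * force_features(r)])
    b = np.concatenate([np.sqrt(w_energy) * e_ref, np.sqrt(w_force) * f_ref])
    coeffs, *_ = np.linalg.lstsq(a, b, rcond=None)
    return coeffs

r_train = np.linspace(0.95, 2.5, 40)
coeffs = fit(r_train, reference_energy(r_train), reference_force(r_train))

r_test = np.linspace(1.0, 2.4, 8)
energy_error = np.abs(energy_features(r_test) @ coeffs - reference_energy(r_test))
print(coeffs)              # approximately [4, 0, 0, -4]
print(energy_error.max())  # tiny, because the basis contains the true form
```

The weights `w_energy` and `w_force` are hypothetical knobs; they control how strongly the fit favours matching energies versus forces.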
Most MLIPs decompose the total energy into contributions from each atom:
\[E = \sum_i \varepsilon_i\]
where \(\varepsilon_i\) depends on the local environment around atom \(i\) — the species and positions of nearby atoms within some cutoff. The model learns a mapping from local environment to atomic energy contribution.
This locality assumption is physically motivated: in most materials, an atom’s energy contribution depends primarily on its immediate surroundings. It also makes the model size-transferable — a model trained on small cells can be applied to larger systems.
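Size transferability follows directly from locality. A minimal sketch, assuming a hypothetical per-atom function of neighbour distances in place of a trained model, and a cubic periodic cell more than twice the cutoff wide (so the minimum-image convention finds every neighbour exactly once): replicating the cell \(2 \times 2 \times 2\) gives every atom an identical environment, so the energy per atom is unchanged and the total energy scales by eight.

```python
import numpy as np

CUTOFF = 3.0  # hypothetical cutoff radius (Angstrom)

def atomic_energy(positions, box, i):
    # Hypothetical local term: depends only on neighbours of atom i within CUTOFF
    delta = positions - positions[i]
    delta -= box * np.round(delta / box)  # minimum-image convention; needs box > 2 * CUTOFF
    d = np.linalg.norm(delta, axis=1)
    d = d[(np.arange(len(positions)) != i) & (d < CUTOFF)]
    return float(np.sum(-np.exp(-d) * (1.0 - d / CUTOFF) ** 2))

def total_energy(positions, box):
    # E = sum_i epsilon_i, with the same per-atom function for every atom
    return sum(atomic_energy(positions, box, i) for i in range(len(positions)))

def supercell(positions, box, n):
    # Replicate a cubic cell n times along each axis
    shifts = box * np.array(
        [[a, b, c] for a in range(n) for b in range(n) for c in range(n)], dtype=float)
    big = (shifts[:, None, :] + positions[None, :, :]).reshape(-1, 3)
    return big, box * n

rng = np.random.default_rng(0)
box = 6.5
positions = rng.uniform(0.0, box, size=(8, 3))
big_positions, big_box = supercell(positions, box, 2)

e_small = total_energy(positions, box)
e_big = total_energy(big_positions, big_box)
print(e_small / len(positions), e_big / len(big_positions))  # same energy per atom
```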
There is considerable research into how local environments should be represented (descriptors, symmetry functions) and what model architectures work best (neural networks, Gaussian processes, equivariant networks). These details are beyond our scope. The key point is that MLIPs are flexible models with many parameters, fitted to reproduce DFT energies and forces.
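Without going into those design choices, the overall structure of \(E = \sum_i \varepsilon_i\) can still be sketched. The code assumes a single element, no periodic boundaries, a radial descriptor in the style of Behler-Parrinello symmetry functions, and a tiny network with random, untrained weights; the cutoff, Gaussian widths, network size, and function names are all hypothetical.

```python
import numpy as np

CUTOFF = 5.0                      # hypothetical cutoff radius (Angstrom)
ETAS = np.array([0.1, 0.5, 2.0])  # hypothetical Gaussian widths (1/Angstrom^2)

def cutoff_fn(d):
    # Smooth cosine cutoff: 1 at d = 0, decaying to 0 at d = CUTOFF
    return np.where(d < CUTOFF, 0.5 * (np.cos(np.pi * d / CUTOFF) + 1.0), 0.0)

def descriptor(positions, i):
    # Radial descriptor of atom i: G_eta = sum_j exp(-eta * d_ij^2) * f_c(d_ij)
    d = np.linalg.norm(positions - positions[i], axis=1)
    d = d[(np.arange(len(positions)) != i) & (d < CUTOFF)]  # neighbours within the cutoff
    return np.array([np.sum(np.exp(-eta * d**2) * cutoff_fn(d)) for eta in ETAS])

# Hypothetical atomic-energy network with random, untrained weights
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(len(ETAS), 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=8), -1.0

def atomic_energy(g):
    # epsilon_i: one hidden layer mapping the descriptor to an energy contribution
    return float(np.tanh(g @ W1 + b1) @ W2 + b2)

def total_energy(positions):
    # E = sum_i epsilon_i(local environment of atom i)
    return sum(atomic_energy(descriptor(positions, i)) for i in range(len(positions)))

positions = rng.uniform(0.0, 4.0, size=(6, 3))
energy = total_energy(positions)
print(energy)
```

Because this descriptor depends only on interatomic distances, translating, rotating, or relabelling the atoms leaves the predicted energy unchanged, which is one reason such representations are used.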
4.3 The landscape
The MLIP field is evolving rapidly. Broadly, there are three approaches.
Purpose-trained potentials are models developed specifically for a particular system — a battery cathode material, a solid electrolyte, a class of alloys. The training data is generated for that system, covering the configurations relevant to the intended application. This typically gives the best accuracy, because the model is focused on what matters. But it requires substantial effort: generating training data means running many DFT calculations, which is expensive. Training the model and validating it carefully adds further work. This is analogous to developing a classical potential, but without assuming a functional form.
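One simple way to start a purpose-built training set, assumed here for illustration, is to take a reference structure for the system of interest and generate many perturbed copies, each of which would then be labelled with a DFT energy and forces (not shown). The sketch applies a small random homogeneous strain to the cell and atoms, then random atomic displacements ("rattling"). The helper name, strain range, and displacement size are hypothetical, and real workflows often also draw configurations from molecular dynamics at the conditions of the intended application.

```python
import numpy as np

def perturbed_configs(positions, cell, n_configs, max_strain=0.02, rattle_std=0.05, seed=0):
    """Hypothetical helper: (positions, cell) pairs for randomly strained and
    rattled copies of a reference structure. Rows of `cell` are lattice vectors."""
    rng = np.random.default_rng(seed)
    configs = []
    for _ in range(n_configs):
        # Random symmetric strain tensor with entries of order max_strain
        e = rng.uniform(-max_strain, max_strain, size=(3, 3))
        deform = np.eye(3) + 0.5 * (e + e.T)
        # Apply the same homogeneous deformation to the cell and the atoms...
        new_cell = cell @ deform
        # ...then add independent random displacements ("rattling") to each atom
        new_positions = positions @ deform + rng.normal(0.0, rattle_std, size=positions.shape)
        configs.append((new_positions, new_cell))
    return configs

# Hypothetical reference structure: two atoms in a cubic cell of side 4 Angstrom
cell = 4.0 * np.eye(3)
positions = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
configs = perturbed_configs(positions, cell, n_configs=10)
print(len(configs))
```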
Foundation models take a different approach. These are pre-trained on massive datasets spanning much of the periodic table and many structure types, then released for general use. Training such a model requires enormous computational resources — work done by large research groups or companies (Google, Meta, Microsoft) with access to that scale of compute. The result is a model that provides reasonable accuracy out of the box for many systems, without needing to generate training data or train anything. Examples include MACE-MP, CHGNet, SevenNet, and others. Because foundation models have seen diverse training data, they tend to be more transferable across different chemistries. The tradeoff is that accuracy for any specific system may be lower than a purpose-trained model would achieve.
Fine-tuning offers a middle path. Starting from a foundation model, additional training data for a specific system is added and training continues. The foundation model provides a starting point — it already captures something about interatomic interactions in general — and fine-tuning adapts it to a particular chemistry. This can give good accuracy with less effort than training from scratch.
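The mechanics of fine-tuning can be shown with a toy linear model standing in for a foundation model: fit weights on a large dataset drawn from one hypothetical "general" relationship, then continue gradient descent from those weights on a small dataset from a slightly different target system. All names and numbers are hypothetical. Because this toy problem is convex, training from scratch would reach the same answer given enough steps; the benefit of a pre-trained starting point matters for large neural networks trained on limited data, which this sketch does not capture.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FEATURES = 5  # hypothetical descriptor length

def make_data(true_weights, n):
    # Hypothetical descriptors x and reference energies y = x @ true_weights + noise
    x = rng.normal(size=(n, N_FEATURES))
    return x, x @ true_weights + 0.01 * rng.normal(size=n)

def fine_tune(w_init, x, y, lr=0.05, steps=200):
    # Continue gradient-descent training on the mean squared error, starting from w_init
    w = w_init.copy()
    for _ in range(steps):
        grad = 2.0 * x.T @ (x @ w - y) / len(y)
        w -= lr * grad
    return w

def rmse(w, x, y):
    return float(np.sqrt(np.mean((x @ w - y) ** 2)))

# "General" behaviour seen in pre-training, and a target chemistry that differs somewhat
w_general = rng.normal(size=N_FEATURES)
w_target = w_general + 0.3 * rng.normal(size=N_FEATURES)

# Pre-training on a large, diverse dataset (the stand-in "foundation model")
x_big, y_big = make_data(w_general, 5000)
w_pretrained, *_ = np.linalg.lstsq(x_big, y_big, rcond=None)

# Fine-tuning on a small system-specific dataset, starting from the pre-trained weights
x_small, y_small = make_data(w_target, 20)
w_finetuned = fine_tune(w_pretrained, x_small, y_small)

x_test, y_test = make_data(w_target, 1000)
print(rmse(w_pretrained, x_test, y_test), rmse(w_finetuned, x_test, y_test))
```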
Which approach was used affects how much the results should be trusted. A purpose-trained potential with careful validation is more reliable than a foundation model applied to a system far from its training data.
4.4 Strengths and limitations
MLIPs learn the shape of \(E(r)\) from data, rather than assuming a functional form. This avoids the limitations of classical potentials, where accuracy is bounded by the mathematical form chosen. Once trained, evaluation is fast — approaching classical potential speeds for some architectures. The approach is also systematically improvable: more training data generally means a better model. If the potential fails in some region of configuration space, more data can be added there and the model retrained. Foundation models have also made the approach more accessible — reasonable results are possible without training anything.
The limitations are significant. The model can only be as good as its training data. Errors in the DFT calculations propagate to the potential, and the model inherits the limitations of whichever functional was used — if PBE gets something wrong, so will the MLIP trained on PBE data.
More fundamentally, extrapolation is dangerous. MLIPs interpolate well within their training domain but can fail badly outside it. DFT solves for the electronic structure of each configuration directly, so its reliability does not depend on what it has seen before; an MLIP, by contrast, may give confident but wrong predictions for configurations far from its training data. This failure can be silent: the model does not know it is extrapolating.
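The same failure can be reproduced in one dimension. A minimal sketch, assuming a Lennard-Jones dimer in reduced units as a stand-in for DFT and a degree-6 polynomial as a stand-in for a flexible MLIP: fitted only to separations between 1.0 and 1.6, the model interpolates well inside that range but returns badly wrong energies outside it, and nothing in its output signals the problem.

```python
import numpy as np
from numpy.polynomial import Polynomial

def reference_energy(r):
    # Stand-in for DFT: Lennard-Jones dimer in reduced units (an assumption)
    return 4.0 * (r**-12 - r**-6)

# Training data only covers separations between 1.0 and 1.6
r_train = np.linspace(1.0, 1.6, 50)
model = Polynomial.fit(r_train, reference_energy(r_train), deg=6)

r_inside = np.linspace(1.05, 1.55, 11)   # interpolation: within the training range
r_outside = np.array([0.85, 2.2, 3.0])   # extrapolation: outside the training range

err_in = np.max(np.abs(model(r_inside) - reference_energy(r_inside)))
err_out = np.max(np.abs(model(r_outside) - reference_energy(r_outside)))

# The model returns a number for every r, with no warning that r_outside
# lies outside anything it was trained on
print(f"max error inside:  {err_in:.2e}")
print(f"max error outside: {err_out:.2e}")
```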
MLIPs are also less interpretable than classical potentials. A Buckingham potential (usually combined with a Coulomb term for ionic materials) has parameters with physical meaning: ionic charges, the strength and range of the short-range repulsion, and a dispersion coefficient. An MLIP has thousands to millions of internal parameters, none with a direct physical interpretation. When something goes wrong, diagnosing why is harder. This places a validation burden on the user: the potential must be checked for accuracy on the configurations that matter.
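To make the contrast concrete, the sketch below writes a Buckingham pair term with a Coulomb term added (an assumption; the combination is common for ionic materials), where each argument has a physical reading, and counts the weights and biases of a hypothetical small per-atom network. The layer sizes are illustrative and not taken from any particular MLIP.

```python
import numpy as np

COULOMB_CONST = 14.399645  # e^2 / (4 pi epsilon_0) in eV * Angstrom

def buckingham_coulomb(r, A, rho, C, q_i, q_j):
    """Pair energy in eV at separation r (Angstrom). Every argument has a physical reading:
    A        - strength of the short-range repulsion (eV)
    rho      - range of that repulsion (Angstrom)
    C        - dispersion (van der Waals) coefficient (eV Angstrom^6)
    q_i, q_j - ionic charges (units of e)"""
    return A * np.exp(-r / rho) - C / r**6 + COULOMB_CONST * q_i * q_j / r

def mlp_parameter_count(layer_sizes):
    # Weights plus biases of each fully connected layer; none has a physical meaning
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Hypothetical per-atom network: 50 descriptor inputs, two hidden layers of 64, one energy output
print(mlp_parameter_count([50, 64, 64, 1]))  # 7489
```

Five numbers describe the pair interaction, while even this small network has 7,489 weights and biases; many published MLIPs have far more.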
4.5 When to use
MLIPs are increasingly the default choice when larger systems or longer timescales than DFT allows are needed, when classical potentials do not exist for the system or are not accurate enough, and when careful validation is feasible. They are the wrong choice when DFT is affordable for the problem (simpler, no training overhead), when the configurations of interest are far from any available training data, or when guaranteed reliability with no possibility of silent failure is required.
The field is moving fast. What is state-of-the-art now may be superseded within a year. When evaluating computational work that uses MLIPs, the key questions are: what potential was used, what was it trained on, and how was it validated for the system being studied?