Aarhus Universitets segl

Research

Data-driven model discovery via symbolic regression

Computational modeling is a key resource to gather insight into physical systems in modern scientific research and engineering. While access to large amount of data has fueled the use of Machine Learning (ML) to recover physical models from experiments and increase the accuracy of physical simulations, purely data-driven models have limited generalization and interpretability. To overcome these limitations, we propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models starting from experimental data. Since these models consist of mathematical expressions, they are interpretable and amenable to analysis, and the use of a natural, general-purpose discrete mathematical language for physics favors generalization with limited input data. Importantly, DEC provides building blocks for the discrete analogue of field theories, which are beyond the state-of-the-art applications of SR to physical problems. Further, we show that DEC allows to implement a strongly-typed SR procedure that guarantees the mathematical consistency of the recovered models and reduces the search space of symbolic expressions. Finally, we prove the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data: Poisson equation, the Euler’s Elastica and the equations of Linear Elasticity. Thanks to their general-purpose nature, the methods developed in this paper may be applied to diverse contexts of physical modeling.

Key publications:

Software:

  • AlpineGP: a Python library for symbolic regression of discrete physical models
  • dctkit: a Python toolkit for discrete calculus