Research

Data-driven model discovery via symbolic regression

Computational modeling is a key resource for gaining insight into physical systems in modern scientific research and engineering. While access to large amounts of data has fuelled the use of Machine Learning (ML) to recover physical models from experiments and to increase the accuracy of physical simulations, purely data-driven models have limited generalization and interpretability. To overcome these limitations, we propose a framework that combines Symbolic Regression (SR) and Discrete Exterior Calculus (DEC) for the automated discovery of physical models from experimental data. Since these models consist of mathematical expressions, they are interpretable and amenable to analysis, and the use of a natural, general-purpose discrete mathematical language for physics favors generalization with limited input data. Importantly, DEC provides building blocks for the discrete analogue of field theories, which lie beyond current state-of-the-art applications of SR to physical problems. Further, we show that DEC makes it possible to implement a strongly-typed SR procedure that guarantees the mathematical consistency of the recovered models and reduces the search space of symbolic expressions. Finally, we demonstrate the effectiveness of our methodology by re-discovering three models of Continuum Physics from synthetic experimental data: the Poisson equation, Euler's Elastica, and the equations of Linear Elasticity. Thanks to their general-purpose nature, the methods developed in this paper may be applied to diverse contexts of physical modeling.
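To make the idea of strong typing concrete, the following is a minimal sketch of how type information on DEC objects can rule out inconsistent candidate expressions before they are ever evaluated. It is an illustration only and does not use the API of Flex or dctkit: the names `CochainType`, `infer_type`, `is_well_typed`, the tuple-based expression encoding, and the assumption of a 2D mesh are all hypothetical. The sketch tracks only types (primal or dual complex, and cochain degree), not numerical values or sign conventions.

```python
from dataclasses import dataclass

MESH_DIM = 2  # assumption: expressions live on a 2D simplicial complex


@dataclass(frozen=True)
class CochainType:
    complex_kind: str  # "primal" or "dual"
    degree: int        # 0 = nodes, 1 = edges, 2 = faces (in 2D)


def infer_type(expr, env):
    """Return the CochainType of expr, or raise TypeError if it is ill-typed.

    expr is either a variable name (str) or a tuple (operator, *operands).
    """
    if isinstance(expr, str):
        return env[expr]
    op, *operands = expr
    types = [infer_type(e, env) for e in operands]
    if op == "d":
        # Coboundary: k-cochain -> (k+1)-cochain on the same complex.
        (t,) = types
        if t.degree >= MESH_DIM:
            # Applying d to a top-degree cochain is excluded in this sketch.
            raise TypeError("d applied to a top-degree cochain")
        return CochainType(t.complex_kind, t.degree + 1)
    if op == "star":
        # Hodge star: primal k-cochain <-> dual (n-k)-cochain.
        (t,) = types
        other = "dual" if t.complex_kind == "primal" else "primal"
        return CochainType(other, MESH_DIM - t.degree)
    if op == "add":
        # Addition is only defined between cochains of identical type.
        left, right = types
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown operator {op!r}")


def is_well_typed(expr, env):
    """True if the expression passes the type check, False otherwise."""
    try:
        infer_type(expr, env)
        return True
    except TypeError:
        return False


# u and f are scalar fields sampled at the primal nodes.
env = {"u": CochainType("primal", 0), "f": CochainType("primal", 0)}

# Discrete Laplacian-like composition: star d star d u (primal 0-cochain).
laplacian_u = ("star", ("d", ("star", ("d", "u"))))
poisson_residual = ("add", laplacian_u, "f")  # consistent: both primal 0-cochains

# Adding an edge quantity to a node quantity is rejected.
ill_typed = ("add", ("d", "u"), "f")

print(infer_type(poisson_residual, env))
print(is_well_typed(ill_typed, env))
```

In a typed SR search, a check of this kind would be applied when expressions are generated or mutated, so that ill-typed candidates such as `ill_typed` never enter the population; this is one way the search space shrinks.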

Key publications:

Software:

  • Flex: a Python library for high-performance symbolic regression
  • dctkit: a Python toolkit for discrete calculus

Data-driven symbolic closures for convection-dominated and turbulent flows

Data-driven closures correct standard reduced-order models (ROMs) to increase their accuracy in under-resolved, convection-dominated flows. Two types of data-driven ROM closures are in current use: (i) structural, with simple ansatzes (e.g., linear or quadratic); and (ii) machine learning-based, with neural network ansatzes. We propose a novel symbolic regression (SR) data-driven ROM closure strategy, which combines the advantages of both approaches and eliminates their drawbacks. As a result, the new data-driven SR closures yield ROMs that are parsimonious, accurate, generalizable, and robust.
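As a point of reference for the structural closures of type (i), the following minimal sketch fits the simplest possible ansatz, a mode-wise linear closure tau_i ≈ c_i * a_i, by least squares. It is not the SR closure proposed here and is not taken from any of the group's software; the function name `fit_diagonal_linear_closure`, the diagonal form of the ansatz, and the synthetic data are assumptions made for illustration.

```python
import math


def fit_diagonal_linear_closure(a, tau):
    """Fit tau_i ≈ c_i * a_i for each ROM mode i by least squares.

    a, tau: lists of samples; each sample is a list of r modal values.
    Returns the list of r coefficients c_i.
    """
    r = len(a[0])
    coeffs = []
    for i in range(r):
        # Closed-form 1D least squares: c_i = <a_i, tau_i> / <a_i, a_i>.
        num = sum(s[i] * t[i] for s, t in zip(a, tau))
        den = sum(s[i] ** 2 for s in a)
        coeffs.append(num / den)
    return coeffs


# Synthetic data (hypothetical): two ROM modes with a known damping-like closure.
c_true = [-0.5, -0.2]
a = [[math.cos(0.1 * k), math.sin(0.3 * k)] for k in range(1, 101)]
tau = [[c_true[0] * s[0], c_true[1] * s[1]] for s in a]

c_fit = fit_diagonal_linear_closure(a, tau)
print(c_fit)
```

An SR closure replaces the fixed ansatz with a search over symbolic expressions in the ROM coefficients, so the functional form itself is learned rather than prescribed.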

Key publications:

Software:

Traffic flow modeling and forecasting

Traffic flows are complex systems that can be studied from a macroscopic perspective. Among macroscopic models, first-order models are tractable but oversimplified, while higher-order models capture richer dynamics at the cost of complexity. Here, we introduce SR-Traffic, a data-driven, physics-informed framework that uses symbolic regression to learn effective phenomenological relations directly from experimental data while embedding them into an efficient, first-order PDE formulation. Our approach balances accuracy and interpretability, ensures physical consistency, and shows good generalization, overcoming the limitations of purely data-driven models. Overall, our findings could support the design of digital solutions for improving mobility systems.
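For readers unfamiliar with first-order macroscopic models, the following minimal sketch solves the classical conservation law rho_t + (rho * v(rho))_x = 0 with a Godunov finite-volume scheme. The velocity-density relation used here is the textbook Greenshields law v(rho) = v_max * (1 - rho / rho_max), chosen only as a stand-in for the phenomenological relation that a framework like SR-Traffic would learn from data. The function names, the periodic road, and all parameter values are assumptions for illustration and are not part of SR-Traffic.

```python
def greenshields_flux(rho, v_max=1.0, rho_max=1.0):
    """Flux rho * v(rho) with the Greenshields velocity law (a stand-in closure)."""
    return rho * v_max * (1.0 - rho / rho_max)


def godunov_step(rho, dx, dt, v_max=1.0, rho_max=1.0):
    """Advance cell-averaged densities by one time step on a periodic road."""
    rho_c = rho_max / 2.0  # critical density, where the concave flux is maximal
    n = len(rho)

    def flux(r):
        return greenshields_flux(r, v_max, rho_max)

    # Interface i sits between cell i and cell (i + 1) % n.
    # For a concave flux, the Godunov flux is min(demand upstream, supply downstream).
    interface_flux = [
        min(flux(min(rho[i], rho_c)), flux(max(rho[(i + 1) % n], rho_c)))
        for i in range(n)
    ]
    # Conservative update; interface_flux[i - 1] wraps around for i = 0.
    return [
        rho[i] - dt / dx * (interface_flux[i] - interface_flux[i - 1])
        for i in range(n)
    ]


# Hypothetical setup: light traffic with a congested stretch in the middle.
n_cells = 50
dx = 1.0 / n_cells
dt = 0.5 * dx  # satisfies the CFL condition dt <= dx / v_max with v_max = 1
rho0 = [0.8 if 20 <= i < 30 else 0.2 for i in range(n_cells)]

rho = rho0
for _ in range(40):
    rho = godunov_step(rho, dx, dt)

print(sum(rho0) * dx, sum(rho) * dx)  # total number of vehicles before and after
```

Because the scheme is written in conservative form, the total number of vehicles on the periodic road is preserved up to rounding, which is one of the physical-consistency properties a first-order PDE formulation provides regardless of which velocity-density relation is plugged in.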

Key publications:

Software: