This repository contains the code for the paper "Automatically Learning Hybrid Digital Twins of Dynamical Systems" (NeurIPS 2024).
Digital Twins (DTs) are computational models that simulate the states and temporal dynamics of real-world systems, playing a crucial role in prediction, understanding, and decision-making across diverse domains. However, existing approaches to DTs often struggle to generalize to unseen conditions in data-scarce settings, a crucial requirement for such models. To address these limitations, our work begins by establishing the essential desiderata for effective DTs. Hybrid Digital Twins (HDTwins) represent a promising approach to address these requirements, modeling systems using a composition of both mechanistic and neural components. This hybrid architecture simultaneously leverages (partial) domain knowledge and neural network expressiveness to enhance generalization, with its modular design facilitating improved evolvability. While existing hybrid models rely on expert-specified architectures with only parameters optimized on data, automatically specifying and optimizing HDTwins remains intractable due to the complex search space and the need for flexible integration of domain priors. To overcome this complexity, we propose an evolutionary algorithm (HDTwinGen) that employs Large Language Models (LLMs) to autonomously propose, evaluate, and optimize HDTwins. Specifically, LLMs iteratively generate novel model specifications, while offline tools are employed to optimize emitted parameters. Correspondingly, proposed models are evaluated and evolved based on targeted feedback, enabling the discovery of increasingly effective hybrid models. Our empirical results reveal that HDTwinGen produces generalizable, sample-efficient, and evolvable models, significantly advancing DTs' efficacy in real-world applications.
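The propose, fit, evaluate, and evolve loop described above can be illustrated with a toy sketch. This is not the repository's implementation. It makes several stand-in assumptions, all hypothetical: the LLM is replaced by a random mutation stub (propose) that adds or drops one component of a model specification; the offline parameter optimizer is replaced by a random-perturbation hill climb (fit); the "real system" is a 1-D logistic growth ODE; and a hybrid twin is a sum of mechanistic terms ("linear", "quadratic") plus a single tanh unit ("nn") standing in for a neural component. Candidates are fitted on one trajectory and scored on a trajectory from an unseen initial condition, and elitist selection keeps the best few.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical toy "real system": logistic growth dx/dt = 0.8 * x * (1 - x / 10).
def true_deriv(x: float) -> float:
    return 0.8 * x * (1.0 - x / 10.0)

def rollout(deriv: Callable[[float], float], x0: float, steps: int = 40, dt: float = 0.1) -> List[float]:
    """Forward-Euler simulation; stops early if the trajectory diverges."""
    xs = [x0]
    for _ in range(steps):
        x_next = xs[-1] + dt * deriv(xs[-1])
        if not math.isfinite(x_next):
            break
        xs.append(x_next)
    return xs

# Components a hybrid twin may compose: "linear" and "quadratic" stand in for
# mechanistic terms, "nn" for a neural part (a single tanh unit).
COMPONENTS: Dict[str, Callable[[float, Dict[str, float]], float]] = {
    "linear": lambda x, p: p["a"] * x,
    "quadratic": lambda x, p: p["b"] * x * x,
    "nn": lambda x, p: p["w2"] * math.tanh(p["w1"] * x + p["c"]),
}
PARAMS = {"linear": ["a"], "quadratic": ["b"], "nn": ["w1", "c", "w2"]}

@dataclass
class Twin:
    components: List[str]                     # the model "specification"
    params: Dict[str, float] = field(default_factory=dict)
    score: float = math.inf                   # validation MSE

    def deriv(self, x: float) -> float:
        return sum(COMPONENTS[c](x, self.params) for c in self.components)

def mse(twin: Twin, data: List[float]) -> float:
    pred = rollout(twin.deriv, data[0], steps=len(data) - 1)
    if len(pred) != len(data):  # diverged before the end of the horizon
        return math.inf
    return sum((p - d) * (p - d) for p, d in zip(pred, data)) / len(data)

def fit(twin: Twin, data: List[float], rng: random.Random, iters: int = 300) -> Twin:
    """Stand-in for the offline optimizer: random-perturbation hill climbing."""
    params = {n: twin.params.get(n, 0.0) for c in twin.components for n in PARAMS[c]}
    best = mse(Twin(twin.components, params), data)
    for _ in range(iters):
        trial = {n: v + rng.gauss(0.0, 0.1) for n, v in params.items()}
        loss = mse(Twin(twin.components, trial), data)
        if loss < best:  # accept only improvements
            params, best = trial, loss
    return Twin(list(twin.components), params)

def propose(parent: Twin, rng: random.Random) -> Twin:
    """Stand-in for the LLM: add or drop one component of the parent spec."""
    comps = list(parent.components)
    choice = rng.choice(sorted(COMPONENTS))
    if choice not in comps:
        comps.append(choice)
    elif len(comps) > 1:
        comps.remove(choice)
    return Twin(comps, dict(parent.params))  # inherit parameters as a warm start

def evolve(train: List[float], val: List[float], generations: int = 8,
           pop_size: int = 3, seed: int = 0) -> Tuple[Twin, List[float]]:
    rng = random.Random(seed)

    def evaluate(spec: Twin) -> Twin:
        fitted = fit(spec, train, rng)      # optimize parameters on training data
        fitted.score = mse(fitted, val)     # score generalization on held-out data
        return fitted

    population = [evaluate(Twin(["linear"]))]
    history = [population[0].score]
    for _ in range(generations):
        parent = min(population, key=lambda t: t.score)
        child = evaluate(propose(parent, rng))
        # Elitist selection: keep the pop_size best twins seen so far.
        population = sorted(population + [child], key=lambda t: t.score)[:pop_size]
        history.append(population[0].score)
    return population[0], history

TRAIN = rollout(true_deriv, 1.0)   # training trajectory
VAL = rollout(true_deriv, 2.0)     # unseen initial condition

if __name__ == "__main__":
    best, history = evolve(TRAIN, VAL)
    print(best.components, round(best.score, 4))
```

Because selection is elitist, the best validation score never gets worse from one generation to the next; in the actual method, the proposal step is an LLM writing new model code and the feedback it receives is richer than a single score.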
To get started:
- Clone this repo:
git clone https://github.com/samholt/HDTwinGen && cd ./HDTwinGen
- Follow the installation instructions in setup/install.sh to install the required packages:
./setup/install.sh
In the main terminal, perform the following steps:
- Modify the configuration files in the config folder. The main config file, which specifies the baselines, datasets, and other run parameters, is config/config.yaml.
- Run python run.py to run all baselines on all datasets. This generates a log file in the logs folder.
- Once a run has completed, process the generated log file in the logs folder with the script process_result_file.py. Note that you will need to edit process_result_file.py so that it reads this log file, i.e., set its path variable to the log file's location. This generates the main tables presented in the paper.
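Since the log file is created at run time, it can help to locate the newest one before editing process_result_file.py. The helper below is a hypothetical sketch and is not part of the repository: the name latest_log is illustrative, and it assumes only that logs are written as files somewhere under the logs folder, returning the most recently modified one.

```python
from pathlib import Path

def latest_log(logs_dir: str = "logs") -> Path:
    """Return the most recently modified file under logs_dir (searched recursively)."""
    root = Path(logs_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"log directory not found: {root}")
    # Collect every regular file below the logs directory, including subfolders.
    files = [p for p in root.rglob("*") if p.is_file()]
    if not files:
        raise FileNotFoundError(f"no log files found under {root}")
    # Newest by modification time.
    return max(files, key=lambda p: p.stat().st_mtime)
```

Calling latest_log() from the repository root after a run returns the path to use for the path variable in process_result_file.py.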
If you use our work in your research, please cite:
@inproceedings{
holt2024automatically,
title={Automatically Learning Hybrid Digital Twins of Dynamical Systems},
author={Samuel Holt and Tennison Liu and Mihaela van der Schaar},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=SOsiObSdU2}
}