Daniel Klotz, Martin Gauch, Grey Nearing, Sepp Hochreiter, and Frederik Kratzert

The goal of this contribution is to demonstrate deficiencies that we observe in hydrological modelling studies. Our hope is that awareness of potential mistakes, errors, and bad habits will support accurate communication and analysis, and consequently lead to better modelling practices in our community.

By deficiencies, we broadly mean wrong assumptions, false conclusions, and artificial limitations that impair our modelling efforts. To give some explicit examples:

  • Model calibration: Often, only two data splits are used: one for model calibration and one for model validation. To obtain a robust estimate of model quality on unseen data, however, one should use a three-way split: a calibration set for parameter adaptation, a validation set for hyperparameter tuning and intermediate model evaluations, and a test set that is used only once to derive the final, independent estimate of model accuracy.
  • Artificial restrictions: Studies often restrict modelling setups to specific settings (e.g., model classes, input data, or objective functions) for the sake of comparability. In general, one should use the best available data, inputs, and objective functions for each model, irrespective of the diagnostic metric used for evaluation and irrespective of what other models use (or are able to use).
  • (Missing) Model rejection: Although benchmarking is not an entirely new concept in our community, we observe that the results of model comparisons seem to have no consequences. Models that repeatedly underperform on a specific task continue to be used for the very task on which they were shown to perform poorly. At some point, such models should be rejected, and we as a community should move on to improving the remaining models or developing new ones.
  • Interpretation of intermediate states: Many hydrological models attempt to represent a variety of internal physical states that are not calibrated (e.g., soil moisture). Unfortunately, these states are often mistaken for true measurements and used as ground truth in downstream studies. We believe that, unless the quality of these states has been successfully evaluated, using intermediate model outputs is highly risky, as it may distort subsequent analyses.
  • Noise: Although it is commonly accepted that hydrological input variables are subject to large uncertainties and imprecision, the influence of input perturbations is often not explicitly accounted for in models.
  • Model complexity: We aim to model nature, one of the most complex systems there is. In practice, we will only ever obtain a simplified representation of this system. However, we should not reduce complexity for the wrong reasons. While there is a tradeoff between simplicity and complexity, we should not gravitate towards the simplest models, such as two- or three-bucket models.
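The three-way split described in the calibration point above can be sketched in a few lines. This is a minimal illustration under explicit assumptions, not the authors' procedure:

- The split is chronological, which is a common choice for hydrological time series.
- The 60/20/20 fractions are arbitrary.
- The Nash-Sutcliffe efficiency (NSE) stands in for whatever metric a study actually uses.
- The "model" is a hypothetical lagged linear scaling of precipitation. Its scale factor is the calibrated parameter and its lag is the hyperparameter.
- All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def three_way_split(n: int, cal_frac: float = 0.6,
                    val_frac: float = 0.2) -> Tuple[range, range, range]:
    """Chronologically split n time steps into calibration, validation and test index ranges."""
    if cal_frac <= 0 or val_frac <= 0 or cal_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and sum to less than 1")
    n_cal = int(n * cal_frac)
    n_val = int(n * val_frac)
    return range(0, n_cal), range(n_cal, n_cal + n_val), range(n_cal + n_val, n)


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 equals predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def lagged(x: Sequence[float], lag: int) -> List[float]:
    """Shift x forward by `lag` steps, padding the start with zeros (hypothetical toy transform)."""
    return [0.0] * lag + list(x[: len(x) - lag])


def fit_scale(x: Sequence[float], q: Sequence[float], idx: range) -> float:
    """Least-squares scale factor a for q ~ a * x, using only the indices in idx."""
    return sum(x[i] * q[i] for i in idx) / sum(x[i] * x[i] for i in idx)


def calibrate_and_select(precip: Sequence[float], flow: Sequence[float],
                         lags: Sequence[int] = (0, 1, 2, 3)) -> Tuple[int, float, float]:
    """Calibrate on the first split, tune the lag on the second, report the test NSE once."""
    cal, val, test = three_way_split(len(flow))
    best = None
    for lag in lags:  # hyperparameter: chosen on the validation set only
        x = lagged(precip, lag)
        a = fit_scale(x, flow, cal)  # parameter: adapted on the calibration set only
        score = nse([flow[i] for i in val], [a * x[i] for i in val])
        if best is None or score > best[0]:
            best = (score, lag, a)
    _, lag, a = best
    x = lagged(precip, lag)
    # The test set is used exactly once, after all choices have been fixed.
    test_nse = nse([flow[i] for i in test], [a * x[i] for i in test])
    return lag, a, test_nse
```

The essential property is that the test range is evaluated exactly once, after the lag has been fixed on the validation range. Comparing several configurations on the test score would quietly turn it into a second validation set, and the reported accuracy would no longer be independent.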

Our belief is that modelling should be a community-wide effort involving benchmarking, probing, model building, and examination. Being aware of deficiencies will hopefully foster a culture that adheres to best practices, rigorous testing, and probing for errors, ultimately benefiting us all by leading to better-performing and more reliable models.
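One simple way to account explicitly for the input noise raised in the list above is a Monte Carlo perturbation experiment. The idea is to rerun a model on many randomly perturbed copies of its forcing and inspect the spread of the outputs. The sketch below rests on several assumptions, none of which are a recommended error model:

- The model is a hypothetical toy single linear reservoir with outflow coefficient `k`.
- The noise is multiplicative Gaussian with standard deviation `sigma`, clipped at zero.
- The summary statistic is the total simulated flow.
- The values of `k`, `sigma`, and the member count are illustrative.
- The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def linear_reservoir(precip: Sequence[float], k: float) -> List[float]:
    """Hypothetical toy model: each step, a fixed fraction k of storage leaves as outflow."""
    storage = 0.0
    flow = []
    for p in precip:
        storage += p          # add this step's precipitation to storage
        q = k * storage       # outflow is proportional to current storage
        storage -= q
        flow.append(q)
    return flow


def perturbation_spread(precip: Sequence[float], k: float = 0.3, sigma: float = 0.1,
                        n_members: int = 200, seed: int = 0) -> Tuple[float, float]:
    """Approximate 5th and 95th percentiles of total simulated flow under input noise."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_members):
        # Multiplicative Gaussian perturbation; clip at zero since precipitation is non-negative.
        noisy = [max(0.0, p * (1.0 + rng.gauss(0.0, sigma))) for p in precip]
        totals.append(sum(linear_reservoir(noisy, k)))
    totals.sort()
    lo = totals[int(0.05 * (n_members - 1))]
    hi = totals[int(0.95 * (n_members - 1))]
    return lo, hi
```

With `sigma=0.0` the band collapses to a single value. A wide band for realistic noise levels indicates that conclusions drawn from a single deterministic run may be less certain than they appear.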

EGU22-12403, 2022-05-23.
