There Are No Data Like More Data: Datasets for Deep Learning in Earth Observation
Michael Schmitt, Seyed Ali Ahmadi, Yonghao Xu, Gülşen Taşkın, Ujjwal Verma, Francescopaolo Sica, and Ronny Hänsch

An illustration of the proposed measures to characterize datasets.
Carefully curated and annotated datasets are the foundation of machine learning (ML), with particularly data-hungry deep neural networks forming the core of what is often called artificial intelligence (AI). Due to the massive success of deep learning (DL) applied to Earth observation (EO) problems, the focus of the community has been largely on the development of ever more sophisticated deep neural network architectures and training strategies. For that purpose, numerous task-specific datasets have been created that were largely ignored by previously published review articles on AI for EO. With this article, we want to change the perspective and put ML datasets dedicated to EO data and applications into the spotlight. Based on a review of historical developments, currently available resources are described and a perspective for future developments is formed. We hope to contribute to an understanding that the nature of our data is what distinguishes the EO community from many other communities that apply DL techniques to image data, and that a detailed understanding of EO data peculiarities is among the core competencies of our discipline.
IEEE Geoscience and Remote Sensing Magazine, vol. 11, no. 3, pp. 63-97, 2023-08-09.