Xun Liu, Danfeng Hong, Jocelyn Chanussot, Baojun Zhao, and Pedram Ghamisi
Modality translation has recently attracted growing interest in the field of remote sensing. Owing to the increased availability of remote sensing data, modality translation has been widely used in image processing tasks, including land cover classification and change detection. Modality translation learns a mapping that transforms an image from a source modality to a target modality. For example, translating synthetic aperture radar (SAR) images to optical images facilitates human interpretation and analysis of the original data. Unlike multimedia applications, modality translation in remote sensing often suffers from inherent ambiguities: due to insufficient information, a single input image may correspond to multiple possible outputs, which is problematic for reliable image interpretation.
To resolve these ambiguities, we propose a multi-modality image translation framework that exploits temporal correlations in an image time series to obtain a more precise output. In this framework, we adopt a feature mask module to capture semantic information in a guidance image for translation. Given multiple temporal images, we formulate a uniqueness constraint in network learning, which increases the reliability of the final result. We build a multi-modal dataset containing visible, SAR, and short-wave infrared (SWIR) image time series of the same scene. The dataset serves to promote research on modality translation in remote sensing and is publicly available. We use the dataset to perform experiments on two modality translation tasks (SAR to visible and visible to SWIR). The results demonstrate the effectiveness and superiority of the proposed model.
IEEE Transactions on Geoscience and Remote Sensing, 2021-05-21.