Leyuan Fang, Peng Zhou, Xinxin Liu, Pedram Ghamisi, and Siwei Chen
As the foundation of image interpretation, semantic segmentation is an active topic in the field of remote sensing. Given the complex combination of multiscale objects in remote sensing images (RSIs), exploring and modeling contextual information has become the key to accurately identifying objects at different scales. Although several methods have been proposed in the past decade, they model global or local context insufficiently, which easily results in the fragmentation of large-scale objects, the neglect of small-scale objects, and blurred boundaries. To address these issues, we propose a contextual representation enhancement network (CRENet) to strengthen the global context (GC) and local context (LC) modeling in high-level features. The core components of the CRENet are the local feature alignment enhancement module (LFAEM) and the superpixel affinity loss (SAL). The LFAEM aligns and enhances the LC in low-level features by constructing contextual contrast through multilayer cascaded deformable convolution; the enhanced LC is then combined with high-level features to refine the segmentation map. The SAL helps the network accurately capture the GC by supervising the semantic information and relationships learned from superpixels. The proposed method is plug-and-play and can be embedded in any FCN-based network. Experiments on two popular RSI datasets demonstrate the effectiveness of the proposed network, which achieves competitive performance both qualitatively and quantitatively.
IEEE Transactions on Neural Networks and Learning Systems, 2022-09-09.
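The abstract does not give the formula for the superpixel affinity loss, so the following is only an illustrative sketch of the general idea of supervising pairwise relationships with superpixels, not the authors' definition. It assumes (hypothetically) that the target affinity between two pixels is 1 when they share a superpixel and 0 otherwise, that the predicted affinity is the cosine similarity of their feature vectors rescaled to [0, 1], and that the two are compared with binary cross-entropy. The function names `ideal_affinity`, `cosine_affinity`, and `affinity_loss` are invented for this example.

```python
import math


def ideal_affinity(superpixel_ids):
    """Target affinity (assumed): 1.0 if two pixels share a superpixel, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in superpixel_ids]
            for a in superpixel_ids]


def cosine_affinity(features):
    """Predicted affinity (assumed): cosine similarity rescaled from [-1, 1] to [0, 1]."""
    # Fall back to a norm of 1.0 for all-zero vectors to avoid division by zero.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    return [[(sum(x * y for x, y in zip(fi, fj)) / (ni * nj) + 1.0) / 2.0
             for fj, nj in zip(features, norms)]
            for fi, ni in zip(features, norms)]


def affinity_loss(features, superpixel_ids, eps=1e-7):
    """Mean binary cross-entropy between predicted and target affinities."""
    target = ideal_affinity(superpixel_ids)
    pred = cosine_affinity(features)
    total, count = 0.0, 0
    for t_row, p_row in zip(target, pred):
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp so both logs stay finite
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            count += 1
    return total / count


# Four flattened pixels with 2-D features; two superpixels.
feats = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
consistent = affinity_loss(feats, [0, 0, 1, 1])    # features agree with superpixels
inconsistent = affinity_loss(feats, [0, 1, 0, 1])  # features disagree with superpixels
```

Under these assumptions, features that are similar within a superpixel and dissimilar across superpixels yield a lower loss than features that contradict the superpixel grouping, which is the kind of relational supervision the abstract attributes to the SAL.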