Local Window Attention Transformer for Polarimetric SAR Image Classification
Ali Jamali, Swalpa Kumar Roy, Avik Bhattacharya, and Pedram Ghamisi
Convolutional neural networks (CNNs) have recently attracted great attention in image classification, since deep CNNs have shown excellent performance in computer vision. Owing to this success, researchers are now exploring transformers for Earth observation applications. However, the main drawback of transformers is that they require significantly more training data than CNN classifiers. Applying transformers in remote sensing is therefore challenging, particularly for polarimetric SAR (PolSAR) data, for which few labeled samples exist. In this letter, we propose a vision transformer-based framework that uses 3D and 2D CNNs as feature extractors, together with local window attention, for effective classification of PolSAR data. Extensive experimental results demonstrate that the proposed model, PolSARFormer, achieves higher classification accuracy than the state-of-the-art Swin Transformer and FNet algorithms. On the San Francisco benchmark, PolSARFormer outperformed Swin Transformer and FNet in average accuracy by margins of 5.86% and 17.63%, respectively. Moreover, on the Flevoland dataset, PolSARFormer achieved a kappa index of 99.30%, exceeding several other algorithms, including ResNet (97.49%), Swin Transformer (96.54%), FNet (95.28%), 2D CNN (94.57%), and AlexNet (91.83%). The code will be made available on GitHub.
IEEE Geoscience and Remote Sensing Letters, 2023-01-23.
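The abstract names local window attention but does not describe how it is computed. The following is a minimal NumPy sketch of the general idea, in the style of Swin Transformer window attention without window shifting, relative position bias, or multiple heads: a feature map (assumed here to come from the CNN feature extractors) is split into non-overlapping w x w windows, and self-attention is computed only among the tokens of each window. The function names, the window size, and the random projection matrices `wq`, `wk`, `wv` are illustrative assumptions, not the PolSARFormer implementation.

```python
import numpy as np

# Hypothetical illustration of local window attention; not the authors' PolSARFormer code.


def window_partition(x: np.ndarray, w: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping w x w windows -> (N, w*w, C)."""
    h, wd, c = x.shape
    assert h % w == 0 and wd % w == 0, "H and W must be multiples of the window size"
    x = x.reshape(h // w, w, wd // w, w, c)   # (nH, w, nW, w, C)
    x = x.transpose(0, 2, 1, 3, 4)            # (nH, nW, w, w, C)
    return x.reshape(-1, w * w, c)


def window_reverse(windows: np.ndarray, w: int, h: int, wd: int) -> np.ndarray:
    """Inverse of window_partition: (N, w*w, C) -> (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // w, wd // w, w, w, c)  # (nH, nW, w, w, C)
    x = x.transpose(0, 2, 1, 3, 4)                 # (nH, w, nW, w, C)
    return x.reshape(h, wd, c)


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def local_window_attention(x, w, wq, wk, wv):
    """Single-head scaled dot-product attention computed independently inside each window."""
    h, wd, _ = x.shape
    win = window_partition(x, w)                    # (N, w*w, C)
    q, k, v = win @ wq, win @ wk, win @ wv          # (N, w*w, D) each
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (N, w*w, w*w): tokens see only their own window
    out = softmax(scores) @ v                       # (N, w*w, D)
    return window_reverse(out, w, h, wd)            # (H, W, D)


# Usage: an assumed 8 x 8 feature map with 16 channels, split into 4 x 4 windows.
rng = np.random.default_rng(0)
C, D = 16, 16
feat = rng.standard_normal((8, 8, C))
wq, wk, wv = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))
out = local_window_attention(feat, 4, wq, wk, wv)
print(out.shape)  # (8, 8, 16)
```

Restricting attention to windows reduces the cost of the attention scores from (HW)^2 pairs for global attention to HW * w^2 pairs, which is one reason window attention is attractive for dense image inputs.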