
This repository accompanies the paper "SRCBTFusion-Net: An Efficient Fusion Architecture via Stacked Residual Convolution Blocks and Transformer for Remote Sensing Image Semantic Segmentation".


js257/SRCBTFusion-Net


Introduction

Convolutional neural networks (CNNs) and Transformer-based self-attention models have complementary strengths: the former excel at extracting local information, the latter at capturing global semantic information, so combining stacked residual convolution blocks (SRCB) with a Transformer is a natural design direction. How to integrate the two mechanisms efficiently to improve the segmentation of remote sensing (RS) images remains an open problem. We propose SRCBTFusion-Net, an efficient fusion of SRCB and Transformer, as a new semantic segmentation architecture for RS images. SRCBTFusion-Net adopts an encoder-decoder structure: the Transformer is embedded into the SRCB to form a dual-encoding structure, and the encoded features are upsampled and fused with multi-scale SRCB features to form the decoder. First, a semantic information enhancement module (SIEM) obtains global clues to enrich deep semantic information. Second, a relationship guidance module (RGM) re-encodes the decoder's upsampled feature maps to improve edge segmentation. Third, a multipath atrous self-attention module (MASM) improves the selection and weighting of low-level features, reducing the confusion that skip connections between low-level and high-level features can introduce. Finally, a multi-scale feature aggregation module (MFAM) strengthens the extraction of semantic and contextual information, alleviating the loss of image feature information and improving the ability to distinguish similar categories. SRCBTFusion-Net outperforms state-of-the-art methods on the Vaihingen and Potsdam datasets.
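The dual-encoding idea above (local CNN features plus Transformer-encoded global context, fused in the decoder through skip connections) can be sketched as follows. This is a minimal illustration of the general pattern, not the paper's architecture: the module names, channel widths, and single-stage layout here are all simplifications chosen for brevity.

```python
# Minimal sketch of the SRCB + Transformer fusion pattern: a residual conv
# block extracts local features, a Transformer layer encodes global context
# on the downsampled map, and the decoder upsamples and fuses both paths.
# All sizes and module choices are illustrative, not the paper's.
import torch
import torch.nn as nn

class ResidualConvBlock(nn.Module):
    """A residual convolution block (simplified SRCB stand-in)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
        )
    def forward(self, x):
        return torch.relu(x + self.body(x))

class FusionSketch(nn.Module):
    def __init__(self, ch=32, n_classes=6):
        super().__init__()
        self.stem = nn.Conv2d(3, ch, 3, padding=1)
        self.srcb = ResidualConvBlock(ch)
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        # Transformer encoder over the downsampled map's spatial tokens
        self.transformer = nn.TransformerEncoderLayer(
            d_model=ch, nhead=4, dim_feedforward=2 * ch, batch_first=True)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)  # skip-connection fusion
        self.head = nn.Conv2d(ch, n_classes, 1)

    def forward(self, x):
        low = self.srcb(self.stem(x))             # local CNN features
        deep = self.down(low)                     # B, C, H/2, W/2
        b, c, h, w = deep.shape
        tokens = deep.flatten(2).transpose(1, 2)  # B, HW, C
        deep = self.transformer(tokens).transpose(1, 2).reshape(b, c, h, w)
        fused = self.fuse(torch.cat([self.up(deep), low], dim=1))
        return self.head(fused)                   # per-pixel class logits

out = FusionSketch()(torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 6, 64, 64])
```

The real model stacks several such stages and adds the SIEM, RGM, MASM, and MFAM modules described above; see transformerCNN/ for the actual implementation.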

1. Data Preparation

1.1 Potsdam and Vaihingen Datasets

Our train/validation splits of the Vaihingen and Potsdam datasets are available at https://www.aliyundrive.com/s/VjRwXPLYedt (extraction code: l2x4).
Arrange the datasets in the following layout so the code can find them:

├── datasets
    ├── Postdam
    │   ├── origin
    │   ├── train
    │   │   ├── images
    │   │   ├── labels
    │   │   └── train_org.txt
    │   └── val
    │       ├── images
    │       ├── labels
    │       └── val_org.txt
    └── Vaihingen
        ├── origin
        ├── train
        │   ├── images
        │   ├── labels
        │   └── train_org.txt
        └── val
            ├── images
            ├── labels
            └── val_org.txt

2. Training

2.1 Pretrained weights

If you don't want to train from scratch, you can use the weights we trained on the two datasets: https://pan.baidu.com/s/1VRXZ4uFhGcOZMmexmre4BA (extraction code: cfks).

2.2 Start training and testing

python transformerCNN/train.py

3. Results

Performance comparison of different methods on the Potsdam and Vaihingen datasets:

| Method          | Params (M) | Speed (FPS) | Flops (G) | Potsdam MIoU (%) | Vaihingen MIoU (%) |
|-----------------|-----------:|------------:|----------:|-----------------:|-------------------:|
| TransUNet       | 76.77      | 19          | 15.51     | 76.86            | 74.30              |
| ABCNet          | 28.57      | 26          | 7.24      | 74.89            | 70.55              |
| Deeplabv3+      | 39.76      | 32          | 43.30     | 77.31            | 74.70              |
| Swin-Unet       | 41.42      | 22          | 0.02      | 59.72            | 57.19              |
| UNetformer      | 24.19      | 23          | 6.03      | 77.73            | 74.95              |
| Segformer       | 84.59      | 18          | 11.65     | 77.54            | 75.23              |
| SRCBTFusion-Net | 86.30      | 28          | 22.58     | 78.62            | 76.27              |
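MIoU in the table is the mean intersection-over-union averaged over classes. For reference, this is how the metric is computed in general from predicted and ground-truth class maps; this is a generic sketch, not the repository's evaluation code:

```python
# Generic MIoU computation: build a confusion matrix over all pixels,
# take per-class IoU = intersection / union, and average over classes
# that appear. A sketch of the metric, not this repo's evaluation code.
import numpy as np

def mean_iou(pred, label, n_classes):
    """MIoU over arrays of class indices with identical shapes."""
    pred, label = pred.ravel(), label.ravel()
    cm = np.bincount(n_classes * label + pred,
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    inter = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - inter
    ious = inter / np.maximum(union, 1)   # guard against empty classes
    return ious[union > 0].mean()

pred  = np.array([[0, 0, 1], [1, 2, 2]])
label = np.array([[0, 1, 1], [1, 2, 2]])
print(round(float(mean_iou(pred, label, 3)), 3))  # 0.722
```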

Fig. 1. Examples of semantic segmentation results of different models on the Potsdam dataset. The last column shows the predictions of our SRCBTFusion-Net; GT denotes the ground-truth label.

Fig. 2. Examples of semantic segmentation results of different models on the Vaihingen dataset. The last column shows the predictions of our SRCBTFusion-Net; GT denotes the ground-truth label.

If you use our SRCBTFusion-Net, please cite our paper:

@ARTICLE{10328787,
  author={Chen, Junsong and Yi, Jizheng and Chen, Aibin and Lin, Hui},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={SRCBTFusion-Net: An Efficient Fusion Architecture via Stacked Residual Convolution Blocks and Transformer for Remote Sensing Image Semantic Segmentation}, 
  year={2023},
  volume={61},
  number={},
  pages={1-16},
  doi={10.1109/TGRS.2023.3336689}}

Requirements

Python 3.7.0+
PyTorch 1.8.2
CUDA 12.2
tqdm 4.63.0
numpy 1.21.6
ml-collections
scipy
