# Popular Architectures for Semantic Segmentation

## Content

- SegNet
- DeepLab (V1, V2, V3)
- U-Net (U-Net, Attention U-Net, TransUNet)

## SegNet (2016)

Upsampling in SegNet is performed using the pooling indices saved from the corresponding MaxPooling layers in the encoder.
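A minimal PyTorch sketch of this index-based upsampling (the tensor sizes are assumptions, not SegNet's actual configuration): `MaxPool2d` returns the argmax indices, and `MaxUnpool2d` later places each value back at the remembered position.

```python
import torch
import torch.nn as nn

# Encoder pooling layer that also returns the argmax indices of each window.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)
pooled, indices = pool(x)            # (1, 64, 16, 16); indices remember max locations
upsampled = unpool(pooled, indices)  # (1, 64, 32, 32); values go back to max positions, zeros elsewhere
```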

## DeepLab

### DeepLab V1 (2016)

DeepLab V1 is the first in the series to introduce Atrous Convolution (shown on the left of the figure below), which later came to be viewed as the most distinctive property of the DeepLab family.
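As a minimal sketch (channel counts are assumed values), a dilated 3x3 convolution in PyTorch enlarges the receptive field without adding parameters or reducing resolution:

```python
import torch
import torch.nn as nn

# Standard 3x3 convolution vs. atrous (dilated) 3x3 convolution.
# With dilation=2 the kernel samples every other pixel, so the effective
# receptive field grows to 5x5 with the same number of parameters.
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)                # receptive field 3x3
atrous = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2)  # receptive field 5x5

x = torch.randn(1, 64, 32, 32)
print(conv(x).shape, atrous(x).shape)  # both torch.Size([1, 64, 32, 32])
```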

For the fully-connected CRF, the following equations reproduce the formulation from the paper. As can be seen, the CRF model is defined through its energy function, and the connection between two different pixel locations is modeled with Gaussian kernels.
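For reference, the energy function of the fully-connected CRF as used in the DeepLab paper (following Krähenbühl and Koltun's formulation) is:

$$
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{ij} \theta_{ij}(x_i, x_j), \qquad \theta_i(x_i) = -\log P(x_i)
$$

$$
\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[\, w_1 \exp\!\left(-\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\alpha^2} - \frac{\lVert I_i - I_j \rVert^2}{2\sigma_\beta^2}\right) + w_2 \exp\!\left(-\frac{\lVert p_i - p_j \rVert^2}{2\sigma_\gamma^2}\right)\right]
$$

Here $\mu(x_i, x_j) = 1$ if $x_i \neq x_j$ and $0$ otherwise, $p$ denotes pixel positions and $I$ pixel colors. The first (bilateral) kernel connects pixels that are close in both position and color, while the second kernel enforces spatial smoothness only.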

### DeepLab V2 (2017)

Compared to DeepLab V1, DeepLab V2 introduces another practical module called Atrous Spatial Pyramid Pooling (ASPP). The block is added to the end of the backbone, and with it DeepLab V2 outperforms DeepLab V1. All other components, including the fully-connected CRF, are kept the same.
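A minimal sketch of the V2-style ASPP idea (channel sizes and the sum-fusion here are assumptions for illustration, not the paper's exact head): parallel atrous convolutions with different rates are applied to the same feature map and fused.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Parallel atrous convolutions with different rates over the same input."""
    def __init__(self, in_channels, out_channels, rates=(6, 12, 18, 24)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # Each branch sees the input at a different effective receptive field.
        return sum(branch(x) for branch in self.branches)

aspp = ASPP(in_channels=2048, out_channels=256)
feat = torch.randn(1, 2048, 28, 28)
print(aspp(feat).shape)  # torch.Size([1, 256, 28, 28])
```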

### DeepLab V3 (2017)

Starting from DeepLab V3, the fully-connected CRF is removed. Instead, atrous convolution is explored more fully: different atrous rates are applied throughout the model, so the backbone itself can keep the feature maps at a relatively high resolution.
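As a hedged illustration of this high-resolution backbone, torchvision's ResNet exposes a `replace_stride_with_dilation` option that swaps the stride-2 convolutions of later stages for atrous convolutions, giving output stride 8 instead of 32:

```python
import torch
from torchvision.models import resnet50

# Replace the strides of the last two ResNet stages with dilation so the
# spatial resolution of the final feature map is preserved (output stride 8).
backbone = resnet50(weights=None, replace_stride_with_dilation=[False, True, True])

x = torch.randn(1, 3, 224, 224)
x = backbone.maxpool(backbone.relu(backbone.bn1(backbone.conv1(x))))
x = backbone.layer4(backbone.layer3(backbone.layer2(backbone.layer1(x))))
print(x.shape)  # torch.Size([1, 2048, 28, 28]) -> stride 8 instead of 32
```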

Using Neural Architecture Search (NAS), the model structure is designed by deep learning itself and thus outperforms the previous, hand-designed DeepLab versions.

## U-Net

### U-Net (2015)

Modern variations of U-Net usually adapt the upsampling layers to whatever operator is preferred, e.g. bilinear interpolation, nearest-neighbor interpolation, or deconvolution (transposed convolution).
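A short sketch of the three options in PyTorch (64 channels is an assumed example size):

```python
import torch
import torch.nn as nn

# Three common choices for the decoder's 2x upsampling step.
bilinear = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
nearest = nn.Upsample(scale_factor=2, mode='nearest')
deconv = nn.ConvTranspose2d(64, 64, kernel_size=2, stride=2)  # learned upsampling

x = torch.randn(1, 64, 16, 16)
for up in (bilinear, nearest, deconv):
    print(up(x).shape)  # all torch.Size([1, 64, 32, 32])
```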

In Attention U-Net, the Attention Blocks (attention gates), shown below, are applied at the skip-connection stages. g, the gating signal, comes from the coarser decoder stage, while x is the encoder feature carried by the skip connection; the attention coefficients computed from g and x are used to reweight x before it is merged with the upsampled decoder feature.
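A minimal attention-gate sketch (channel sizes are assumptions, and g and x are assumed to have already been resized to a common resolution, whereas the paper handles the resizing inside the gate):

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Attention gate: g is the gating signal (decoder), x the skip feature."""
    def __init__(self, g_channels, x_channels, inter_channels):
        super().__init__()
        self.w_g = nn.Conv2d(g_channels, inter_channels, kernel_size=1)
        self.w_x = nn.Conv2d(x_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, g, x):
        a = self.relu(self.w_g(g) + self.w_x(x))
        alpha = self.sigmoid(self.psi(a))  # attention coefficients in [0, 1]
        return x * alpha                   # reweighted skip feature

gate = AttentionGate(g_channels=256, x_channels=128, inter_channels=64)
g = torch.randn(1, 256, 32, 32)
x = torch.randn(1, 128, 32, 32)
print(gate(g, x).shape)  # torch.Size([1, 128, 32, 32])
```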

### TransUNet (2021)

Inspired by ViT, TransUNet adds a Vision Transformer block on the lowest-resolution (bottleneck) feature maps of U-Net, and thus achieves higher performance.
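A rough sketch of that idea (shapes and layer counts are assumptions, and positional embeddings are omitted): flatten the bottleneck feature map into tokens, run a Transformer encoder over them, and reshape back for the decoder.

```python
import torch
import torch.nn as nn

d_model, n_heads, n_layers = 512, 8, 4
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           dim_feedforward=2048, batch_first=True)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

feat = torch.randn(1, d_model, 14, 14)    # bottleneck feature map (B, C, H, W)
tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
tokens = transformer(tokens)              # global self-attention over all positions
feat_out = tokens.transpose(1, 2).reshape(1, d_model, 14, 14)  # back to a 2D map
```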

Swin-Transformer is also developed under the inspiration of ViT. Specifically, it computes self-attention within local, shifted windows, which reduces the computational cost compared with global self-attention over the whole image.
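A small sketch of the window-partition step (window size and feature shape are assumed values); self-attention is then computed independently inside each window, so the cost grows roughly linearly with image size instead of quadratically.

```python
import torch

def window_partition(x, window_size):
    """Split a feature map (B, H, W, C) into non-overlapping windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)
    return windows  # (num_windows * B, window_size**2, C)

x = torch.randn(1, 56, 56, 96)                    # assumed Swin-like stage-1 feature map
windows = window_partition(x, window_size=7)
print(windows.shape)                              # torch.Size([64, 49, 96])
```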