Image Classification & Segmentation based on Enhanced CNN and Transformer Networks
Prasad Kulkarni
Bo Luo
Cuncong Zhong
Xinmai Yang
Guanghui Wang
Convolutional Neural Networks (CNNs) have significantly improved performance across computer vision tasks such as image recognition and segmentation, owing to their strong representational capacity. To further boost CNN performance, a self-attention module can be integrated after each network layer. More recently, Transformer-based models, whose core component is the multi-head self-attention module, have demonstrated outstanding performance. However, several challenges persist: CNNs tend to focus on a limited set of class-specific channels, local transformers have a constrained receptive field, and U-Net-type segmentation architectures both incorporate redundant features and lack multi-scale features.
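As a rough illustration of what self-attention over channels means, the sketch below shows a generic pattern, not the channel-based module proposed in this work. It computes a channel-to-channel affinity matrix from flattened feature maps, normalizes each row with a softmax, and mixes the channels accordingly, with a residual connection. The function names and the scalar weight `gamma` are hypothetical choices made for this example. Plain Python lists stand in for tensors.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def channel_self_attention(x, gamma=1.0):
    """Generic channel self-attention (illustrative sketch only).

    x: list of C channels, each a list of N flattened spatial values.
    gamma: hypothetical scalar weight on the attention branch.
    Returns (output with the same shape as x, C x C attention matrix).
    """
    c = len(x)
    # Affinity between channels i and j: dot product of their flattened maps.
    energy = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(c)]
              for i in range(c)]
    # Each row becomes a distribution over all channels.
    attn = [softmax(row) for row in energy]
    out = []
    for i in range(c):
        # Output channel i is an attention-weighted mixture of all channels...
        mixed = [sum(attn[i][j] * x[j][n] for j in range(c))
                 for n in range(len(x[i]))]
        # ...added back to the original channel (residual connection).
        out.append([gamma * m + v for m, v in zip(mixed, x[i])])
    return out, attn
```

With `gamma=0.0` the block reduces to the identity. A larger `gamma` lets strongly correlated channels reinforce each other, which is the basic lever a channel-attention module uses to re-weight channels.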
In this study, we propose new strategies to tackle these challenges. (1) We propose a novel channel-based self-attention module that diversifies the network's focus toward discriminative and significant channels; the module can be embedded at the end of any backbone network for image classification. (2) To mitigate the noise introduced by shallow encoder layers in U-Net architectures, we replace the skip connections with an Adaptive Global Context Module (AGCM). We also introduce a Semantic Feature Enhancement Module (SFEM) to enhance multi-scale features for polyp segmentation. (3) We introduce a Multi-scaled Overlapped Attention (MOA) mechanism within local transformer-based networks for image classification, which helps establish long-range dependencies and enables communication between neighboring windows. (4) We propose a novel Fuzzy Attention Module that prioritizes challenging pixels, thereby improving polyp segmentation performance. (5) We develop a novel dense attention gate module that aggregates features from all preceding layers to compute attention scores, refining global features for polyp segmentation. Moreover, we design a new multi-layer, horizontally extended decoder architecture to enhance local feature refinement in polyp segmentation.
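To picture why overlapped attention lets neighboring windows communicate, the sketch below makes an assumption: queries come from non-overlapping windows, while keys and values are gathered from the same window enlarged by a margin (`overlap`) on every side. The actual MOA design, including its multi-scale aspect, is not modeled here and may differ. The function names are hypothetical. The code only tracks which pixel positions each window covers.

```python
def query_windows(h, w, win):
    """Non-overlapping win x win query windows as sets of (row, col)."""
    if h % win or w % win:
        raise ValueError("h and w must be divisible by win in this sketch")
    return [
        {(r, c) for r in range(r0, r0 + win) for c in range(c0, c0 + win)}
        for r0 in range(0, h, win)
        for c0 in range(0, w, win)
    ]


def key_windows(h, w, win, overlap):
    """Key/value windows: each query window enlarged by `overlap` pixels
    on every side and clipped to the image (assumed scheme)."""
    if h % win or w % win:
        raise ValueError("h and w must be divisible by win in this sketch")
    wins = []
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            rows = range(max(0, r0 - overlap), min(h, r0 + win + overlap))
            cols = range(max(0, c0 - overlap), min(w, c0 + win + overlap))
            wins.append({(r, c) for r in rows for c in cols})
    return wins


def shared_positions(key_window, queries):
    """How many positions a key window shares with each query window."""
    return [len(key_window & q) for q in queries]
```

For an 8 x 8 map with 4 x 4 windows and `overlap=1`, the key window for the top-left query window covers 25 positions. Sixteen of them are its own, and the rest are borrowed from the three neighboring windows (4, 4, and 1 positions). Attention computed over these keys therefore mixes information across window borders. With `overlap=0` the key windows coincide with the query windows and no cross-window exchange happens.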