Accurate and Robust Object Detection and Classification Based on Deep Neural Networks
Taejoon Kim
Fengjun Li
Bo Luo
Haiyang Chao
Guanghui Wang
Recent years have seen tremendous developments in the field of computer vision and its extensive applications. The fundamental task, image classification, benefiting from the extraordinary ability of deep convolutional neural networks (CNNs) to extract deep semantic information from input data, has become the backbone of many other computer vision tasks, such as object detection and segmentation. A modern detector usually performs bounding-box regression and class prediction on top of a pre-trained classification model as the backbone. This architecture has proven to produce good results; however, closer inspection reveals room for improvement. A detector takes a CNN pre-trained on the classification task and selects the final bounding boxes from multiple proposed region candidates by a process called non-maximum suppression (NMS), which picks the best candidates by ranking their classification confidence scores. Localization quality is never evaluated in this process. Another issue is that classification uses one-hot encoding to label the ground truth, so misclassifications between any two classes are penalized equally, without considering the inherent relations between the classes. Finally, 2D image classification and 3D point cloud classification remain distinct avenues of research, each relying on significantly different architectures; given the unique characteristics of these data types, models cannot be employed interchangeably between them.
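The following minimal NumPy sketch of standard (hard) NMS illustrates the issue described above: candidate boxes are ranked purely by classification confidence, and localization quality never enters the ranking. The function names, box format, and IoU threshold are illustrative choices, not taken from any specific detector implementation.

import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, cls_scores, iou_thresh=0.5):
    """Keep boxes in descending order of classification confidence, suppressing overlaps."""
    order = np.argsort(cls_scores)[::-1]   # ranking uses classification scores only
    keep = []
    while order.size > 0:
        best = order[0]                    # highest remaining classification confidence
        keep.append(best)
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps < iou_thresh]  # suppress candidates overlapping the kept box
    return keep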
My research aims to address these issues. (1) We propose the first location-aware detection framework for single-shot detectors, which can be integrated into any single-shot detector. It boosts detection performance by calibrating the ranking process in NMS with localization scores. (2) To back-propagate gradients more effectively, we design a super-class guided architecture that consists of a super-class branch (SCB) and a finer-class branch (FCB). To further increase its effectiveness, high-level features from the SCB are fed to the FCB to guide finer-class predictions. (3) Recent works have shown that 3D point cloud models are extremely vulnerable to adversarial attacks, which poses a serious threat to critical applications such as autonomous driving and robotic control. To bridge the domain gap between 3D and 2D classification and to increase the robustness of CNN models on 3D point cloud data, we propose a family of robust structured declarative classifiers for point cloud classification. We experiment with various 3D-to-2D mapping algorithms, bridging the gap between 2D and 3D classification. Furthermore, we empirically validate that the internal constrained-optimization mechanism effectively defends against adversarial attacks through implicit gradients.
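As a sketch of how the ranking calibration in contribution (1) could look, the snippet below assumes the detector predicts a per-box localization quality score (e.g., an estimated IoU with the ground truth) alongside the classification confidence. The fusion rule shown here, a weighted geometric mean controlled by a hypothetical parameter alpha, is illustrative only; the actual framework may combine the two scores differently.

import numpy as np

def calibrated_scores(cls_scores, loc_scores, alpha=0.5):
    """Fuse classification confidence with predicted localization quality.

    alpha = 0 recovers plain classification ranking; larger alpha gives
    more weight to localization quality.
    """
    return cls_scores ** (1.0 - alpha) * loc_scores ** alpha

# The fused scores then replace the raw classification confidences when
# ranking candidates inside NMS, so well-localized boxes are preferred:
#   keep = nms(boxes, calibrated_scores(cls_scores, loc_scores))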