Efficient and Effective Convolutional Neural Networks for Object Detection and Recognition
Prasad Kulkarni
Fengjun Li
Cuncong Zhong
Guanghui Wang
Haiyang Chao
With the development of Convolutional Neural Networks (CNNs), computer vision has entered a new era, and performance in image classification, object detection, segmentation, and recognition has improved significantly. Object detection, one of the fundamental problems in computer vision, is a necessary component of many computer vision tasks, such as image and video understanding, object tracking, and instance segmentation. An object detector must not only recognize every object of the defined classes in an image or video but also localize each one, which makes the task difficult to solve reliably in real-world scenarios.
In this work, we aim to improve the performance of object detection and localization by adopting more efficient and effective CNN models. (1) We propose an efficient and effective approach for real-time detection and tracking of small golf balls based on object detection and the Kalman filter. For this purpose, we collected and labeled thousands of golf ball images to train the learning model. We also implemented several classical object detection models and compared their performance in terms of detection precision and speed. (2) To address the domain shift problem in object detection, we propose to employ generative adversarial networks (GANs) to generate new images in different domains and then concatenate each original RGB image with its corresponding GAN-generated fake image to form a 6-channel representation of the image content. (3) We propose a strategy to improve label assignment in modern object detection models. The IoU (Intersection over Union) thresholds between the pre-defined anchors and the ground truth bounding boxes determine which anchors are treated as positive samples and which as negative samples. Instead of using fixed thresholds or adaptive thresholds based on statistics, we introduce the model's predictions into the label assignment paradigm to define positive and negative samples dynamically, so that more high-quality samples are selected as positives. The strategy reduces the discrepancy between the classification scores and the IoU scores and yields more accurate bounding boxes.
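To make the label assignment setting in (3) concrete, the following is a minimal sketch of the conventional fixed-threshold baseline that the proposed strategy departs from; it is not the prediction-aware method itself. The function names `iou` and `assign_labels`, the `(x1, y1, x2, y2)` box format, and the thresholds 0.5 and 0.4 are illustrative assumptions, not values taken from this work.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(anchors, gt_boxes, pos_thr=0.5, neg_thr=0.4):
    """Fixed-threshold assignment: 1 = positive, 0 = negative, -1 = ignored.

    pos_thr and neg_thr are assumed example values, not from the source.
    """
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground truth box.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)  # between thresholds: excluded from training
    return labels


if __name__ == "__main__":
    gt_boxes = [(0, 0, 10, 10)]
    anchors = [(0, 0, 10, 10), (4, 0, 14, 10), (20, 20, 30, 30)]
    print(assign_labels(anchors, gt_boxes))  # [1, -1, 0]
```

In this baseline, an anchor's label depends only on its geometric overlap with the ground truth. A prediction-aware scheme of the kind described above would, roughly speaking, also take the detector's own predictions into account when deciding which samples count as positive.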