Combining Yolo and Sift to Detect Confusing Objects in Images
DOI: https://doi.org/10.51173/ijds.v2i2.35
Keywords: Object Detection, YOLOv8, Computer Vision, SIFT, Image Analysis
Abstract
Object detection has advanced significantly with the emergence of deep learning models; however, existing algorithms often fail to deliver highly accurate, feature-focused detection, particularly in challenging visual environments. This study addresses that limitation by proposing a novel framework that integrates the high-level detection capabilities of YOLOv8 with the precise local feature extraction of the Scale-Invariant Feature Transform (SIFT). The proposed method uses YOLOv8 to perform overall object detection and SIFT to improve the identification of fine-grained features and local properties, providing robustness to scale and rotation. Experimental evaluation on complex image datasets showed that the hybrid model significantly outperforms YOLOv8 alone, with higher detection accuracy and a noticeable reduction in false positives. These initial results indicate that integrating YOLOv8 with SIFT yields a more reliable and precise object detection framework. The approach is promising for applications that require both high-level detection and detailed feature recognition.
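The abstract does not say how the detector output and the local features are fused, so the following is only a minimal sketch of one plausible arrangement, not the authors' method. It assumes that confident detections are accepted directly and that low-confidence ones are kept only if enough of their local descriptors pass Lowe's ratio test against reference descriptors for the predicted class. The names Detection, ratio_test_matches, and verify_detections, the thresholds, and the toy 2-D descriptors (standing in for real 128-dimensional SIFT descriptors) are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Detection:
    """One detector output (hypothetical container, not a library type)."""
    label: str
    confidence: float
    box: tuple  # (x1, y1, x2, y2)


def ratio_test_matches(query_desc, ref_desc, ratio=0.75):
    """Count query descriptors that pass Lowe's ratio test against ref_desc."""
    good = 0
    for q in query_desc:
        distances = sorted(dist(q, r) for r in ref_desc)
        # Accept only if the best match is clearly closer than the runner-up.
        if len(distances) >= 2 and distances[0] < ratio * distances[1]:
            good += 1
    return good


def verify_detections(detections, descriptors_per_box, reference_desc,
                      min_conf=0.8, min_matches=2):
    """Keep confident detections; re-check uncertain ones with local features.

    The fusion rule and both thresholds are assumptions made for illustration.
    """
    kept = []
    for det, desc in zip(detections, descriptors_per_box):
        if det.confidence >= min_conf:
            kept.append(det)
        elif ratio_test_matches(desc, reference_desc[det.label]) >= min_matches:
            kept.append(det)
    return kept


# Toy data: 2-D vectors stand in for SIFT descriptors.
reference = {"cup": [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]}
detections = [
    Detection("cup", 0.9, (0, 0, 50, 50)),      # confident, accepted directly
    Detection("cup", 0.5, (60, 0, 110, 50)),    # uncertain, distinctive features
    Detection("cup", 0.4, (120, 0, 170, 50)),   # uncertain, ambiguous features
]
descriptors = [
    [],
    [(0.1, 0.0), (9.9, 0.1)],
    [(5.0, 5.0), (5.0, 4.9)],
]

kept = verify_detections(detections, descriptors, reference)
print([d.confidence for d in kept])
```

In a real pipeline the detections would come from a YOLOv8 model and the descriptors from a SIFT extractor run on each cropped box. The third detection is dropped here because its descriptors are nearly equidistant from several reference descriptors, which is the kind of ambiguous evidence a false positive would be expected to produce.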