College of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, China
Abstract: The visual cortex contains a large number of functional columns, each composed of neurons with similar response properties. As the basic units for processing visual information, these columns play an important role in a wide range of visual tasks. Motivated by this, we propose FCNet, a convolutional neural network that models the orientation-column structure of the visual cortex, and apply it to human action recognition. The model mimics the biological properties of cortical functional columns: computational functional columns are constructed from three-dimensional spatiotemporal Gabor filters, and an action-recognition network is built with a CNN as the backbone and the computational functional columns as its convolution-kernel groups. The network adopts a direct split feed-forward connection scheme to extract spatiotemporal features from video and perform action recognition. Experiments on public action-recognition datasets, including KTH and UCF101, show that FCNet significantly outperforms other convolutional neural network models in both accuracy and efficiency: it achieves 92.93% classification accuracy on KTH and 90.04% on UCF101, while requiring far fewer parameters and much less computation than the compared models.
Keywords: 3D Gabor filter; orientation selectivity; visual functional column; action recognition
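The "computational functional column" described in the abstract, a group of 3D spatiotemporal Gabor kernels that share a receptive-field location but differ in preferred orientation, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the drifting-carrier form of the filter and all parameter values (`sigma`, `tau`, `wavelength`, `speed`, the 8-orientation bank) are assumptions chosen for clarity.

```python
import numpy as np

def gabor3d(size=9, frames=5, sigma=2.0, tau=1.5,
            wavelength=4.0, theta=0.0, speed=1.0):
    """Build one 3D spatiotemporal Gabor kernel (frames, H, W).

    The spatial carrier drifts along orientation `theta` at `speed`
    pixels per frame, so the kernel responds preferentially to motion
    in that direction. Parameter values are illustrative, not the
    paper's settings.
    """
    half, t_half = size // 2, frames // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.empty((frames, size, size), dtype=np.float64)
    for i, t in enumerate(range(-t_half, t_half + 1)):
        # rotate spatial coordinates to the preferred orientation
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        # Gaussian envelope in space and time
        envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)
                          - t**2 / (2 * tau**2))
        # cosine carrier drifting over time -> motion selectivity
        carrier = np.cos(2 * np.pi * (xr + speed * t) / wavelength)
        kernel[i] = envelope * carrier
    kernel -= kernel.mean()  # remove the DC component, as is usual for Gabor kernels
    return kernel

# One column: a kernel group spanning 8 preferred orientations.
# Such a bank could serve as a fixed convolution-kernel group in a CNN.
bank = np.stack([gabor3d(theta=k * np.pi / 8) for k in range(8)])
print(bank.shape)  # (8, 5, 9, 9): orientations x frames x height x width
```

In a network of the kind the abstract describes, each such bank would be used as one group of (fixed or learnable) 3D convolution kernels, with the split feed-forward connections routing video input through columns of different orientation preferences.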