College of Biomedical Engineering, South-Central Minzu University, Wuhan 430074, China
Abstract: The visual cortex contains a large number of functional columns, each composed of neurons with similar response properties. As the basic units for processing visual information, these columns play an important role in a wide range of visual tasks. Motivated by this, we propose FCNet, a convolutional neural network that models the orientation-column structure of the visual cortex, and apply it to human action recognition. The model mimics the biological properties of cortical functional columns: computational functional columns are constructed from three-dimensional spatiotemporal Gabor filters, and an action-recognition network is built with a CNN as the backbone and the computational functional columns as its convolution-kernel groups. The network adopts a direct split feed-forward connection scheme to extract spatiotemporal features from video and perform action recognition. Experiments on public action-recognition datasets, including KTH and UCF101, show that FCNet significantly outperforms other convolutional neural network models in both accuracy and efficiency: it achieves 92.93% classification accuracy on KTH and 90.04% on UCF101, while requiring far fewer parameters and much less computation than the compared models.
Keywords: 3D Gabor filter; orientation selectivity; visual functional column; action recognition
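The "computational functional column" described in the abstract, a group of 3D spatiotemporal Gabor kernels that share a receptive-field location but differ in preferred orientation, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the drifting-carrier form of the filter and all parameter values (`sigma`, `tau`, `wavelength`, `speed`, the 8-orientation bank) are assumptions chosen for clarity.

```python
import numpy as np

def gabor3d(size=9, frames=5, sigma=2.0, tau=1.5,
            wavelength=4.0, theta=0.0, speed=1.0):
    """Build one 3D spatiotemporal Gabor kernel (frames, H, W).

    The spatial carrier drifts along orientation `theta` at `speed`
    pixels per frame, so the kernel responds preferentially to motion
    in that direction. Parameter values are illustrative, not the
    paper's settings.
    """
    half, t_half = size // 2, frames // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    kernel = np.empty((frames, size, size), dtype=np.float64)
    for i, t in enumerate(range(-t_half, t_half + 1)):
        # rotate spatial coordinates to the preferred orientation
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        # Gaussian envelope in space and time
        envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2)
                          - t**2 / (2 * tau**2))
        # cosine carrier drifting over time -> motion selectivity
        carrier = np.cos(2 * np.pi * (xr + speed * t) / wavelength)
        kernel[i] = envelope * carrier
    kernel -= kernel.mean()  # remove the DC component, as is usual for Gabor kernels
    return kernel

# One column: a kernel group spanning 8 preferred orientations.
# Such a bank could serve as a fixed convolution-kernel group in a CNN.
bank = np.stack([gabor3d(theta=k * np.pi / 8) for k in range(8)])
print(bank.shape)  # (8, 5, 9, 9): orientations x frames x height x width
```

In a network of the kind the abstract describes, each such bank would be used as one group of (fixed or learnable) 3D convolution kernels, with the split feed-forward connections routing video input through columns of different orientation preferences.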