Zhang H M, Yan D D, Tian Q Q. Improved spatio-temporal graph convolutional networks for video anomaly detection[J]. Opto-Electron Eng, 2024, 51(5): 240034. doi: 10.12086/oee.2024.240034

Improved spatio-temporal graph convolutional networks for video anomaly detection

    Fund Project: Project supported by the National Natural Science Foundation of China (61901068), Chongqing Natural Science Foundation Top Project (cstc2021jcyj-msxmX0525, CSTB2022NSCQ-MSX0786, CSTB2023NSCQ-MSX0911), and Science and Technology Research Project of Chongqing Municipal Education Commission (KJQN202201109)
An improved spatio-temporal graph convolutional network for video anomaly detection is proposed to accurately capture the spatio-temporal interactions of objects in anomalous events. The graph convolutional network integrates conditional random fields, which model the interactions between spatio-temporal features across frames and capture their contextual relationships by exploiting inter-frame feature correlations. On this basis, a spatial similarity graph and a temporal dependency graph are constructed with video segments as nodes, and the two graphs are adaptively fused to learn spatio-temporal video features, thus improving detection accuracy. Experiments on three video anomaly event datasets, UCSD Ped2, ShanghaiTech, and IITB-Corridor, yield frame-level AUC values of 97.7%, 90.4%, and 86.0% and accuracy rates of 96.5%, 88.6%, and 88.0%, respectively.
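The graph construction and fusion described above can be sketched as follows. The abstract does not give the exact formulas, so every specific choice here is an assumption: the spatial similarity graph is a row-wise softmax over cosine similarities between clip features, the temporal dependency graph decays as exp(-|i - j| / sigma) with the distance between clips i and j, and the "adaptive fusion" is a convex combination weighted by sigmoid(theta) of a learnable scalar theta. The propagation step is the standard GCN layer of Kipf and Welling. All function names are hypothetical, and pure Python is used only to keep the sketch dependency-free; a real model would use a tensor library and learn theta and the weight matrix.

```python
import math


def cosine(u, v):
    # Cosine similarity between two clip feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def spatial_similarity_graph(feats):
    # Assumed form: A_s[i][j] = softmax over j of cos(x_i, x_j), so each row sums to 1.
    graph = []
    for fi in feats:
        sims = [cosine(fi, fj) for fj in feats]
        peak = max(sims)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in sims]
        total = sum(exps)
        graph.append([e / total for e in exps])
    return graph


def temporal_dependency_graph(n, sigma=1.0):
    # Assumed form: A_t[i][j] = exp(-|i - j| / sigma); temporally close clips are linked more strongly.
    return [[math.exp(-abs(i - j) / sigma) for j in range(n)] for i in range(n)]


def fuse_graphs(a_s, a_t, theta):
    # Hypothetical adaptive fusion: w = sigmoid(theta), with theta learned during training.
    w = 1.0 / (1.0 + math.exp(-theta))
    return [[w * s + (1.0 - w) * t for s, t in zip(rs, rt)] for rs, rt in zip(a_s, a_t)]


def gcn_layer(adj, h, weight):
    # GCN propagation (Kipf & Welling): ReLU(D^-1/2 (A + I) D^-1/2 H W).
    # Degrees are row sums of A + I, since the fused graph need not be symmetric.
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out_dim = len(weight[0])
    hw = [[sum(h[i][k] * weight[k][c] for k in range(len(weight))) for c in range(out_dim)]
          for i in range(n)]
    return [[max(0.0, sum(norm[i][j] * hw[j][c] for j in range(n))) for c in range(out_dim)]
            for i in range(n)]


# Three clips: the first two look alike, the third differs.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = fuse_graphs(spatial_similarity_graph(feats), temporal_dependency_graph(len(feats)), theta=0.0)
hidden = gcn_layer(adj, feats, weight=[[1.0, 0.0], [0.0, 1.0]])
```

In this toy example the first two clips receive a larger spatial edge weight than the first and third, while the temporal graph links clips purely by their distance in time; with theta = 0 the two graphs contribute equally.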
Video surveillance systems are increasingly deployed in public places and play an important role in maintaining public security and social stability. However, the collection and labeling of anomalous videos are affected by subjective factors, so video data often carry only video-level labels and lack finer-grained annotation. This limits intelligent video analysis, especially anomaly detection, where richer information is needed to improve model performance.

Video data is typical spatio-temporal data, and the spatio-temporal features exhibited by anomalous events in a video are strongly correlated. Relationships between video segments can be modeled by introducing graph structures from both the temporal and the spatial perspective, but standard convolution cannot be applied directly to graphs. Graph Convolutional Networks (GCNs) process graph-structured data effectively, yet they remain limited in capturing the intrinsic relationships between objects in neighboring frames, especially the complex spatio-temporal dependencies between frames of a video sequence.

To model the spatio-temporal correlations of video segments under a graph structure more appropriately, and thereby detect and localize video anomalies effectively, this paper proposes an improved spatio-temporal graph convolutional network for video anomaly detection. Each clip in the video is treated as a node, and two key graphs are constructed: a spatial similarity graph and a temporal dependency graph. Video features are then learned by adaptively fusing the two graphs, taking the spatio-temporal connections between clips into account. Because anomalous events can arise from spatio-temporal interactions among multiple objects, and Conditional Random Fields (CRFs) are well suited to modeling such dependencies on graphs, a CRF layer is introduced into the GCN to model the interactions between spatio-temporal features across frames and capture their contextual relationships, thus improving the detection accuracy of the model.
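The abstract does not specify the form of the CRF layer. One established way to attach a CRF to a GCN, in the spirit of the CRF layer in Gao et al.'s CRF-enhanced GCN (KDD 2019), is a Gaussian CRF over node features with energy E(H) = sum_i alpha * |h_i - x_i|^2 + (beta / 2) * sum_{i,j} g_ij * |h_i - h_j|^2, where x_i is the GCN output for clip i and g_ij is a symmetric pairwise weight between clips i and j. Setting the gradient to zero gives the update h_i <- (alpha * x_i + beta * sum_j g_ij * h_j) / (alpha + beta * sum_j g_ij), which the sketch below iterates; the Jacobi-style iteration converges because alpha > 0. This is a minimal illustration under stated assumptions (fixed alpha and beta, a given g such as a fused clip graph, hypothetical function names), not the paper's implementation.

```python
def crf_refine(x, g, alpha=1.0, beta=1.0, iters=10):
    """Refine node features under an assumed Gaussian CRF by fixed-point iteration.

    x: n feature vectors (e.g. GCN outputs for n clips).
    g: n x n symmetric, non-negative pairwise weights between clips (assumed given).
    Each step applies h_i <- (alpha*x_i + beta*sum_j g_ij*h_j) / (alpha + beta*sum_j g_ij).
    """
    n, dim = len(x), len(x[0])
    h = [list(row) for row in x]  # start from the unary (GCN) features
    for _ in range(iters):
        new_h = []
        for i in range(n):
            weight_sum = sum(g[i][j] for j in range(n) if j != i)
            denom = alpha + beta * weight_sum
            new_h.append([
                (alpha * x[i][c] + beta * sum(g[i][j] * h[j][c] for j in range(n) if j != i)) / denom
                for c in range(dim)
            ])
        h = new_h
    return h


def crf_energy(h, x, g, alpha=1.0, beta=1.0):
    # The energy the refinement minimizes: a unary fit term plus pairwise smoothness.
    n = len(x)
    unary = sum(alpha * sum((a - b) ** 2 for a, b in zip(h[i], x[i])) for i in range(n))
    pairwise = 0.5 * beta * sum(
        g[i][j] * sum((a - b) ** 2 for a, b in zip(h[i], h[j]))
        for i in range(n) for j in range(n)
    )
    return unary + pairwise


x = [[0.0], [1.0]]            # GCN outputs for two clips (1-D features)
g = [[0.0, 1.0], [1.0, 0.0]]  # the two clips are connected with weight 1
h = crf_refine(x, g, iters=50)  # approaches [[1/3], [2/3]]
```

Each update is a weighted average of a clip's own GCN feature and its neighbours' current features, so strongly connected clips are pulled toward consistent representations while alpha keeps them anchored to the GCN output; in a trained model alpha, beta, and g would typically be learned.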

Experiments were conducted on three video anomaly event datasets: UCSD Ped2, ShanghaiTech, and IITB-Corridor. The frame-level AUC values reach 97.7%, 90.4%, and 86.0%, respectively, and the experimental results verify the effectiveness of the proposed method.
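The abstract reports frame-level AUC (and, in the highlights, accuracy) but not the evaluation protocol. One common convention when a model scores fixed-length clips is to copy each clip's score to the frames it covers and compute the ROC AUC over all test frames against per-frame ground truth. The sketch below assumes that convention; `frames_per_clip`, the 0.5 accuracy threshold, and the function names are hypothetical.

```python
def clip_to_frame_scores(clip_scores, frames_per_clip, num_frames):
    # Assumed protocol: every frame inherits the score of the fixed-length clip containing it.
    scores = []
    for s in clip_scores:
        scores.extend([s] * frames_per_clip)
    if len(scores) < num_frames:  # trailing frames not covered by a full clip
        scores.extend([clip_scores[-1]] * (num_frames - len(scores)))
    return scores[:num_frames]


def frame_auc(labels, scores):
    # ROC AUC as the probability that a random anomalous frame (label 1) outscores
    # a random normal frame (label 0); ties count as one half. O(P * N) for clarity.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both normal and anomalous frames")
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos) * len(neg))


def frame_accuracy(labels, scores, threshold=0.5):
    # Accuracy after thresholding frame scores; the threshold is an assumption.
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = clip_to_frame_scores([0.1, 0.9], frames_per_clip=4, num_frames=8)
print(frame_auc(labels, scores))  # 1.0
```

For large test sets, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` gives the same value in O(n log n) time.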


  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Figures(8)

Tables(6)
