Journal of South China University of Technology (Natural Science Edition) ›› 2018, Vol. 46 ›› Issue (1): 103-111. doi: 10.3969/j.issn.1000-565X.2018.01.014

• Computer Science & Technology •

Research on Video Description Based on Adaptive Frame Sampling Algorithm and Bidirectional Long Short-Term Memory

ZHANG Rongfeng, NING Peiyang, XIAO Huanhou, SHI Jinglun, QIU Wei

  1. School of Electronic and Information Engineering, South China University of Technology
  • Received: 2017-05-16 Revised: 2017-06-18 Online: 2018-01-25 Published: 2017-12-01
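  • Contact: ZHANG Rongfeng (张荣锋, born 1980), male, Ph.D. candidate, working mainly on machine learning and video processing. E-mail: rongfzhang@qq.com
  • About author: ZHANG Rongfeng (张荣锋, born 1980), male, Ph.D. candidate, working mainly on machine learning and video processing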
  • Supported by:
    The National Natural Science Foundation of China (61671213)

Abstract: Video-to-text description is a new and challenging task in computer vision. To address it, this paper proposes an adaptive frame sampling algorithm and applies a Bidirectional Long Short-Term Memory (BLSTM) model and a deep BLSTM model to video features extracted by deep Convolutional Neural Networks. Because this doubly deep network structure learns both the spatial and the temporal correlations of a video, it can capture global dependency information across the space and time domains. Experimental results show that the proposed framework achieves average METEOR scores of 7.8 and 9.1 on the M-VAD and MPII-MD datasets, respectively. Compared with the original S2VT model, these average scores are 15.7% and 28.2% higher, respectively, and the generated video descriptions are also improved.
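The reported gains can be sanity-checked with a little arithmetic. The sketch below assumes that 15.7% and 28.2% are relative improvements in average METEOR over S2VT (the abstract does not state the baseline scores), and back-computes the baseline values those figures would imply. The function names are illustrative only and do not come from the paper.

```python
def implied_baseline(score: float, relative_gain_pct: float) -> float:
    """Return the baseline b such that score = b * (1 + gain / 100).

    Assumption: the reported percentage is a relative gain over the baseline.
    """
    return score / (1.0 + relative_gain_pct / 100.0)


def relative_gain_pct(score: float, baseline: float) -> float:
    """Relative improvement of score over baseline, in percent."""
    return (score - baseline) / baseline * 100.0


# (average METEOR, reported gain over S2VT in %) taken from the abstract
reported = {"M-VAD": (7.8, 15.7), "MPII-MD": (9.1, 28.2)}

# Baselines implied by the reported figures, under the assumption above
baselines = {
    name: implied_baseline(score, gain) for name, (score, gain) in reported.items()
}

for name, baseline in baselines.items():
    print(f"{name}: implied S2VT METEOR ~ {baseline:.2f}")
```

Under this reading, the implied S2VT averages come out at roughly 6.7 on M-VAD and 7.1 on MPII-MD. These are derived values, not figures quoted from the abstract.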

Key words: video to text, adaptive frame sampling, bidirectional LSTM, deep convolutional neural networks, fusion information of frames
