Computer Science & Technology

Research on Video Description Based on Adaptive Frame Sampling Algorithm and Bidirectional Long Short-Term Memory

  • School of Electronic and Information Engineering, South China University of Technology
ZHANG Rongfeng (1980-), male, doctoral candidate; research interests: machine learning and video processing.

Received date: 2017-05-16

Revised date: 2017-06-18

Online published: 2017-12-01

Supported by

National Natural Science Foundation of China (61671213)

Abstract

Video-to-text description is a new and challenging task in computer vision. To address it, this paper proposes an adaptive frame sampling algorithm and applies a bidirectional long short-term memory (BLSTM) model and a deep BLSTM to video features extracted by deep convolutional neural networks (CNNs). Because this doubly deep network structure learns the spatial and temporal correlations of videos, it can capture global dependency information in both the spatial and temporal domains. Experimental results on the M-VAD and MPII-MD datasets show that the proposed framework achieves average METEOR scores of 7.8 and 9.1, respectively. Compared with the original S2VT model, the proposed method improves the average score by 15.7% and 28.2%, respectively, and also produces better video descriptions.
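To make the BLSTM and deep BLSTM encoder described in the abstract more concrete, here is a minimal NumPy sketch of a standard bidirectional LSTM run over a sequence of per-frame feature vectors. It rests on several assumptions. The abstract gives no gate equations, layer sizes, or weight initialisation, so the code uses a plain LSTM cell without peepholes. The function names `init_lstm`, `lstm_pass`, `blstm`, and `deep_blstm`, as well as the dimensions `T`, `D`, and `H`, are hypothetical. Random vectors stand in for the CNN features. This is an illustration of the general technique, not the authors' implementation, and it omits the adaptive frame sampling step, which the abstract does not describe.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def init_lstm(rng, input_dim, hidden):
    """Hypothetical random initialisation of one LSTM direction's weights."""
    W = rng.normal(scale=0.1, size=(4 * hidden, input_dim))  # input-to-gates
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))     # hidden-to-gates
    b = np.zeros(4 * hidden)
    return W, U, b


def lstm_pass(xs, W, U, b):
    """Run one LSTM direction over xs (shape T x d); return hidden states (T x H)."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    states = []
    for x in xs:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)


def blstm(xs, fwd_params, bwd_params):
    """Bidirectional layer: concatenate a forward pass with a time-reversed backward pass."""
    fwd = lstm_pass(xs, *fwd_params)
    bwd = lstm_pass(xs[::-1], *bwd_params)[::-1]  # re-align to original time order
    return np.concatenate([fwd, bwd], axis=1)


def deep_blstm(xs, layers):
    """Stack BLSTM layers; each layer consumes the 2H-wide output of the layer below."""
    out = xs
    for fwd_params, bwd_params in layers:
        out = blstm(out, fwd_params, bwd_params)
    return out


# Usage: 8 sampled frames, each a 16-d vector standing in for CNN features.
rng = np.random.default_rng(0)
T, D, H = 8, 16, 4
frames = rng.normal(size=(T, D))
layers = [
    (init_lstm(rng, D, H), init_lstm(rng, D, H)),
    (init_lstm(rng, 2 * H, H), init_lstm(rng, 2 * H, H)),
]
encoded = deep_blstm(frames, layers)
print(encoded.shape)  # (8, 8)
```

Each output row combines a forward state, which summarises the frames up to time t, with a backward state, which summarises the frames from t to the end. This is how a bidirectional layer gives every time step access to context from the whole clip. Stacking such layers gives the "deep BLSTM" variant.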

Cite this article

ZHANG Rongfeng, NING Peiyang, XIAO Huanhou, SHI Jinglun, QIU Wei. Research on Video Description Based on Adaptive Frame Sampling Algorithm and Bidirectional Long Short-Term Memory[J]. Journal of South China University of Technology (Natural Science), 2018, 46(1): 103-111. DOI: 10.3969/j.issn.1000-565X.2018.01.014
