Zhang et al., 2019 - Google Patents

Object-aware aggregation with bidirectional temporal graph for video captioning

Document ID: 2019054787462168485
Author: Zhang J; Peng Y
Publication year: 2019
Publication venue: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Snippet

Video captioning aims to automatically generate natural language descriptions of video content, a task that has drawn a lot of attention in recent years. Generating accurate and fine-grained captions requires not only understanding the global content of the video, but also capturing …

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run

Similar Documents

Publication	Publication Date	Title
Zhang et al.	2019	Object-aware aggregation with bidirectional temporal graph for video captioning
Chen et al.	2021	Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning
Zhang et al.	2020	Video captioning with object-aware spatio-temporal correlation and aggregation
Liu et al.	2022	Spatial-temporal interaction learning based two-stream network for action recognition
Wang et al.	2019	Controllable video captioning with POS sequence guidance based on gated fusion network
Aafaq et al.	2019	Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning
Chen et al.	2018	Less is more: Picking informative frames for video captioning
Huang et al.	2019	Attention on attention for image captioning
Hou et al.	2019	Joint syntax representation learning and visual cue translation for video captioning
Xu et al.	2017	Video question answering via gradually refined attention over appearance and motion
Yao et al.	2018	Exploring visual relationship for image captioning
Jiang et al.	2020	SoccerDB: A large-scale database for comprehensive video understanding
Shen et al.	2017	Weakly supervised dense video captioning
Xu et al.	2016	MSR-VTT: A large video description dataset for bridging video and language
Hu et al.	2019	Hierarchical global-local temporal modeling for video captioning
Bin et al.	2016	Bidirectional long-short term memory for video description
Yin et al.	2024	A survey of video-based human action recognition in team sports
Qi et al.	2018	Cross-modal bidirectional translation via reinforcement learning
Fu et al.	2019	Embodied one-shot video recognition: Learning from actions of a virtual embodied agent
Zhu et al.	2019	Attention-based densely connected LSTM for video captioning
Heilbron et al.	2018	What do I annotate next? An empirical study of active learning for action localization
Zanfir et al.	2016	Spatio-temporal attention models for grounded video captioning
Bin et al.	2017	Adaptively attending to visual attributes and linguistic knowledge for captioning
Wang et al.	2014	Video event detection using motion relativity and feature selection
CN118627582A (en)	2024-09-10	Method, system and medium for model training