Object-aware aggregation with bidirectional temporal graph for video captioning
Zhang et al., 2019
- Document ID
- 2019054787462168485
- Author
- Zhang J
- Peng Y
- Publication year
- 2019
- Publication venue
- Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Snippet
Video captioning aims to automatically generate natural language descriptions of video content and has drawn a lot of attention in recent years. Generating accurate and fine-grained captions requires not only understanding the global content of the video, but also capturing …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6268—Classification techniques relating to the classification paradigm, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30781—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F17/30784—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre
- G06F17/30799—Information retrieval; Database structures therefor; File system structures therefor of video data using features automatically derived from the video content, e.g. descriptors, fingerprints, signatures, genre using low-level visual features of the video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/68—Methods or arrangements for recognition using electronic means using sequential comparisons of the image signals with a plurality of references in which the sequence of the image signals or the references is relevant, e.g. addressable memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/36—Image preprocessing, i.e. processing the image information without deciding about the identity of the image
- G06K9/46—Extraction of features or characteristics of the image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00221—Acquiring or recognising human faces, facial parts, facial sketches, facial expressions
- G06K9/00288—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/00335—Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
Similar Documents
| Publication | Title | Publication Date |
|---|---|---|
| Zhang et al. | Object-aware aggregation with bidirectional temporal graph for video captioning | |
| Chen et al. | Towards bridging event captioner and sentence localizer for weakly supervised dense event captioning | |
| Zhang et al. | Video captioning with object-aware spatio-temporal correlation and aggregation | |
| Liu et al. | Spatial-temporal interaction learning based two-stream network for action recognition | |
| Wang et al. | Controllable video captioning with POS sequence guidance based on gated fusion network | |
| Aafaq et al. | Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning | |
| Chen et al. | Less is more: Picking informative frames for video captioning | |
| Huang et al. | Attention on attention for image captioning | |
| Hou et al. | Joint syntax representation learning and visual cue translation for video captioning | |
| Xu et al. | Video question answering via gradually refined attention over appearance and motion | |
| Yao et al. | Exploring visual relationship for image captioning | |
| Jiang et al. | SoccerDB: A large-scale database for comprehensive video understanding | |
| Shen et al. | Weakly supervised dense video captioning | |
| Xu et al. | MSR-VTT: A large video description dataset for bridging video and language | |
| Hu et al. | Hierarchical global-local temporal modeling for video captioning | |
| Bin et al. | Bidirectional long-short term memory for video description | |
| Yin et al. | A survey of video-based human action recognition in team sports | |
| Qi et al. | Cross-modal Bidirectional Translation via Reinforcement Learning. | |
| Fu et al. | Embodied one-shot video recognition: Learning from actions of a virtual embodied agent | |
| Zhu et al. | Attention-based densely connected LSTM for video captioning | |
| Heilbron et al. | What do I annotate next? An empirical study of active learning for action localization | |
| Zanfir et al. | Spatio-temporal attention models for grounded video captioning | |
| Bin et al. | Adaptively attending to visual attributes and linguistic knowledge for captioning | |
| Wang et al. | Video event detection using motion relativity and feature selection | |
| CN118627582A (en) | Method, system and medium for model training |