Jaemin Cho
Vision and Language
Self-Chained Image-Language Model for Video Localization and Question Answering
To handle video QA, we self-chain BLIP-2 for two-stage inference (localization, then QA) and refine the localization via QA feedback.
NeurIPS 2023
Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Analyzing and patching action knowledge in video-language models.
NeurIPS 2023 (Spotlight)
Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji
TVLT: Textless Vision-Language Transformer
Vision-and-language modeling without text, using a transformer that takes only raw visual and audio inputs.
NeurIPS 2022 (Oral)
Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
Fine-grained Image Captioning with CLIP Reward
Using CLIP as a reward function for fine-grained image captioning.
Findings of NAACL 2022
Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
A question answering benchmark on real-world news articles for multimedia, multi-hop reasoning.
AAAI 2022
Revanth Gangi Reddy, Xilin Rui, Manling Li, Xudong Lin, Haoyang Wen, Jaemin Cho, Lifu Huang, Mohit Bansal, Avi Sil, Shih-Fu Chang, Alexander Schwing, Heng Ji