Jaemin Cho
Selected Publications
Visual Programming for Text-to-Image Generation and Evaluation
Interpretable/explainable visual programming frameworks for T2I generation (VPGen) and evaluation (VPEval) - NeurIPS 2023
Jaemin Cho, Abhay Zala, Mohit Bansal
Self-Chained Image-Language Model for Video Localization and Question Answering
To handle video QA, we self-chain BLIP-2 for two-stage inference (localization, then QA) and refine the localization via QA feedback - NeurIPS 2023
Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Analyzing and patching action knowledge in video-language models - NeurIPS 2023 (Spotlight)
Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Evaluation of text-to-image generation models on reasoning skills and social biases - ICCV 2023
Jaemin Cho, Abhay Zala, Mohit Bansal
Hierarchical Video-Moment Retrieval and Step-Captioning
HiREST is a holistic, hierarchical benchmark for multimodal retrieval and step-by-step summarization over a video corpus - CVPR 2023
Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yashar Mehdad, Mohit Bansal
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Efficient vision-and-language modeling with Perceiver-based iterative cross-attention - WACV 2023
Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal
TVLT: Textless Vision-Language Transformer
Vision-and-language modeling without text, using a transformer that takes only raw visual and audio inputs - NeurIPS 2022 (Oral)
Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
LST brings memory efficiency to parameter-efficient transfer learning - NeurIPS 2022
Yi-Lin Sung, Jaemin Cho, Mohit Bansal
Fine-grained Image Captioning with CLIP Reward
CLIP as a reward function for fine-grained image captioning - Findings of NAACL 2022
Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Adapter-based parameter-efficient training for vision-and-language tasks - CVPR 2022
Yi-Lin Sung, Jaemin Cho, Mohit Bansal