Jaemin Cho
Selected Publications
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
Combines the best of specialist (e.g., SimpleClick) and generalist (e.g., SAM) designs to achieve low latency, high quality, and diverse prompts.
CVPR 2024
Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer
Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation
A reliable question generation/answering (QG/A) framework for text-to-image (T2I) evaluation, based on Davidsonian semantics.
ICLR 2024
Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
Visual Programming for Text-to-Image Generation and Evaluation
Interpretable and explainable visual programming frameworks for T2I generation (VPGen) and evaluation (VPEval).
NeurIPS 2023
Jaemin Cho, Abhay Zala, Mohit Bansal
Self-Chained Image-Language Model for Video Localization and Question Answering
For video QA, we self-chain BLIP-2 for two-stage inference (localization, then QA) and refine localization using QA feedback.
NeurIPS 2023
Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Analyzing and patching action knowledge in video-language models.
NeurIPS 2023 (Spotlight)
Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Evaluating text-to-image generation models on reasoning skills and social biases.
ICCV 2023
Jaemin Cho, Abhay Zala, Mohit Bansal
Hierarchical Video-Moment Retrieval and Step-Captioning
HiREST is a holistic, hierarchical benchmark of multimodal retrieval and step-by-step summarization for a video corpus.
CVPR 2023
Abhay Zala, Jaemin Cho, Satwik Kottur, Xilun Chen, Barlas Oğuz, Yashar Mehdad, Mohit Bansal
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Efficient vision-and-language (VL) modeling with Perceiver-based iterative cross-attention.
WACV 2023
Zineng Tang, Jaemin Cho, Jie Lei, Mohit Bansal
TVLT: Textless Vision-Language Transformer
Vision-and-language modeling without text, using a transformer that takes only raw visual and audio inputs.
NeurIPS 2022 (Oral)
Zineng Tang, Jaemin Cho, Yixin Nie, Mohit Bansal
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
LST brings memory efficiency to parameter-efficient transfer learning.
NeurIPS 2022
Yi-Lin Sung, Jaemin Cho, Mohit Bansal