Jaemin Cho
Publications
DOCCI: Descriptions of Connected and Contrasting Images
High-quality, long, human-annotated descriptions of 15K images
ECCV 2024
Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge
Preprint · Cite · Dataset · Project
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
CRG is a training-free method that guides VLMs to understand visual prompts by contrasting their outputs with and without the visual prompts
ECCV 2024
David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal
Preprint · Cite · Code · Project
An Assessment of Reported Biases and Harms of Large Language Models
Analysis of how biases and harms are reported and understood in recent LLM papers
ICA 2024 (Top Paper Award)
Heesoo Jang, Jaemin Cho
PDF · Cite
Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation
A new diagnostic benchmark (LayoutBench) and a new baseline model (IterInpaint) for layout-guided image generation
CVPR Workshop 2024 (Oral)
Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal
Preprint · Cite · Code · Dataset · Project
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
Combining the best of specialist (e.g., SimpleClick) and generalist (e.g., SAM) designs to achieve low latency, high quality, and diverse prompts
CVPR 2024
Qin Liu, Jaemin Cho, Mohit Bansal, Marc Niethammer
Preprint · Cite · Code · Project
Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation
Reliable QG/A framework for T2I evaluation based on Davidsonian semantics
ICLR 2024
Jaemin Cho, Yushi Hu, Roopal Garg, Peter Anderson, Ranjay Krishna, Jason Baldridge, Mohit Bansal, Jordi Pont-Tuset, Su Wang
Preprint · Cite · Code · Project
Visual Programming for Text-to-Image Generation and Evaluation
Interpretable and explainable visual programming frameworks for T2I generation (VPGen) and evaluation (VPEval)
NeurIPS 2023
Jaemin Cho, Abhay Zala, Mohit Bansal
Preprint · Cite · Code · Project
Self-Chained Image-Language Model for Video Localization and Question Answering
To handle video QA, we self-chain BLIP-2 for two-stage inference (localization + QA) and refine localization via QA feedback
NeurIPS 2023
Shoubin Yu, Jaemin Cho, Prateek Yadav, Mohit Bansal
Preprint · Cite · Code
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Analyzing and patching action knowledge in video-language models
NeurIPS 2023 (Spotlight)
Zhenhailong Wang, Ansel Blume, Sha Li, Genglin Liu, Jaemin Cho, Zineng Tang, Mohit Bansal, Heng Ji
Preprint · Cite · Code
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Evaluation of text-to-image generation models in reasoning skills and social biases
ICCV 2023
Jaemin Cho, Abhay Zala, Mohit Bansal
Preprint · Cite · Code