Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

A plug-and-play framework that reuses any existing ControlNet for any video/image diffusion model

Han Lin*, Jaemin Cho*, Abhay Zala, Mohit Bansal

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

A new testbed of teacher environments for data generation agents for diverse tasks.

Zaid Khan, Elias Stengel-Eskin, Jaemin Cho, Mohit Bansal

SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

SELMA improves T2I models by fine-tuning on automatically generated multi-skill image-text datasets, with skill-specific LoRA expert learning & merging. - NeurIPS 2024

Jialu Li*, Jaemin Cho*, Yi-Lin Sung, Jaehong Yoon, Mohit Bansal

DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning

Uses an LLM (GPT-4) to generate a ‘diagram plan’ for fine-grained layouts (objects, text labels, arrows, etc.) and renders it as either raster images (via diffusion) or vector graphics (via PowerPoint, Inkscape, or other tools). - COLM 2024

Abhay Zala, Han Lin, Jaemin Cho, Mohit Bansal

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

EnvGen is a novel framework that uses LLMs to adaptively create training environments to help smaller embodied RL agents learn useful skills that they are weak at. - COLM 2024

Abhay Zala*, Jaemin Cho*, Han Lin, Jaehong Yoon, Mohit Bansal

VideoDirectorGPT: Consistent Multi-Scene Video Generation via LLM-Guided Planning

Uses an LLM (GPT-4) to generate a ‘video plan’ for consistent multi-scene video generation. - COLM 2024

Han Lin, Abhay Zala, Jaemin Cho, Mohit Bansal

DOCCI: Descriptions of Connected and Contrasting Images

High-quality, long, human-annotated descriptions of 15K images - ECCV 2024

Yasumasa Onoe, Sunayana Rane, Zachary Berger, Yonatan Bitton, Jaemin Cho, Roopal Garg, Alexander Ku, Zarana Parekh, Jordi Pont-Tuset, Garrett Tanzer, Su Wang, Jason Baldridge

Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training

CRG is a training-free method that helps VLMs understand visual prompts by contrasting the model outputs with and without the visual prompts. - ECCV 2024

David Wan, Jaemin Cho, Elias Stengel-Eskin, Mohit Bansal

An Assessment of Reported Biases and Harms of Large Language Models

Analysis of how biases and harms are reported and understood in recent LLM papers - ICA 2024 (Top Paper Award)

Heesoo Jang, Jaemin Cho

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

A new diagnostic benchmark (LayoutBench) and a new baseline model (IterInpaint) for layout-guided image generation - CVPR Workshop 2024 (Oral)

Jaemin Cho, Linjie Li, Zhengyuan Yang, Zhe Gan, Lijuan Wang, Mohit Bansal
