Publications

CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting

arXiv preprint, 2025

Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems

arXiv preprint, 2025

Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization

arXiv preprint, 2025

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

In ICLR (Spotlight), 2025

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

arXiv preprint, 2024

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

In COLM, 2024

DOCCI: Descriptions of Connected and Contrasting Images

In ECCV, 2024

An Assessment of Reported Biases and Harms of Large Language Models

In ICA (Top Paper Award), 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

In CVPR Workshop (Oral), 2024

Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

In ICLR, 2024

Paxion: Patching Action Knowledge in Video-Language Foundation Models

In NeurIPS, 2023

Hierarchical Video-Moment Retrieval and Step-Captioning

In CVPR, 2023

TVLT: Textless Vision-Language Transformer

In NeurIPS (Oral), 2022

Fine-grained Image Captioning with CLIP Reward

In Findings of NAACL, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

In AAAI, 2022

Unifying Vision-and-Language Tasks via Text Generation

In ICML, 2021

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

In EMNLP, 2020

Mixture Content Selection for Diverse Sequence Generation

In EMNLP, 2019
