Publications

M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding

arXiv preprint, 2024

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback

arXiv preprint, 2024

EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents

In COLM, 2024

An Assessment of Reported Biases and Harms of Large Language Models

In ICA (Top Paper Award), 2024

DOCCI: Descriptions of Connected and Contrasting Images

In ECCV, 2024

Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation

In CVPR Workshop (Oral), 2024

Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation

In ICLR, 2024

Paxion: Patching Action Knowledge in Video-Language Foundation Models

In NeurIPS, 2023

Hierarchical Video-Moment Retrieval and Step-Captioning

In CVPR, 2023

TVLT: Textless Vision-Language Transformer

In NeurIPS, 2022

Fine-grained Image Captioning with CLIP Reward

In Findings of NAACL, 2022

MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding

In AAAI, 2022

Unifying Vision-and-Language Tasks via Text Generation

In ICML, 2021

X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers

In EMNLP, 2020

Mixture Content Selection for Diverse Sequence Generation

In EMNLP, 2019
