Publications

(2024). EnvGen: Generating and Adapting Environments via LLMs for Training Embodied Agents.

(2024). SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data.

(2023). Davidsonian Scene Graph: Improving Reliability in Fine-Grained Evaluation for Text-to-Image Generation. In ICLR.

(2023). Self-Chained Image-Language Model for Video Localization and Question Answering. In NeurIPS.

(2023). Paxion: Patching Action Knowledge in Video-Language Foundation Models. In NeurIPS.

(2023). Diagnostic Benchmark and Iterative Inpainting for Layout-Guided Image Generation.

(2023). Hierarchical Video-Moment Retrieval and Step-Captioning. In CVPR.

(2022). TVLT: Textless Vision-Language Transformer. In NeurIPS.

(2022). Fine-grained Image Captioning with CLIP Reward. In Findings of NAACL.

(2021). MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding. In AAAI.

(2021). Unifying Vision-and-Language Tasks via Text Generation. In ICML.

(2020). X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers. In EMNLP.

(2019). Mixture Content Selection for Diverse Sequence Generation. In EMNLP.
