Video-based grounding can improve diverse NLU tasks - *[NeurIPS 2021](https://nips.cc/Conferences/2021)*
Tackle different V&L tasks via text generation with a single unified architecture - *[ICML 2021](https://icml.cc/Conferences/2021)*
Generate images from text by predicting masked patches with multi-modal transformers - *[EMNLP 2020](https://2020.emnlp.org/)*
Separate diversification from generation to improve both diversity and accuracy in sequence generation - *[EMNLP 2019](https://www.emnlp-ijcnlp2019.org/)*
Propose a hierarchical VAE model and utterance drop regularization to mitigate the posterior collapse problem - *[NAACL 2018](http://naacl.org/naacl-hlt-2018/)* (Oral)