Efficient VL modeling with Perceiver-based iterative cross-attentions - *[WACV 2023](https://wacv2023.thecvf.com/)*
Adapter-based Parameter-Efficient Training for V&L tasks - *[CVPR 2022](https://cvpr2022.thecvf.com)*
Probing the Reasoning Skills and Social Biases of Text-to-Image Models
Video-based grounding can improve diverse NLU tasks - *[NeurIPS 2021](https://nips.cc/Conferences/2021)*
Tackle different V&L tasks via text generation with a single unified architecture - *[ICML 2021](https://icml.cc/Conferences/2021)*
Generate images from text by predicting masked patches with multi-modal transformers - *[EMNLP 2020](https://2020.emnlp.org/)*