Jaemin Cho
Pretraining
Unifying Vision-and-Language Tasks via Text Generation
Tackle different V&L tasks via text generation with a single unified architecture - *[ICML 2021](https://icml.cc/Conferences/2021)*
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Generate images from text by predicting masked patches with multi-modal transformers - *[EMNLP 2020](https://2020.emnlp.org/)*