VL-T5

Unifying Vision-and-Language Tasks via Text Generation

Tackles different vision-and-language (V&L) tasks via text generation with a single unified architecture (*[ICML 2021](https://icml.cc/Conferences/2021)*)