Tags

Grounding
VidLanKD
Vision
Vokenization
Pretraining
VL-T5
Academic
Open Source
X-LXMERT