Tags

Image captioning
Adapter
VL Adapter
Multi-hop
Multi-modal
Distillation
Grounding
VidLanKD
Vision