Jaemin Cho
Latest
Visual Programming for Text-to-Image Generation and Evaluation
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Self-Chained Image-Language Model for Video Localization and Question Answering
Hierarchical Video-Moment Retrieval and Step-Captioning
TVLT: Textless Vision-Language Transformer
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Fine-grained Image Captioning with CLIP Reward
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generative Transformers
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
Unifying Vision-and-Language Tasks via Text Generation
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Mixture Content Selection for Diverse Sequence Generation
A Hierarchical Latent Structure for Variational Conversation Modeling