About Me

I am a PhD student in Computer Science and Engineering at The Chinese University of Hong Kong, under the supervision of Dr. Yu Cheng. My current research focuses on Multimodal Large Language Models and Game Video Generation.

I received my Master of Science in Computer Science from the National University of Singapore in 2024, where I was supervised by Dr. Bryan Hooi. I obtained my Bachelor’s degree from Shanghai Jiao Tong University in 2021, under the supervision of Dr. Junchi Yan.

Latest News

Our work CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling is accepted by EMNLP 2025!
Our recent work on unified vision-language models is on arXiv now! Are Unified Vision-Language Models Necessary: Generalization Across Understanding and Generation. In this work, we systematically study generalization across understanding and generation in unified VLMs on a synthetic dataset. We validate the necessity of unifying generation and understanding: the two tasks can benefit each other!
The follow-up work to Learning the Unlearned is on arXiv now! CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling. In this work, we propose a new upcycling method for CLIP based on MCL, which is simple and effective.
Our work SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information is accepted by EMNLP!
Our work Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning is accepted by ECCV 2024! The follow-up work is on the way!

Jihai Zhang
