I am a Research Scientist at NVIDIA Research, working with Prof. Song Han. I obtained my Ph.D. degree from National University of Singapore, where I was advised by A/P Gim Hee Lee. I obtained my B.E. degree from Tianjin University.
We work on efficient, high-quality AIGC models, including image, video, and 3D generation.
I am always looking to collaborate with motivated students and researchers passionate about efficient, high-quality generative modeling. Our current research explores the frontiers of video pre-training, post-training, video editing, and world models. If you are interested in pushing the boundaries of what is possible in video generation, please reach out!
[October 2025] SANA-Video (Oral) and LongLive are accepted to ICLR 2026! Check out our efficient long-video generation models!
[September 2025] GLIMPSE is accepted to EMNLP 2025 as Oral!
[August 2025] I will be an Area Chair for ICLR 2026!
[June 2025] SANA-Sprint is accepted to ICCV 2025 as Highlight! See you in Hawaii!
[May 2025] SANA 1.5 is accepted to ICML 2025!
[January 2025] GenXD, SOLE and ComPC are accepted to ICLR 2025! Congrats to all the co-authors!
[September 2024] I was awarded the Outstanding Self-financed Students Abroad award!
[September 2024] X-Ray is accepted to NeurIPS 2024 as Spotlight!
[July 2024] TreeSBA is accepted to ECCV 2024!
[November 2023] Animate124 is released!
[September 2023] Two papers about visual domain generalization and parameter efficient fine-tuning are accepted to IJCV!
[May 2023] Make-A-Protagonist is released! This is my first step to AIGC.
[January 2023] I received the Research Achievement Award from NUS!
[September 2022] Our AdvStyle is accepted to NeurIPS 2022!
SANA-Video is an efficient and high-quality video generation model. The LongSANA variant combines SANA-Video with LongLive to generate long, high-quality videos at 27 FPS.
The first framework for generic video editing with both visual and textual cues. Make-A-Protagonist supports background editing, protagonist editing, and text-to-video generation with a protagonist.
An extension of our ECCV 2022 paper (SHADE).
This paper applies SHADE to visual domain generalization tasks, including semantic segmentation with a Transformer backbone, image classification, and object detection.
We introduce a dual consistency learning framework for domain-generalized semantic segmentation and propose a style hallucination module that generates pair-wise stylized samples.