I am a Research Scientist at NVIDIA Research, working with Prof. Song Han. I obtained my Ph.D. degree from National University of Singapore, where I was advised by A/P Gim Hee Lee. I obtained my B.E. degree from Tianjin University.
We work on efficient, high-quality AIGC models, including image, video, and 3D generation.
I am always looking to collaborate with motivated students and researchers passionate about efficient, high-quality generative modeling. Our current research explores the frontiers of video pre-training, post-training, video editing, and world models. If you are interested in pushing the boundaries of what is possible in video generation, please reach out!
[October 2025] SANA-Video (Oral) and LongLive are accepted to ICLR 2026! Check out our efficient long-video generation models!
[September 2025] GLIMPSE is accepted to EMNLP 2025 as Oral!
[August 2025] I will be an Area Chair for ICLR 2026!
[June 2025] SANA-Sprint is accepted to ICCV 2025 as Highlight! See you in Hawaii!
[May 2025] SANA 1.5 is accepted to ICML 2025!
[January 2025] GenXD, SOLE and ComPC are accepted to ICLR 2025! Congrats to all the co-authors!
[September 2024] I was awarded the Outstanding Self-financed Students Abroad award!
[September 2024] X-Ray is accepted to NeurIPS 2024 as Spotlight!
[July 2024] TreeSBA is accepted to ECCV 2024!
[November 2023] Animate124 is released!
[September 2023] Two papers about visual domain generalization and parameter efficient fine-tuning are accepted to IJCV!
[May 2023] Make-A-Protagonist is released! This is my first step to AIGC.
[January 2023] I received the Research Achievement Award from NUS!
[September 2022] Our AdvStyle is accepted to NeurIPS 2022!
SANA-Video is an efficient and high-quality video generation model. The LongSANA variant combines SANA-Video with LongLive to generate long, high-quality videos at 27 FPS.
The first framework for generic video editing with both visual and textual cues. Make-A-Protagonist supports background editing, protagonist editing, and text-to-video generation with a protagonist.
An extension of our ECCV 2022 paper (SHADE).
This paper applies SHADE to visual domain generalization tasks, including semantic segmentation with a Transformer backbone, image classification, and object detection.
We introduce a dual consistency learning framework for domain-generalized semantic segmentation and propose a style hallucination module that generates pair-wise stylized samples.