Video Diffusion Models are Training-free Motion Interpreter and Controller (doi:10.21979/N9/HQM313)

Document Description

Citation

Title:

Video Diffusion Models are Training-free Motion Interpreter and Controller

Identification Number:

doi:10.21979/N9/HQM313

Distributor:

DR-NTU (Data)

Date of Distribution:

2024-11-07

Version:

1

Bibliographic Citation:

Xiao, Zeqi; Zhou, Yifan; Yang, Shuai; Pan, Xingang, 2024, "Video Diffusion Models are Training-free Motion Interpreter and Controller", https://doi.org/10.21979/N9/HQM313, DR-NTU (Data), V1

Study Description

Citation

Title:

Video Diffusion Models are Training-free Motion Interpreter and Controller

Identification Number:

doi:10.21979/N9/HQM313

Authoring Entity:

Xiao, Zeqi (Nanyang Technological University)

Zhou, Yifan (Nanyang Technological University)

Yang, Shuai (Peking University)

Pan, Xingang (Nanyang Technological University)

Software used in Production:

Nil

Distributor:

DR-NTU (Data)

Access Authority:

Xiao, Zeqi

Depositor:

Xiao, Zeqi

Date of Deposit:

2024-10-11

Holdings Information:

https://doi.org/10.21979/N9/HQM313

Study Scope

Keywords:

Computer and Information Science, Video diffusion model

Abstract:

Video generation primarily aims to model authentic and customized motion across frames, which makes understanding and controlling motion a crucial topic. Most diffusion-based studies on video motion focus on motion customization with training-based paradigms, which demand substantial training resources and require retraining for each model. Crucially, these approaches do not explore how video diffusion models encode cross-frame motion information in their features, so their effectiveness lacks interpretability and transparency. To address this gap, this paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models. Through analysis using Principal Component Analysis (PCA), our work reveals that robust motion-aware features already exist in video diffusion models. We present a new MOtion FeaTure (MOFT), obtained by eliminating content correlation information and filtering motion channels. MOFT offers a distinct set of benefits: it encodes comprehensive motion information with clear interpretability, it can be extracted without training, and it generalizes across diverse architectures. Leveraging MOFT, we propose a novel training-free video motion control framework. Our method demonstrates competitive performance in generating natural and faithful motion, providing architecture-agnostic insights and applicability in a variety of downstream tasks.
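The abstract describes MOFT extraction in two steps: eliminating content correlation and filtering motion channels, with the motion channels located through PCA. The sketch below is an illustrative NumPy approximation under stated assumptions. It is not the authors' code; consult the code repository linked under Related Studies for the actual method. It assumes that features from one diffusion layer arrive as an array of shape (frames, channels, height, width), that content correlation is removed by subtracting the per-location mean over frames, and that motion channels are the ones with the largest loadings on the top two principal components of content-removed feature samples. The function names `remove_content_correlation`, `motion_channel_mask`, and `extract_moft`, as well as the top-2 loading rule, are hypothetical.

```python
import numpy as np


def remove_content_correlation(feats: np.ndarray) -> np.ndarray:
    """Subtract the mean over frames at every (channel, y, x) location.

    Assumption: feats has shape (F, C, H, W). Anything shared by all frames
    (static appearance/content) cancels, leaving frame-to-frame variation.
    """
    return feats - feats.mean(axis=0, keepdims=True)


def motion_channel_mask(samples: np.ndarray, k: int) -> np.ndarray:
    """Pick k channels via PCA (hypothetical selection rule).

    samples: (N, C) content-removed feature vectors, e.g. gathered from
    videos with different motion directions. Requires min(N, C) >= 2.
    """
    centered = samples - samples.mean(axis=0, keepdims=True)
    # PCA via SVD: the rows of vt are principal directions in channel space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Score each channel by the magnitude of its loadings on the top 2 PCs.
    scores = np.abs(vt[:2]).sum(axis=0)
    top = np.argsort(scores)[-k:]
    mask = np.zeros(samples.shape[1], dtype=bool)
    mask[top] = True
    return mask


def extract_moft(feats: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Content-removed features restricted to the selected channels: (F, k, H, W)."""
    return remove_content_correlation(feats)[:, mask]
```

In this sketch, a video whose features are identical in every frame yields an all-zero MOFT, which matches the intuition that the feature should carry motion rather than appearance. Feeding the selected channels into a guidance loss for training-free control is a separate step that is not sketched here.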

Kind of Data:

Code

Other Study Description Materials

Related Studies

Paper: https://xizaoqu.github.io/moft/

Code: https://github.com/xizaoqu/TrajectoryAttention

Related Publications

Citation

Title:

Xiao, Z., Zhou, Y., Yang, S., & Pan, X. (2024, December). Video diffusion models are training-free motion interpreter and controller. In Proceedings of the 38th International Conference on Neural Information Processing Systems (pp. 76115-76138).

Identification Number:

10.5555/3737916.3740339

Citation

Title:

Xiao, Z., Zhou, Y., Yang, S., & Pan, X. (2024). Video diffusion models are training-free motion interpreter and controller. Advances in Neural Information Processing Systems, 37, 76115-76138.

Identification Number:

10356/201828
