Oct 8, 2024
Huang, Ziqi; Wu, Tianxing; Jiang, Yuming; Chan, Kelvin C. K.; Liu, Ziwei, 2024, "Replication Data for: ReVersion: Diffusion-Based Relation Inversion from Images", https://doi.org/10.21979/N9/UWSAXU, DR-NTU (Data), V1
A replication of the ReVersion Benchmark, for the paper "ReVersion: Diffusion-Based Relation Inversion from Images".
Oct 8, 2024
Xie, Binzhu; Zhang, Sicheng; Zhou, Zitang; Li, Bo; Zhang, Yuanhan; Hessel, Jack; Yang, Jingkang; Liu, Ziwei, 2024, "FunQA: Towards Surprising Video Comprehension", https://doi.org/10.21979/N9/SMR703, DR-NTU (Data), V1
Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations dep...
Oct 8, 2024
Yang, Jingkang; Dong, Yuhao; Liu, Shuai; Li, Bo; Wang, Ziyue; Jiang, Chencheng; Tan, Haoran; Kang, Jiamu; Zhang, Yuanhan; Zhou, Kaiyang; Liu, Ziwei, 2024, "Octopus: Embodied Vision-Language Programmer from Environmental Feedback", https://doi.org/10.21979/N9/9EIB8X, DR-NTU (Data), V1
Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied agent, it signifies a crucial stride towards the creation of autonomous and context-aware systems capable of for...
Oct 7, 2024
Ma, Yubo; Zang, Yuhang; Chan, Liangyu; Chen, Meiqi; Jiao, Yizhu; Li, Xinze; Lu, Xinyuan; Liu, Ziyu; Ma, Yan; Dong, Xiaoyi; Zhang, Pan; Pan, Liangming; Jiang, Yu-Gang; Wang, Jiaqi; Cao, Yixin; Sun, Aixin, 2024, "Replication Data for: MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations", https://doi.org/10.21979/N9/IMVWT4, DR-NTU (Data), V1
Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities...
Oct 4, 2024
Yue, Zongsheng; Wang, Jianyi; Loy, Chen Change, 2024, "Efficient Diffusion Model for Image Restoration by Residual Shifting", https://doi.org/10.21979/N9/VYPJ0O, DR-NTU (Data), V1
While diffusion-based image restoration (IR) methods have achieved remarkable success, they are still limited by the low inference speed attributed to the necessity of executing hundreds or even thousands of sampling steps. Existing acceleration sampling techniques, though seekin...
Oct 3, 2024
Guo, Zujin; Li, Wei; Loy, Chen Change, 2024, "Generalizable Implicit Motion Modeling for Video Frame Interpolation", https://doi.org/10.21979/N9/EDKWDC, DR-NTU (Data), V1
Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability...
Oct 2, 2024
Hu, Runyi; Zhang, Jie; Xu, Ting; Li, Jiwei; Zhang, Tianwei, 2024, "Robust-Wide: Robust Watermarking against Instruction-driven Image Editing", https://doi.org/10.21979/N9/XVTPW9, DR-NTU (Data), V1
Instruction-driven image editing allows users to quickly edit an image according to text instructions in a forward pass. Nevertheless, malicious users can easily exploit this technique to create fake images, which could cause a crisis of trust and harm the rights of the original...
Oct 2, 2024
Xu, Qianxiong; Liu, Xuanyi; Zhu, Lanyun; Lin, Guosheng; Long, Cheng; Li, Ziyue; Zhao, Rui, 2024, "Hybrid Mamba for Few-Shot Segmentation", https://doi.org/10.21979/N9/PHG7NV, DR-NTU (Data), V1
Many few-shot segmentation (FSS) methods use cross attention to fuse support foreground (FG) into query features, regardless of the quadratic complexity. A recent advance Mamba can also well capture intra-sequence dependencies, yet the complexity is only linear. Hence, we aim to...
Oct 1, 2024
Liu, Tianqi; Wang, Guangcong; Hu, Shoukang; Shen, Liao; Ye, Xinyi; Zang, Yuhang; Cao, Zhiguo; Li, Wei; Liu, Ziwei, 2024, "MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo", https://doi.org/10.21979/N9/9LDWXG, DR-NTU (Data), V1
We present MVSGaussian, a new generalizable 3D Gaussian representation approach derived from Multi-View Stereo (MVS) that can efficiently reconstruct unseen scenes. Specifically, 1) we leverage MVS to encode geometry-aware Gaussian representations and decode them into Gaussian pa...
Oct 1, 2024 - Chen Change LOY
Loy, Chen Change; Yang, Shuai, 2024, "VToonify", https://doi.org/10.21979/N9/7PGAOA, DR-NTU (Data), V4
Generating high-quality artistic portrait videos is an important and desirable task in computer graphics and vision. Although a series of successful portrait image toonification models built upon the powerful StyleGAN have been proposed, these image-oriented methods have obvious...