S-Lab for Advanced Intelligence

41 to 50 of 84 Results

Video Diffusion Models are Training-free Motion Interpreter and Controller

Nov 7, 2024

Xiao, Zeqi; Zhou, Yifan; Yang, Shuai; Pan, Xingang, 2024, "Video Diffusion Models are Training-free Motion Interpreter and Controller", https://doi.org/10.21979/N9/HQM313, DR-NTU (Data), V1

Video generation primarily aims to model authentic and customized motion across frames, making understanding and controlling the motion a crucial topic. Most diffusion-based studies on video motion focus on motion customization with training-based paradigms, which, however, deman...

MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders

Oct 23, 2024

Jiang, Xueying; Jin, Sheng; Zhang, Xiaoqin; Shao, Ling; Lu, Shijian, 2024, "MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders", https://doi.org/10.21979/N9/5ILJOM, DR-NTU (Data), V1

Monocular 3D object detection aims for precise 3D localization and identification of objects from a single-view image. Despite its recent progress, it often struggles while handling pervasive object occlusions that tend to complicate and degrade the prediction of object dimension...

Replication Data for: ReVersion: Diffusion-Based Relation Inversion from Images

Oct 8, 2024

Huang, Ziqi; Wu, Tianxing; Jiang, Yuming; Chan, Kelvin C. K.; Liu, Ziwei, 2024, "Replication Data for: ReVersion: Diffusion-Based Relation Inversion from Images", https://doi.org/10.21979/N9/UWSAXU, DR-NTU (Data), V1

A replication of the ReVersion Benchmark, for the paper "ReVersion: Diffusion-Based Relation Inversion from Images".

FunQA: Towards Surprising Video Comprehension

Oct 8, 2024

Xie, Binzhu; Zhang, Sicheng; Zhou, Zitang; Li, Bo; Zhang, Yuanhan; Hessel, Jack; Yang, Jingkang; Liu, Ziwei, 2024, "FunQA: Towards Surprising Video Comprehension", https://doi.org/10.21979/N9/SMR703, DR-NTU (Data), V1

Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract significant attention. Enjoyment of these videos is not simply a response to visual stimuli; rather, it hinges on the human capacity to understand (and appreciate) commonsense violations dep...

Octopus: Embodied Vision-Language Programmer from Environmental Feedback

Oct 8, 2024

Yang, Jingkang; Dong, Yuhao; Liu, Shuai; Li, Bo; Wang, Ziyue; Jiang, Chencheng; Tan, Haoran; Kang, Jiamu; Zhang, Yuanhan; Zhou, Kaiyang; Liu, Ziwei, 2024, "Octopus: Embodied Vision-Language Programmer from Environmental Feedback", https://doi.org/10.21979/N9/9EIB8X, DR-NTU (Data), V1

Large vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. Furthermore, when seamlessly integrated into an embodied agent, it signifies a crucial stride towards the creation of autonomous and context-aware systems capable of for...

OctoGibsonDataset.zip

Oct 8, 2024 - Octopus: Embodied Vision-Language Programmer from Environmental Feedback

ZIP Archive - 6.3 GB - MD5: df5972717c2859d34b4fadd702476a70

Replication Data for: MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Oct 7, 2024

Ma, Yubo; Zang, Yuhang; Chan, Liangyu; Chen, Meiqi; Jiao, Yizhu; Li, Xinze; Lu Xinyuan; Liu, Ziyu; Ma, Yan; Dong, Xiaoyi; Zhang, Pan; Pan, Liangming; Jiang, Yu-Gang; Wang, Jiaqi; Cao, Yixin; Sun, Aixin, 2024, "Replication Data for: MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations", https://doi.org/10.21979/N9/IMVWT4, DR-NTU (Data), V1

Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities...

Efficient Diffusion Model for Image Restoration by Residual Shifting

Oct 4, 2024

Yue, Zongsheng; Wang, Jianyi; Loy, Chen Change, 2024, "Efficient Diffusion Model for Image Restoration by Residual Shifting", https://doi.org/10.21979/N9/VYPJ0O, DR-NTU (Data), V1

While diffusion-based image restoration (IR) methods have achieved remarkable success, they are still limited by the low inference speed attributed to the necessity of executing hundreds or even thousands of sampling steps. Existing acceleration sampling techniques, though seekin...

Generalizable Implicit Motion Modeling for Video Frame Interpolation

Oct 3, 2024

Guo, Zujin; Li, Wei; Loy, Chen Change, 2024, "Generalizable Implicit Motion Modeling for Video Frame Interpolation", https://doi.org/10.21979/N9/EDKWDC, DR-NTU (Data), V1

Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability...

Robust-Wide: Robust Watermarking against Instruction-driven Image Editing

Oct 2, 2024

Hu, Runyi; Zhang, Jie; Xu, Ting; Li, Jiwei; Zhang, Tianwei, 2024, "Robust-Wide: Robust Watermarking against Instruction-driven Image Editing", https://doi.org/10.21979/N9/XVTPW9, DR-NTU (Data), V1

Instruction-driven image editing allows users to quickly edit an image according to text instructions in a forward pass. Nevertheless, malicious users can easily exploit this technique to create fake images, which could cause a crisis of trust and harm the rights of the original...
