S-Lab for Advanced Intelligence

1 to 10 of 83 Results

Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding

Oct 7, 2025

Zhang, Yuanhan; Chew, Yunice; Dong, Yuhao; Leo, Aria; Hu, Bo; Liu, Ziwei, 2025, "Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding", https://doi.org/10.21979/N9/KTBVSQ, DR-NTU (Data), V1

We introduce the Video Thinking Test (Video-TT), a benchmark designed to assess if video LLMs can interpret real-world videos as effectively as humans. Video-TT 1) differentiates between errors due to inadequate frame sampling and genuine gaps in understanding complex visual narr...

Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving

Sep 17, 2025

Li, Ruibo; Shi, Hanyu; Wang, Zhe; Lin, Guosheng, 2025, "Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving", https://doi.org/10.21979/N9/PE8MLE, DR-NTU (Data), V1

Understanding motion in dynamic environments is critical for autonomous driving, thereby motivating research on class-agnostic motion prediction. In this work, we investigate weakly and self-supervised class-agnostic motion prediction from LiDAR point clouds. Outdoor scenes typic...

Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting

Sep 11, 2025

Dai, Yuekun; Li, Haitian; Zhou, Shangchen; Loy, Chen Change, 2025, "Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting", https://doi.org/10.21979/N9/4NI0GT, DR-NTU (Data), V1

RGBA images, with the additional alpha channel, are crucial for any application that needs blending, masking, or transparency effects, making them more versatile than standard RGB images. Nevertheless, existing image inpainting methods are designed exclusively for RGB images. Con...

layerbench.zip

Sep 11, 2025 - Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting

ZIP Archive - 654.6 MB - MD5: 972cfdcaec94978b658a7296d3bc0dbb

Benchmark of our paper.

Compositional Generative Model of Unbounded 4D Cities

Sep 10, 2025

Xie, Haozhe; Chen, Zhaoxi; Hong, Fangzhou; Liu, Ziwei, 2025, "Compositional Generative Model of Unbounded 4D Cities", https://doi.org/10.21979/N9/CHQPCL, DR-NTU (Data), V1

3D scene generation has garnered growing attention in recent years and has made significant progress. Generating 4D cities is more challenging than 3D scenes due to the presence of structurally complex, visually diverse objects like buildings and vehicles, and heightened human se...

Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution

Sep 4, 2025

Li, Xiaoming; Zuo, Wangmeng; Loy, Chen Change, 2025, "Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution", https://doi.org/10.21979/N9/DTZDDZ, DR-NTU (Data), V1

Faithful text image super-resolution (SR) is challenging because each character has a unique structure and usually exhibits diverse font styles and layouts. While existing methods primarily focus on English text, less attention has been paid to more complex scripts like Chinese....

F-LMM: Grounding Frozen Large Multimodal Models

Jun 12, 2025

Wu, Size; Jin, Sheng; Zhang, Wenwei; Xu, Lumin; Liu, Wentao; Li, Wei; Loy, Chen Change, 2025, "F-LMM: Grounding Frozen Large Multimodal Models", https://doi.org/10.21979/N9/M0U5AV, DR-NTU (Data), V1

Endowing Large Multimodal Models (LMMs) with visual grounding capability can significantly enhance AIs’ understanding of the visual world and their interaction with humans. However, existing methods typically fine-tune the parameters of LMMs to learn additional segmentation token...

MOWA: Multiple-in-One Image Warping Model

Jun 5, 2025

Liao, Kang; Yue, Zongsheng; Wu, Zhonghua; Loy, Chen Change, 2025, "MOWA: Multiple-in-One Image Warping Model", https://doi.org/10.21979/N9/ZPPMT8, DR-NTU (Data), V1

While recent image warping approaches achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in p...

Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space

Jun 3, 2025

Zhou, Yifan; Xiao, Zeqi; Yang, Shuai; Pan, Xingang, 2025, "Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space", https://doi.org/10.21979/N9/Y6AOQH, DR-NTU (Data), V1

Latent Diffusion Models (LDMs) are known to have an unstable generation process, where even small perturbations or shifts in the input noise can lead to significantly different outputs. This hinders their applicability in applications requiring consistent results. In this work, w...

DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting

Jun 3, 2025

Shen, Liao; Liu, Tianqi; Sun, Huiqiang; Li, Jiaqi; Cao, Zhiguo; Li, Wei; Loy, Chen Change, 2025, "DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting", https://doi.org/10.21979/N9/JKJHNJ, DR-NTU (Data), V1

Recent advances in 3D Gaussian Splatting (3D-GS) have shown remarkable success in representing 3D scenes and generating high-quality, novel views in real-time. However, 3D-GS and its variants assume that input images are captured based on pinhole imaging and are fully in focus. T...
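The layerbench.zip archive attached to the Trans-Adapter dataset is listed with the MD5 checksum 972cfdcaec94978b658a7296d3bc0dbb. Here is a minimal sketch for checking a downloaded copy against that value, assuming Python 3 and the standard-library hashlib module. The function names md5_of_file and verify_download are hypothetical helpers for illustration and are not part of DR-NTU or Dataverse tooling.

```python
import hashlib

# MD5 value as listed for layerbench.zip on the dataset page.
EXPECTED_MD5 = "972cfdcaec94978b658a7296d3bc0dbb"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file.

    The file is read in chunks (1 MiB by default), so a large archive
    such as the ~654.6 MB benchmark is never loaded into memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected.strip().lower()
```

For example, `verify_download("layerbench.zip")` should return True when the local archive is byte-identical to the published file. An MD5 match only indicates that the download was not corrupted in transit. It is not a defense against deliberate tampering.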
