1 to 10 of 223 Results
Oct 7, 2025 - S-Lab for Advanced Intelligence
Zhang, Yuanhan; Chew, Yunice; Dong, Yuhao; Leo, Aria; Hu, Bo; Liu, Ziwei, 2025, "Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding", https://doi.org/10.21979/N9/KTBVSQ, DR-NTU (Data), V1
We introduce the Video Thinking Test (Video-TT), a benchmark designed to assess if video LLMs can interpret real-world videos as effectively as humans. Video-TT 1) differentiates between errors due to inadequate frame sampling and genuine gaps in understanding complex visual narr...
Sep 17, 2025 - S-Lab for Advanced Intelligence
Li, Ruibo; Shi, Hanyu; Wang, Zhe; Lin, Guosheng, 2025, "Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving", https://doi.org/10.21979/N9/PE8MLE, DR-NTU (Data), V1
Understanding motion in dynamic environments is critical for autonomous driving, thereby motivating research on class-agnostic motion prediction. In this work, we investigate weakly and self-supervised class-agnostic motion prediction from LiDAR point clouds. Outdoor scenes typic...
Sep 11, 2025 - S-Lab for Advanced Intelligence
Dai, Yuekun; Li, Haitian; Zhou, Shangchen; Loy, Chen Change, 2025, "Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting", https://doi.org/10.21979/N9/4NI0GT, DR-NTU (Data), V1
RGBA images, with the additional alpha channel, are crucial for any application that needs blending, masking, or transparency effects, making them more versatile than standard RGB images. Nevertheless, existing image inpainting methods are designed exclusively for RGB images. Con...
Sep 10, 2025 - S-Lab for Advanced Intelligence
Xie, Haozhe; Chen, Zhaoxi; Hong, Fangzhou; Liu, Ziwei, 2025, "Compositional Generative Model of Unbounded 4D Cities", https://doi.org/10.21979/N9/CHQPCL, DR-NTU (Data), V1
3D scene generation has garnered growing attention in recent years and has made significant progress. Generating 4D cities is more challenging than 3D scenes due to the presence of structurally complex, visually diverse objects like buildings and vehicles, and heightened human se...
Sep 4, 2025 - S-Lab for Advanced Intelligence
Li, Xiaoming; Zuo, Wangmeng; Loy, Chen Change, 2025, "Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution", https://doi.org/10.21979/N9/DTZDDZ, DR-NTU (Data), V1
Faithful text image super-resolution (SR) is challenging because each character has a unique structure and usually exhibits diverse font styles and layouts. While existing methods primarily focus on English text, less attention has been paid to more complex scripts like Chinese....
Jul 30, 2025 - Guochu XIONG
Xiong, Guochu, 2025, "Replication Data for: Learning Cache Coherence Traffic for NoC Routing Design", https://doi.org/10.21979/N9/J1RNW8, DR-NTU (Data), V1
The dataset includes the source code and a README file for implementing the design presented in the paper 'Learning Cache Coherence Traffic for NoC Routing Design'.
Jun 26, 2025 - Yew Lee TAN
Tan, Yew Lee, 2025, "Replication Data for: Dual Downsample Vision Transformer for Handwritten Text Recognition (ICDAR2025)", https://doi.org/10.21979/N9/DREQKD, DR-NTU (Data), V1
Replication Data for: Dual Downsample Vision Transformer for Handwritten Text Recognition (ICDAR2025). To uncompress: cat lines_recognition_part_* | tar --zstd -xvf -
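For the record above, the listed command reassembles the split archive with cat and extracts it with a zstd-enabled tar. Below is a minimal Python sketch of the same two steps for systems without zstd support in tar; it assumes the parts, concatenated in sorted order, form a single zstd-compressed tar archive, and the function name, joined-file name, and the third-party zstandard dependency are illustrative assumptions, not part of the dataset record.

    # Reassemble and extract the split archive (sketch; see assumptions above).
    import glob
    import shutil
    import tarfile

    import zstandard  # third-party: pip install zstandard


    def extract_split_zstd_tar(pattern="lines_recognition_part_*", dest="."):
        """Concatenate the sorted parts, then stream-extract the zstd tar."""
        archive = "lines_recognition.tar.zst"  # hypothetical name for the joined file
        with open(archive, "wb") as out:
            for part in sorted(glob.glob(pattern)):
                with open(part, "rb") as src:
                    shutil.copyfileobj(src, out)
        with open(archive, "rb") as fh:
            reader = zstandard.ZstdDecompressor().stream_reader(fh)
            with tarfile.open(fileobj=reader, mode="r|") as tar:
                tar.extractall(path=dest)


    if __name__ == "__main__":
        extract_split_zstd_tar()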
Jun 26, 2025 - Yew Lee TAN
Tan, Yew Lee, 2025, "BAE project data", https://doi.org/10.21979/N9/AAEFUZ, DR-NTU (Data), V1
BAE project data
Jun 12, 2025 - S-Lab for Advanced Intelligence
Wu, Size; Jin, Sheng; Zhang, Wenwei; Xu, Lumin; Liu, Wentao; Li, Wei; Loy, Chen Change, 2025, "F-LMM: Grounding Frozen Large Multimodal Models", https://doi.org/10.21979/N9/M0U5AV, DR-NTU (Data), V1
Endowing Large Multimodal Models (LMMs) with visual grounding capability can significantly enhance AIs’ understanding of the visual world and their interaction with humans. However, existing methods typically fine-tune the parameters of LMMs to learn additional segmentation token...
Jun 5, 2025 - S-Lab for Advanced Intelligence
Liao, Kang; Yue, Zongsheng; Wu, Zhonghua; Loy, Chen Change, 2025, "MOWA: Multiple-in-One Image Warping Model", https://doi.org/10.21979/N9/ZPPMT8, DR-NTU (Data), V1
While recent image warping approaches have achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in p...
