1 to 10 of 303 Results
Dec 10, 2025 - Junqi ZHAO
Li, Miaoyu; Chao, Qin; Li, Boyang, 2025, "Replication Data for: Two Causally Related Needles in a Video Haystack", https://doi.org/10.21979/N9/WCSXMT, DR-NTU (Data), V1
Causal2Needles is a benchmark dataset and evaluation toolkit designed to assess the capabilities of both proprietary and open-source multimodal large language models in long-video understanding. It features a large number of "2-needle" questions, where the model must locate and r...
Dec 10, 2025 - Junqi ZHAO
Chinchure, Aditya; Ravi, Sahithya; Ng, Raymond; Shwartz, Vered; Li, Boyang; Sigal, Leonid, 2025, "Replication Data for: Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events", https://doi.org/10.21979/N9/HOAFUL, DR-NTU (Data), V1
BlackSwanSuite is a benchmark for evaluating VLMs' ability to reason about unexpected events through abductive and defeasible tasks. The tasks either artificially limit the amount of visual information provided to models while questioning them about hidden unexpected events, or p...
Dec 10, 2025 - Junqi ZHAO
Zhang, Wenyu; Ng, Wei En; Ma, Lixin; Wang, Yuwen; Zhao, Junqi; Koenecke, Allison; Li, Boyang; Wang, Lu, 2025, "Replication Data for: SPHERE: A Hierarchical Evaluation on Spatial Perception and Reasoning for Vision-Language Models", https://doi.org/10.21979/N9/HI9OFD, DR-NTU (Data), V2
SPHERE (Spatial Perception and Hierarchical Evaluation of Reasoning) is a hierarchical evaluation framework built on a new human-annotated dataset of 2,285 question–answer pairs. It systematically probes models across increasing levels of complexity, from fundamental skills to mu...
Dec 10, 2025 - Junqi ZHAO
Tiong, Anthony Meng Huat; Zhao, Junqi; Li, Boyang; Li, Junnan; Hoi, Steven C.H.; Xiong, Caiming, 2025, "Replication Data for: What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases", https://doi.org/10.21979/N9/SL0VV1, DR-NTU (Data), V1
The OLIVE dataset is a highly diverse, human-corrected multi-modal collection designed to simulate the variety and idiosyncrasies of user queries that vision-language models (VLMs) face in real-world scenarios. It supports the training and evaluation of VLMs in conditions that more cl...
Oct 7, 2025 - S-Lab for Advanced Intelligence
Zhang, Yuanhan; Chew, Yunice; Dong, Yuhao; Leo, Aria; Hu, Bo; Liu, Ziwei, 2025, "Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding", https://doi.org/10.21979/N9/KTBVSQ, DR-NTU (Data), V1
We introduce the Video Thinking Test (Video-TT), a benchmark designed to assess if video LLMs can interpret real-world videos as effectively as humans. Video-TT 1) differentiates between errors due to inadequate frame sampling and genuine gaps in understanding complex visual narr... |
Sep 17, 2025 - S-Lab for Advanced Intelligence
Li, Ruibo; Shi, Hanyu; Wang, Zhe; Lin, Guosheng, 2025, "Weakly and Self-Supervised Class-Agnostic Motion Prediction for Autonomous Driving", https://doi.org/10.21979/N9/PE8MLE, DR-NTU (Data), V1
Understanding motion in dynamic environments is critical for autonomous driving, motivating research on class-agnostic motion prediction. In this work, we investigate weakly and self-supervised class-agnostic motion prediction from LiDAR point clouds. Outdoor scenes typic...
Sep 11, 2025 - S-Lab for Advanced Intelligence
Dai, Yuekun; Li, Haitian; Zhou, Shangchen; Loy, Chen Change, 2025, "Trans-Adapter: A Plug-and-Play Framework for Transparent Image Inpainting", https://doi.org/10.21979/N9/4NI0GT, DR-NTU (Data), V1
RGBA images, with the additional alpha channel, are crucial for any application that needs blending, masking, or transparency effects, making them more versatile than standard RGB images. Nevertheless, existing image inpainting methods are designed exclusively for RGB images. Con... |
Sep 10, 2025 - S-Lab for Advanced Intelligence
Xie, Haozhe; Chen, Zhaoxi; Hong, Fangzhou; Liu, Ziwei, 2025, "Compositional Generative Model of Unbounded 4D Cities", https://doi.org/10.21979/N9/CHQPCL, DR-NTU (Data), V1
3D scene generation has garnered growing attention in recent years and has made significant progress. Generating 4D cities is more challenging than 3D scenes due to the presence of structurally complex, visually diverse objects like buildings and vehicles, and heightened human se... |
Sep 4, 2025 - S-Lab for Advanced Intelligence
Li, Xiaoming; Zuo, Wangmeng; Loy, Chen Change, 2025, "Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution", https://doi.org/10.21979/N9/DTZDDZ, DR-NTU (Data), V1
Faithful text image super-resolution (SR) is challenging because each character has a unique structure and usually exhibits diverse font styles and layouts. While existing methods primarily focus on English text, less attention has been paid to more complex scripts like Chinese.... |
