Jun 12, 2025 - S-Lab for Advanced Intelligence
Wu, Size; Jin, Sheng; Zhang, Wenwei; Xu, Lumin; Liu, Wentao; Li, Wei; Loy, Chen Change, 2025, "F-LMM: Grounding Frozen Large Multimodal Models", https://doi.org/10.21979/N9/M0U5AV, DR-NTU (Data), V1
Endowing Large Multimodal Models (LMMs) with visual grounding capability can significantly enhance AIs’ understanding of the visual world and their interaction with humans. However, existing methods typically fine-tune the parameters of LMMs to learn additional segmentation token...
Jun 5, 2025 - S-Lab for Advanced Intelligence
Liao, Kang; Yue, Zongsheng; Wu, Zhonghua; Loy, Chen Change, 2025, "MOWA: Multiple-in-One Image Warping Model", https://doi.org/10.21979/N9/ZPPMT8, DR-NTU (Data), V1
While recent image warping approaches achieved remarkable success on existing benchmarks, they still require training separate models for each specific task and cannot generalize well to different camera models or customized manipulations. To address diverse types of warping in p...
Jun 3, 2025 - S-Lab for Advanced Intelligence
Zhou, Yifan; Xiao, Zeqi; Yang, Shuai; Pan, Xingang, 2025, "Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space", https://doi.org/10.21979/N9/Y6AOQH, DR-NTU (Data), V1
Latent Diffusion Models (LDMs) are known to have an unstable generation process, where even small perturbations or shifts in the input noise can lead to significantly different outputs. This hinders their applicability in applications requiring consistent results. In this work, w...
Jun 3, 2025 - S-Lab for Advanced Intelligence
Shen, Liao; Liu, Tianqi; Sun, Huiqiang; Li, Jiaqi; Cao, Zhiguo; Li, Wei; Loy, Chen Change, 2025, "DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting", https://doi.org/10.21979/N9/JKJHNJ, DR-NTU (Data), V1
Recent advances in 3D Gaussian Splatting (3D-GS) have shown remarkable success in representing 3D scenes and generating high-quality, novel views in real-time. However, 3D-GS and its variants assume that input images are captured based on pinhole imaging and are fully in focus. T...
May 22, 2025 - S-Lab for Advanced Intelligence
Xu, Yuanmu; Hou, Guanli; Hu, Jiangbei; Ren, Tenglong; Wang, Xiaokun; Zhang, Yalan; Ban, Xiaojuan; Qian, Chen; Hou, Fei; He, Ying, 2025, "NeuS: Physics and Geometry-Augmented Neural Implicit Surfaces for Rigid Bodies", https://doi.org/10.21979/N9/LTXKFL, DR-NTU (Data), V1
This paper tackles the challenges of physics-based simulation of rigid bodies in neural rendering, focusing on 3D model representation and collision handling. A synthetic and real-world dataset is also included in the paper.
May 22, 2025 - S-Lab for Advanced Intelligence
Xu, Qianxiong; Zhu, Lanyun; Liu, Xuanyi; Lin, Guosheng; Long, Cheng; Li, Ziyue; Zhao, Rui, 2025, "Unlocking the Power of SAM 2 for Few-Shot Segmentation", https://doi.org/10.21979/N9/XIDXVT, DR-NTU (Data), V1
Few-Shot Segmentation (FSS) aims to learn class-agnostic segmentation on few classes to segment arbitrary classes, but at the risk of overfitting. To address this, some methods use the well-learned knowledge of foundation models (e.g., SAM) to simplify the learning process. Recen...
May 16, 2025 - S-Lab for Advanced Intelligence
Liu, Chenxi; Miao, Hao; Xu, Qianxiong; Zhou, Shaowen; Long, Cheng; Zhao, Yan; Li, Ziyue, 2025, "Efficient Multivariate Time Series Forecasting via Calibrated Language Models with Privileged Knowledge Distillation", https://doi.org/10.21979/N9/6WWC6K, DR-NTU (Data), V1
Multivariate time series forecasting (MTSF) endeavors to predict future observations given historical data, playing a crucial role in time series data management systems. With advancements in large language models (LLMs), recent studies employ textual prompt tuning to infuse the...
May 13, 2025 - S-Lab for Advanced Intelligence
Liu, Chenxi; Zhou, Shaowen; Xu, Qianxiong; Miao, Hao; Long, Cheng; Li, Ziyue; Zhao, Rui, 2025, "Towards Cross-Modality Modeling for Time Series Analytics: A Survey in the LLM Era", https://doi.org/10.21979/N9/I0HOYZ, DR-NTU (Data), V1
The proliferation of edge devices has generated an unprecedented volume of time series data across different domains, motivating various well-customized methods. Recently, Large Language Models (LLMs) have emerged as a new paradigm for time series analytics by leveraging the shar...
May 9, 2025 - S-Lab for Advanced Intelligence
Dong, Yuhao; Liu, Zuyan; Sun, Hai-Long; Yang, Jingkang; Hu, Winston; Rao, Yongming; Liu, Ziwei, 2025, "Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models", https://doi.org/10.21979/N9/Y0TZUB, DR-NTU (Data), V1
Large Language Models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more, evolving from Chain-of-Thought prompting to product-level solutions like OpenAI o1. Despite various efforts to improve LLM reasoning, high-quality long-chain reasoning data and optim...
May 9, 2025 - S-Lab for Advanced Intelligence
Huang, Zihao; Hu, Shoukang; Wang, Guangcong; Liu, Tianqi; Zang, Yuhang; Cao, Zhiguo; Li, Wei; Liu, Ziwei, 2025, "WildAvatar: Learning In-the-wild 3D Avatars from the Web", https://doi.org/10.21979/N9/5G18B1, DR-NTU (Data), V1
Existing research on avatar creation is typically limited to laboratory datasets, which require high costs against scalability and exhibit insufficient representation of the real world. On the other hand, the web abounds with off-the-shelf real-world human videos, but these video...
