May 9, 2025 - S-Lab for Advanced Intelligence
Dong, Yuhao; Liu, Zuyan; Sun, Hai-Long; Yang, Jingkang; Hu, Winston; Rao, Yongming; Liu, Ziwei, 2025, "Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models", https://doi.org/10.21979/N9/Y0TZUB, DR-NTU (Data), V1
Large Language Models (LLMs) demonstrate enhanced capabilities and reliability by reasoning more, evolving from Chain-of-Thought prompting to product-level solutions like OpenAI o1. Despite various efforts to improve LLM reasoning, high-quality long-chain reasoning data and optim...
May 9, 2025 - S-Lab for Advanced Intelligence
Huang, Zihao; Hu, Shoukang; Wang, Guangcong; Liu, Tianqi; Zang, Yuhang; Cao, Zhiguo; Li, Wei; Liu, Ziwei, 2025, "WildAvatar: Learning In-the-wild 3D Avatars from the Web", https://doi.org/10.21979/N9/5G18B1, DR-NTU (Data), V1
Existing research on avatar creation is typically limited to laboratory datasets, which require high costs against scalability and exhibit insufficient representation of the real world. On the other hand, the web abounds with off-the-shelf real-world human videos, but these video...
Apr 29, 2025 - S-Lab for Advanced Intelligence
Wang, Yiran; Li, Jiaqi; Hong, Chaoyi; Li, Ruibo; Sun, Liusheng; Song, Xiao; Wang, Zhe; Cao, Zhiguo; Lin, Guosheng, 2025, "TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion", https://doi.org/10.21979/N9/Q57ZYR, DR-NTU (Data), V1
Radar-Camera depth estimation aims to predict dense and accurate metric depth by fusing input images and Radar data. Model efficiency is crucial for this task in pursuit of real-time processing on autonomous vehicles and robotic platforms. However, due to the sparsity of Radar re...
Apr 14, 2025 - AcRF Tier 2 Influenza
Ivan, Fransiskus Xaverius; Deshpande, Akhila; Lim, Chun Wei; Kwoh, Chee Keong, 2019, "Phylogenetic Tree-based Pipeline for Uncovering Mutational Patterns during Influenza Virus Evolution", https://doi.org/10.21979/N9/PDYCUD, DR-NTU (Data), V2
Various computational and statistical approaches have been proposed to uncover the mutational patterns of rapidly evolving influenza viral genes. A problem that draws much attention is to identify pairs of sites that potentially co-mutate to contribute to the overall fitness of t...
Apr 9, 2025 - S-Lab for Advanced Intelligence
Chen, Yiwen; He, Tong; Di, Huang; Ye, Weicai; Chen, Sijin; Tang, Jiaxiang; Chen, Xin; Cai, Zhongang; Yang, Lei; Yu, Gang; Lin, Guosheng; Zhang, Chi, 2025, "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers", https://doi.org/10.21979/N9/BJPHCP, DR-NTU (Data), V1
Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industr...
Apr 9, 2025 - S-Lab for Advanced Intelligence
Chen, Yongwei; Lan, Yushi; Zhou, Shangchen; Wang, Tengfei; Pan, Xingang, 2025, "SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE", https://doi.org/10.21979/N9/OK5SSQ, DR-NTU (Data), V1
Autoregressive models have demonstrated remarkable success across various fields, from large language models (LLMs) to large multimodal models (LMMs) and 2D content generation, moving closer to artificial general intelligence (AGI). Despite these advances, applying autoregressive...
Apr 9, 2025 - S-Lab for Advanced Intelligence
Lan, Mengcheng; Chen, Chaofeng; Zhou, Yue; Ke, Yiping; Wang, Xinjiang; Feng, Litong; Zhang, Wayne, 2025, "Text4Seg: Reimagining Image Segmentation as Text Generation", https://doi.org/10.21979/N9/FF4YJY, DR-NTU (Data), V1
Multimodal Large Language Models (MLLMs) have shown exceptional capabilities in vision-language tasks; however, effectively integrating image segmentation into these models remains a significant challenge. In this paper, we introduce Text4Seg, a novel text-as-mask paradigm that c...
Mar 19, 2025 - S-Lab for Advanced Intelligence
Jin, Daisheng; Hu, Jiangbei; Xu, Baixin; Dai, Yuxin; Qian, Chen; He, Ying, 2025, "SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images", https://doi.org/10.21979/N9/3DCDXV, DR-NTU (Data), V1
In this study, we introduce a novel two-stage technique for decomposing and reconstructing facial features from sparse-view images, a task made challenging by the unique geometry and complex skin reflectance of each individual. To synthesize 3D facial models more realistically, w...
Mar 19, 2025 - S-Lab for Advanced Intelligence
Pang, Hui En; Liu, Shuai; Cai, Zhongang; Lei, Yang; Zhang, Tianwei; Liu, Ziwei, 2025, "Disco4D: Disentangled 4D Human Generation and Animation from a Single Image", https://doi.org/10.21979/N9/VX5WAH, DR-NTU (Data), V1
We present Disco4D, a novel Gaussian Splatting framework for 4D human generation and animation from a single image. Different from existing methods, Disco4D distinctively disentangles clothing (with Gaussian models) from the human body (with SMPL-X model), significantly enhancin...
Mar 19, 2025 - S-Lab for Advanced Intelligence
Jiang, Jianping; Xiao, Weiye; Lin, Zhengyu; Zhang, Huaizhong; Ren, Tianxiang; Gao, Yang; Lin, Zhiqian; Cai, Zhongang; Yang, Lei; Liu, Ziwei, 2025, "SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters", https://doi.org/10.21979/N9/GYJNH1, DR-NTU (Data), V1
Human beings are social animals. How to equip 3D autonomous characters with similar social intelligence that can perceive, understand and interact with humans remains an open yet fundamental problem. In this paper, we introduce SOLAMI, the first end-to-end Social vision-Language...
