Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm
Paper • 2602.11543
This collection showcases VCLab's investigations into new architectures for vision transformers, LLMs, and VLMs, and into their training paradigms.
Note [arXiv 2026] Memory-efficient decentralized LLM pretraining. | Code: https://github.com/zjr2000/SPES
Note [CVPR 2026] One-bit QK-attention for ViT/DiT. | Code: https://github.com/EdwardChasel/BinaryAttention
Note [ICLR 2025] Structure-aware state fusion for visual Mamba. | Code: https://github.com/EdwardChasel/Spatial-Mamba
Note [NeurIPS 2024 Spotlight] Group-free Mamba SSM for 3D detection. | Code: https://github.com/gwenzhang/Voxel-Mamba
Note [CVPR 2024] Unified universal video segmentation. | Code: https://github.com/MinghanLi/UniVS