Learning from the Self-future: On-policy Self-distillation for dLLMs • Paper • 2606.18195 • Published 16 days ago • 76
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards • Paper • 2605.21467 • Published May 20 • 207
Target-Oriented Pretraining Data Selection via Neuron-Activated Graph • Paper • 2604.15706 • Published Apr 17 • 10
Adam's Law: Textual Frequency Law on Large Language Models • Paper • 2604.02176 • Published Apr 2 • 509