I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published Mar 24, 2025 • 121
Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models Paper • 2605.26895 • Published 21 days ago • 20
LLM Explainability with Counterfactual Chains and Causal Graphs Paper • 2606.05972 • Published 12 days ago • 17
A Geometric Account of Activation Steering through Angle-Norm Decomposition Paper • 2606.06735 • Published 12 days ago • 22
ICA Lens: Interpreting Language Models Without Training Another Dictionary Paper • 2606.11722 • Published 6 days ago • 15