Author list: Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, and Willie Neiswanger
📝 arXiv paper 👩‍💻 Code Repository 🤗 Models & Checkpoints 📈 Training Logs
<aside> 💡
How cost-effectively can we elicit strong reasoning in language models by leveraging their underlying representations? We answer this question with Resa, a family of 1.5B reasoning models trained via a novel and efficient sparse autoencoder tuning (SAE-Tuning) procedure. Notably, when applied to certain base models before further RL post-training, SAE-Tuning retains >97% of the RL-trained counterpart's reasoning performance while cutting training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes. When applied to lightly RL-trained models (e.g., trained within 1 hour on 2 GPUs), it yields reasoning performance such as 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23 for only around $1 in additional cost. Resa is also transparent, revealing where reasoning abilities hide, and generalizable and modular, enabling plug-and-play transfer across datasets and models.
</aside>
We create Resa models via a novel two-stage SAE-Tuning procedure: first, we train a sparse autoencoder (SAE) to capture reasoning features from the internal activations of a source model; second, we use the frozen SAE to guide standard supervised fine-tuning (SFT) of a target model on a verified question-answer dataset without reasoning traces, discarding the SAE at inference time.
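To make the two stages concrete, here is a minimal sketch in PyTorch. Everything in it is illustrative: the `SparseAutoencoder` class, the hook-based `attach_sae` helper, and all hyperparameters are our assumptions for exposition, not the released implementation (see the code repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAutoencoder(nn.Module):
    """Illustrative SAE: an overcomplete dictionary trained to
    reconstruct hidden states under a sparsity penalty."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, h: torch.Tensor):
        z = F.relu(self.encoder(h))   # sparse feature activations
        return self.decoder(z), z     # reconstruction, features

def sae_loss(h, h_hat, z, l1_coeff: float = 1e-3):
    # Stage 1 objective (sketch): reconstruct the source model's
    # hidden states while keeping feature activations sparse.
    return F.mse_loss(h_hat, h) + l1_coeff * z.abs().mean()

def attach_sae(layer: nn.Module, sae: SparseAutoencoder):
    # Stage 2 (sketch): splice the frozen SAE into one transformer
    # layer of the target model, so that layer's hidden states pass
    # through the SAE's reconstruction during SFT. The returned handle
    # lets us detach the hook afterward, i.e., the SAE is discarded
    # at inference time.
    sae.requires_grad_(False)

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        h_hat, _ = sae(hidden)
        return (h_hat,) + output[1:] if isinstance(output, tuple) else h_hat

    return layer.register_forward_hook(hook)
```

Usage would look like `handle = attach_sae(model.model.layers[k], sae)` before a standard SFT loop, followed by `handle.remove()` once tuning finishes, which is why the tuned model runs with no SAE overhead at inference.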
Our Resa models match, and in some cases even surpass, the performance of RL-trained models built on the same base model, at drastically lower cost and compute.
As shown below, roughly $1 and under an hour of SAE-Tuning deliver the same reasoning capability as $2000+ and days of RL. Simply put, minimal compute and training time now suffice to elicit strong reasoning.
Resa models trained via SAE-Tuning closely match their RL-trained counterparts. As shown below, Resa-STILL-v1 and Resa-DeepScaleR-v1 recover over 97% of the reasoning performance of the Tina models, using only SFT without CoT traces. In contrast, models trained with standard SFT (no SAE guidance) perform worse, confirming that SAEs are the key to eliciting reasoning abilities.
Resa doesn't stop at replicating RL-trained models; it can elicit reasoning abilities directly from base models before any further RL training. As shown below, models like Resa-STILL-v5 and Resa-DeepScaleR-v3, trained entirely on R1-Distill, still reach strong performance, matching their RL-trained counterparts. Moreover, even with trained-from-scratch SAEs, SAE-Tuning reliably elicits reasoning abilities, showing its practicality as a lightweight, standalone method.
We take a step further to understand the reasoning abilities gained from Resa and SAE-Tuning.
Hypothesis I: Generalizable and Modular Reasoning Ability. We show that the extracted reasoning abilities are generalizable and modular: they transfer across datasets and models, as illustrated in the sketch below.
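As a concrete picture of what plug-and-play transfer means, the hypothetical snippet below reuses the `SparseAutoencoder` class and `attach_sae` helper from the earlier sketch: an SAE extracted from one source model/dataset pair guides SFT of a different target model on a different dataset. The checkpoint path, layer index, and dictionary width are illustrative assumptions, not values from the paper.

```python
import torch
from transformers import AutoModelForCausalLM

# Load an SAE previously trained on a *source* model's activations
# (hypothetical checkpoint path; d_model=1536 matches a 1.5B Qwen-family
# model, and the 8x dictionary width is an illustrative choice).
sae = SparseAutoencoder(d_model=1536, d_dict=8 * 1536)
sae.load_state_dict(torch.load("sae_source_reasoning_features.pt"))

# Attach it to a *different* target model and run standard SFT on a
# different verified question-answer dataset (no CoT traces needed).
target = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
)
handle = attach_sae(target.model.layers[12], sae)  # hypothetical layer choice
# ... standard SFT training loop goes here ...
handle.remove()  # detach the SAE; the tuned model runs standalone
```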