Author list: Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, Deqing Fu, and Willie Neiswanger

📝 arXiv paper 👩‍💻 Code Repository 🤗 Models & Checkpoints 📈 Training Logs

<aside> 💡

⭐ TL;DR

How cost-effectively can we elicit strong reasoning in language models by leveraging their underlying representations? We answer this question with Resa, a family of 1.5B reasoning models trained via a novel and efficient sparse autoencoder tuning (SAE‑Tuning) procedure. Notably, when applied to certain base models before further RL post-training, SAE‑Tuning retains >97% of the reasoning performance of the RL‑trained counterpart while reducing training costs by >2000x to roughly $1 and training time by >450x to around 20 minutes. When applied to lightly RL‑trained models (e.g., trained within 1 hour on 2 GPUs), it yields reasoning performance such as 43.33% Pass@1 on AIME24 and 90% Pass@1 on AMC23 for only around $1 in additional cost. Resa is also transparent, revealing where reasoning abilities hide in the model, and the extracted abilities are generalizable and modular, enabling plug-and-play transfer across datasets and models.

</aside>

Resa: Transparent Reasoning Models via SAEs

image.png

We create Resa models via a novel two-stage SAE-Tuning procedure:

  1. SAE Training (Reasoning Ability Extraction): In the first stage, we train an SAE to reconstruct activations from a specific layer of a source model (like Tina or R1-Distill). We pass in a set of verified question–answer examples, called the trigger dataset, and record the model’s activations at a chosen layer. The SAE is trained on these activations to build a sparse dictionary of internal features (see the first sketch after this list). The key idea is that some of these features capture the model’s latent reasoning patterns.
  2. SAE-Guided SFT (Reasoning Ability Elicitation): Next, we insert the trained SAE into a target model (typically with the same architecture as the source model) and keep it frozen. As we train the model via standard supervised fine-tuning (SFT), the SAE acts as a scaffold, guiding learning by injecting the reasoning features it captured (see the second sketch below). This helps the model build internal reasoning abilities without explicit step-by-step reasoning traces.
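
To make the procedure concrete, below is a minimal PyTorch sketch of the first stage. The model ID, hook layer, dictionary width, and loss coefficients are illustrative assumptions, not the exact Resa configuration:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

class SparseAutoencoder(nn.Module):
    """A plain SAE: overcomplete dictionary trained with an L1 sparsity penalty."""
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)          # reconstruction of the input activations
        return recon, features

SOURCE_ID = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # one of the source models named above
LAYER = 12  # assumed hook layer; the chosen layer is a design choice

tokenizer = AutoTokenizer.from_pretrained(SOURCE_ID)
tokenizer.pad_token = tokenizer.pad_token or tokenizer.eos_token  # ensure padding works
source = AutoModelForCausalLM.from_pretrained(SOURCE_ID)
source.eval()  # the source model is frozen; we only record its activations

sae = SparseAutoencoder(d_model=source.config.hidden_size,
                        d_dict=8 * source.config.hidden_size)  # assumed 8x expansion
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

def sae_train_step(batch_texts, l1_coef=1e-3):
    """One SAE update on activations recorded from the trigger dataset."""
    inputs = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # no gradients flow into the source model
        hidden = source(**inputs, output_hidden_states=True).hidden_states[LAYER]
    recon, feats = sae(hidden)
    loss = (recon - hidden).pow(2).mean() + l1_coef * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```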
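
And a matching sketch of the second stage, reusing the tokenizer and the trained `sae` from above. Splicing via a forward hook and the specific hyperparameters are our assumptions; what the text specifies is that the SAE stays frozen and the objective is plain SFT without CoT traces:

```python
for p in sae.parameters():
    p.requires_grad_(False)  # the trained SAE is frozen throughout this stage

target = AutoModelForCausalLM.from_pretrained(SOURCE_ID)  # same architecture as the source

def splice_sae(module, inputs, output):
    # Route this layer's hidden states through the frozen SAE so the
    # reasoning features it captured shape all downstream computation.
    hidden = output[0]
    recon, _ = sae(hidden)
    return (recon,) + output[1:]

handle = target.model.layers[LAYER].register_forward_hook(splice_sae)

opt = torch.optim.Adam(target.parameters(), lr=1e-5)

def sft_step(batch_texts):
    """One standard SFT update on verified QA pairs; no step-by-step traces needed."""
    inputs = tokenizer(batch_texts, return_tensors="pt", padding=True, truncation=True)
    out = target(**inputs, labels=inputs["input_ids"])  # ordinary next-token loss
    opt.zero_grad()
    out.loss.backward()
    opt.step()
    return out.loss.item()

# After tuning, drop the hook: the target model runs standalone.
handle.remove()
```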

RL-Level Reasoning with Minimal Compute!

Our Resa models match, and in some cases even surpass, the performance of RL-trained models built on the same base, but with drastically lower cost and compute.

As shown below, roughly $1 and under an hour of SAE-Tuning deliver the same reasoning capability as $2,000+ and days of RL. Simply put, minimal compute and training time now suffice to elicit strong reasoning.

image.png

Resa models trained via SAE-Tuning closely match their RL-trained counterparts. As shown below, Resa-STILL-v1 and Resa-DeepScaleR-v1 recover over 97% of the reasoning performance of Tina models, using only SFT without CoT traces. In contrast, models trained with standard SFT perform worse, confirming that SAEs are the key to eliciting reasoning abilities.

image.png

Resa doesn't stop at replicating RL-trained models; it can elicit reasoning abilities directly from base models before any further RL training. As shown below, models like Resa-STILL-v5 and Resa-DeepScaleR-v3, built entirely on R1-Distill, still match the performance of RL-trained models. Moreover, even with SAEs trained from scratch, SAE-Tuning reliably elicits reasoning abilities, showing its practicality as a lightweight and standalone method.

image.png

A Step Further: Understanding Reasoning Abilities

We take a step further to understand the reasoning abilities gained from Resa and SAE-Tuning.

Hypothesis I: Generalizable and Modular Reasoning Ability. We show that the extracted reasoning abilities are generalizable and modular, transferring across datasets and models (see the sketch below).
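
As a hedged illustration of that modularity, the frozen SAE artifact from one run can be plugged into a different host model and trigger dataset, reusing the Stage 2 helpers sketched above. The checkpoint path and model ID here are hypothetical placeholders:

```python
# Hypothetical plug-and-play transfer: reuse an SAE extracted on one
# trigger dataset to guide SFT of a different target model/dataset.
sae.load_state_dict(torch.load("sae_layer12_still.pt"))  # placeholder Stage 1 checkpoint
target_b = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Math-1.5B")  # assumed compatible host
handle = target_b.model.layers[LAYER].register_forward_hook(splice_sae)
# ...run the Stage 2 SFT loop over the new dataset's verified QA pairs...
handle.remove()  # the tuned target_b again runs standalone
```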