Author list: Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, and Willie Neiswanger

📝 arXiv paper 👩‍💻 Code Repository 🤗 Models & Checkpoints 📈 Training Logs

<aside>

⭐ TL;DR

How cost-effectively can we elicit reasoning abilities in language models? Motivated by this question, we present Tina, a family of 1.5B reasoning models post-trained with LoRA-based RL at remarkably low cost. This minimalist approach produces models that are competitive with, and sometimes surpass, SOTA RL reasoning models built on the same base model. The best Tina model achieves a >20% reasoning performance increase and 43% Pass@1 accuracy on AIME24, for a combined post-training and evaluation cost of only $9 USD!

</aside>

Tina: Tiny Reasoning Models via LoRA


We release Tina: an open-source family of tiny reasoning models combining three key ingredients:

  1. A powerful yet tiny base model: Every Tina model is built upon DeepSeek-R1-Distill-Qwen-1.5B, a base model with impressive reasoning ability and a minimal computational footprint.
  2. Parameter-efficient post-training: We use low-rank adaptation (LoRA) during the reinforcement learning (RL) stage, minimizing computational costs without sacrificing reasoning gains. In fact, it sometimes improves reasoning performance over full-parameter post-training! A minimal training sketch follows this list.
  3. Carefully chosen datasets: Each Tina model is post-trained on a small and high-quality reasoning dataset, further minimizing the computational costs of our pipeline.
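
To make ingredient 2 concrete, here is a minimal sketch of what LoRA-based RL post-training can look like, assuming TRL's GRPOTrainer together with a PEFT LoraConfig (GRPO being one common choice of RL algorithm for reasoning models). The dataset, reward function, and hyperparameters below are illustrative placeholders, not the exact Tina recipe; see the code repository linked above for the actual setup.

```python
# Minimal sketch of LoRA-based RL post-training with TRL's GRPOTrainer
# and Hugging Face PEFT. Dataset, reward, and hyperparameters are
# illustrative placeholders, not the exact Tina recipe.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

def length_penalty_reward(completions, **kwargs):
    # Placeholder reward. Reasoning RL typically uses a verifiable reward,
    # e.g. exact-match of the final answer against ground truth.
    return [-abs(512 - len(c)) / 512 for c in completions]

# Stand-in prompt dataset; Tina uses small open-source reasoning datasets.
dataset = load_dataset("trl-lib/tldr", split="train")

peft_config = LoraConfig(
    r=16,                        # low rank => tiny fraction of trainable params
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = GRPOConfig(
    output_dir="tina-grpo-lora",
    learning_rate=1e-5,          # illustrative; the paper ablates this
    num_generations=8,           # completions sampled per prompt for GRPO
    max_completion_length=1024,
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    reward_funcs=length_penalty_reward,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,     # only the LoRA adapter weights are trained
)
trainer.train()
```

Because only the low-rank adapter matrices receive gradients, the optimizer state covers a tiny fraction of the 1.5B parameters, which is where much of the cost saving comes from.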

LoRA-based RL is Surprisingly Effective!

Our Tina models compete with, and sometimes surpass, SOTA models sharing the same base model — at a fraction of the cost!

Simply put: less compute, more performance!


With only minimal post-training, Tina achieves a >20% performance increase over the base model and 43% Pass@1 on AIME24 at its best checkpoint.
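
For context on the metric: Pass@1 is the fraction of problems solved by a single sampled response. It is commonly computed with the unbiased pass@k estimator of Chen et al. (2021), which reduces to c/n at k = 1. A minimal implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021).

    n: samples drawn per problem, c: correct samples, k: evaluation budget.
    For k = 1 this reduces to c / n, the fraction of correct samples.
    """
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```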


We confirm this finding across multiple open-source reasoning datasets, with ablations over the learning rate, LoRA rank, and RL algorithm.

Broadly speaking, we find that Tina’s performance is fairly robust to each of these factors.
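
As a sketch of how such a sweep can be organized (assuming a hypothetical train_tina helper that wraps the GRPOTrainer setup above; the grid values are illustrative, not our exact ablation):

```python
def train_tina(lora_rank: int, learning_rate: float) -> None:
    """Hypothetical helper wrapping the LoRA + GRPO setup sketched above."""
    ...  # build LoraConfig(r=lora_rank) and GRPOConfig(learning_rate=learning_rate), then train

# Illustrative ablation grid over LoRA rank and learning rate.
for rank in [4, 8, 16, 32]:
    for lr in [5e-6, 1e-5, 2e-5]:
        train_tina(lora_rank=rank, learning_rate=lr)
```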