Author list: Shangshang Wang, Julian Asilis, Ömer Faruk Akgül, Enes Burak Bilgin, Ollie Liu, and Willie Neiswanger
📝 arXiv paper 👩💻 Code Repository 🤗 Models & Checkpoints 📈 Training Logs
<aside>
How cost-effectively can we elicit reasoning abilities in language models? Driven by this question, we present Tina, a family of 1.5B reasoning models post-trained via LoRA-based RL at minimal cost (a minimal sketch follows this callout). This minimalist approach produces models that are competitive with, and sometimes surpass, SOTA RL reasoning models built on the same base model. The best Tina model achieves a >20% increase in reasoning performance over its base model and 43% Pass@1 accuracy on AIME24, for a total post-training and evaluation cost of only $9 USD!
</aside>
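To make the recipe concrete, here is a minimal sketch of what LoRA-based RL post-training of a 1.5B model can look like, using Hugging Face TRL's GRPOTrainer (GRPO is one common choice of RL algorithm) together with PEFT's LoraConfig. Everything in it is illustrative rather than Tina's exact configuration: the reward function, dataset, hyperparameters, and base-model id are placeholders.

```python
# Minimal sketch of LoRA-based RL post-training with GRPO (illustrative only).
# Assumes the TRL and PEFT libraries; all values below are placeholders,
# not the exact Tina training recipe.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy reward: completions that contain a boxed answer score higher.
# (A real setup would verify the final answer against the ground truth.)
def reward_has_boxed_answer(completions, **kwargs):
    return [1.0 if "\\boxed" in c else 0.0 for c in completions]

dataset = load_dataset("trl-lib/tldr", split="train")  # placeholder dataset

peft_config = LoraConfig(
    r=16,                 # low LoRA rank keeps the number of trainable parameters tiny
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

training_args = GRPOConfig(
    output_dir="tina-lora-grpo",
    learning_rate=1e-5,
    num_generations=8,    # completions sampled per prompt for the group baseline
)

trainer = GRPOTrainer(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # a 1.5B base model
    reward_funcs=reward_has_boxed_answer,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,   # only the LoRA adapter is updated during RL
)
trainer.train()
```

Because only the low-rank adapter is updated, the trainable parameter count (and hence the compute bill) stays small even though the full model is used for generation.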
We release Tina: an open-source family of tiny reasoning models combining three key ingredients:
Our Tina models compete with, and sometimes surpass, SOTA models sharing the same base model — at a fraction of the cost!
Simply put, less compute yields more performance!
With only minimal post-training, Tina achieves a >20% performance increase over the base model and 43% Pass@1 on AIME24 at its best checkpoint.
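For context on the metric: Pass@1 measures the fraction of problems solved on a single sampled attempt, and when several samples per problem are drawn it is typically estimated with the standard unbiased pass@k formula. The snippet below is a small sketch of that estimator, not Tina's evaluation harness; the function name and example numbers are ours.

```python
# Minimal sketch of the standard unbiased pass@k estimator
# (Chen et al., 2021); not Tina's actual evaluation code.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled solutions, of which c are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 16 samples on one problem, 7 of them correct.
print(pass_at_k(n=16, c=7, k=1))  # 0.4375
```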
We confirm our finding across multiple open-source reasoning datasets, with ablations on learning rate, LoRA rank, and RL algorithm.
Broadly speaking, we find that Tina’s performance is fairly robust to each of these factors.