TL;DR This page collects curated insights on improving LLM reasoning via post-training (like reinforcement learning) and test-time compute (like search and sampling). In short, post-training empowers LLMs with reasoning ability, and test-time compute scales it.

OpenAI, DeepSeek, and More

<aside>

The discussion of reasoning ability went viral with the release of OpenAI's o-series and DeepSeek's R1 models. This section collects thoughts on these two and other SOTA reasoning models.

</aside>

Reasoning Models

Post-Training: Gaining Reasoning Ability

<aside>

While reinforcement learning-based fine-tuning is emerging as a post-training approach to enhance LLMs' reasoning ability, its effectiveness relative to supervised fine-tuning is still unclear. This section collects thoughts on both methods for LLM reasoning.

</aside>
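To make the contrast concrete, here is a minimal sketch (not any particular paper's recipe): SFT maximizes the likelihood of reference reasoning traces, while RL fine-tuning samples traces from the model itself and reweights them by a reward, e.g. from a verifier. The toy policy, the `verifier` callable, and all hyperparameters are placeholders.

```python
# Minimal sketch: SFT (likelihood of reference traces) vs. REINFORCE-style
# RL fine-tuning (reward-weighted log-probs of the model's own samples).
# The tiny policy below is a stand-in for a real LLM; `verifier` is hypothetical.
import torch
import torch.nn.functional as F

vocab, hidden = 100, 32
policy = torch.nn.Sequential(
    torch.nn.Embedding(vocab, hidden),   # toy "LLM": token -> hidden
    torch.nn.Linear(hidden, vocab),      # hidden -> next-token logits
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def sft_step(prompt_ids: torch.Tensor, target_ids: torch.Tensor) -> float:
    """Supervised fine-tuning: cross-entropy against a reference trace."""
    logits = policy(prompt_ids)                      # (seq, vocab)
    loss = F.cross_entropy(logits, target_ids)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def rl_step(prompt_ids: torch.Tensor, verifier) -> float:
    """RL fine-tuning: sample a trace, score it, reweight its log-probs."""
    logits = policy(prompt_ids)
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()                           # the model's own trace
    reward = verifier(sample)                        # e.g., 1.0 if the answer checks out
    loss = -(reward * dist.log_prob(sample).sum())   # push up rewarded traces
    opt.zero_grad(); loss.backward(); opt.step()
    return reward
```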

Post-Training

Test-Time Compute: Scaling Reasoning Ability

<aside>

Test-time compute is an emerging field where folks are trying many different methods (e.g., search and sampling) and extra components (e.g., verifiers). This section classifies them based on the optimization targets for LLMs. Part of this idea comes from "Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters".

</aside>
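As one concrete way of spending extra test-time compute, here is a minimal best-of-N sampling sketch; `generate` and `score` are hypothetical stand-ins for an LLM sampler and a verifier, not any specific library's API.

```python
# Minimal best-of-N sketch: sample n candidates, return the one the
# verifier scores highest. More samples = more test-time compute.
from typing import Callable, List

def best_of_n(prompt: str,
              generate: Callable[[str], str],           # hypothetical LLM sampler
              score: Callable[[str, str], float],       # hypothetical verifier
              n: int = 16) -> str:
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))
```

Self-consistency (majority voting over final answers) can be seen as the special case where the "verifier" simply counts how many samples agree on each answer.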

Test-Time Compute

Verification: The Key to Reasoning

<aside>

Original line: “Verification, The Key to AI” by Rich Sutton. Verifiers serve as a key component in both post-training (e.g., as reward models for reinforcement learning) and test-time compute (e.g., as signals to guide search). This section tries to collect thoughts on process-based verification, outcome-based verification, and more.

</aside>
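A minimal sketch of the two granularities mentioned above: an outcome reward model (ORM) scores only the final answer, while a process reward model (PRM) scores each intermediate step. `orm` and `prm` are hypothetical callables, not a specific implementation.

```python
# Minimal sketch of outcome-based vs. process-based verification.
from typing import Callable, List

def outcome_score(answer: str, orm: Callable[[str], float]) -> float:
    """Outcome-based: score only the final answer."""
    return orm(answer)

def process_score(steps: List[str], prm: Callable[[str], float]) -> float:
    """Process-based: score every intermediate step, then aggregate.
    Taking the min is a common conservative aggregation choice."""
    return min(prm(step) for step in steps)
```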

Verifiers

Other Artifacts

<aside>

This section collects survey, evaluation, benchmark, and application papers, as well as online resources like blogs, posts, videos, code, and data.

</aside>

Other Artifacts

Citation