Research
I am interested in LLM reasoning, the generalizability of LLMs, and the interpretability of reasoning and generalization in LLMs.
On the Limits of Layer Pruning for Generative Reasoning in LLMs
Safal Shrestha, Anubhav Shrestha, Aadim Nepal, Minwu Kim, Keith Ross
arXiv, 2026
arXiv
Existing layer pruning techniques cause severe degradation on generative reasoning tasks. Through a systematic study, we find that tasks requiring multi-step reasoning are particularly sensitive to depth reduction, with pruned models losing critical algorithmic capabilities like arithmetic and code synthesis.
Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Minwu Kim, Safal Shrestha, Keith Ross
arXiv, 2026
arXiv
We identify that training reasoning models stalls on saturated problems because informative failures are rarely encountered. We propose failure-prefix conditioning, which reallocates exploration by conditioning training on prefixes from rare incorrect reasoning trajectories, matching the performance gains of training on medium-difficulty problems.
Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith Ross
EMNLP, 2025
arXiv
We find that distilling (warming up) an LLM with non-domain-specific reasoning traces (e.g., from a logic game) yields general improvements across multiple reasoning-intensive tasks like math and coding. Reinforcement Learning on top of this warm-up leads to better sample efficiency, generalizability, and final performance.
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
Aadim Nepal, Safal Shrestha, Anubhav Shrestha, Minwu Kim, Keith Ross
BlackboxNLP Workshop, EMNLP, 2025
MATH-AI Workshop, NeurIPS, 2025
arXiv
We find that LLMs form critical layers during pretraining whose removal completely destroys performance. Furthermore, the importance of such layers remains unchanged after post-training regimes like Reinforcement Learning, Distillation, and Instruction Tuning.
Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim*, Anubhav Shrestha*, Safal Shrestha, Aadim Nepal, Keith Ross
MATH-AI Workshop, NeurIPS, 2025
arXiv
We investigate why RL with verifiable rewards boosts accuracy but not capability, revealing that it improves performance on easy questions at the cost of hard ones, while distillation improves both only when new knowledge is introduced.
Mathematical reasoning in large language models: Assessing logical and arithmetic errors across wide numerical ranges
Safal Shrestha*, Minwu Kim*, Keith Ross
arXiv, 2025
arXiv
We find that as the magnitude of the numbers in simple math problems increases, LLMs become more confused and commit logical errors (along with the expected arithmetic errors).