Safal Shrestha

I'm a research assistant in the Deep Learning Lab at New York University Abu Dhabi (UAE), working under Prof. Keith Ross.

Email  /  CV  /  Scholar  /  Github


News

Feb 2026 Two new papers on Layer Pruning and Failure-Prefix Conditioning are out on arXiv!
Nov 2025 Attended EMNLP 2025 to present our paper "Warm Up Before You Train".

Research

I am interested in LLM reasoning, the generalizability of LLMs, and the interpretability of reasoning and generalization in LLMs.

On the Limits of Layer Pruning for Generative Reasoning in LLMs
Safal Shrestha, Anubhav Shrestha, Aadim Nepal, Minwu Kim, Keith Ross
arXiv, 2026
arXiv

Models compressed with existing layer pruning techniques often suffer severe degradation on generative reasoning tasks. Through a systematic study, we find that tasks requiring multi-step reasoning are particularly sensitive to depth reduction, exhibiting degradation in critical algorithmic capabilities such as arithmetic and code synthesis.

Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Minwu Kim, Safal Shrestha, Keith Ross
arXiv, 2026
arXiv

We identify that training reasoning models stalls on saturated problems because informative failures are rarely encountered. We propose failure-prefix conditioning, which reallocates exploration by conditioning training on prefixes from rare incorrect reasoning trajectories, matching the performance gains obtained from medium-difficulty problems.

Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith Ross
EMNLP, 2025
arXiv

We find that distilling (warming up) an LLM with non-domain-specific reasoning traces (e.g., from a logic game) yields general improvements across multiple reasoning-intensive tasks like math and coding. Reinforcement learning on top of this warmup leads to better sample efficiency, generalizability, and final performance.

Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
Aadim Nepal, Safal Shrestha, Anubhav Shrestha, Minwu Kim, Keith Ross
BlackboxNLP Workshop, EMNLP, 2025
MATH-AI Workshop, NeurIPS, 2025
arXiv

We find that LLMs form critical layers during pretraining whose removal completely destroys performance. Furthermore, the importance of such layers remains unchanged after post-training regimes like reinforcement learning, distillation, and instruction tuning.

Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim*, Anubhav Shrestha*, Safal Shrestha, Aadim Nepal, Keith Ross
MATH-AI Workshop, NeurIPS, 2025
arXiv

We investigate why RL with verifiable rewards boosts accuracy but not capability, revealing that it improves performance on easy questions at the cost of hard ones, while distillation improves both only when new knowledge is introduced.

Mathematical reasoning in large language models: Assessing logical and arithmetic errors across wide numerical ranges
Safal Shrestha*, Minwu Kim*, Keith Ross
arXiv, 2025
arXiv

We find that as the magnitude of the numbers in simple math problems increases, LLMs commit more logical errors, in addition to the expected arithmetic errors.


Website template adapted from Jon Barron's source code.