Safal Shrestha

I'm a research assistant in the Deep Learning Lab at New York University Abu Dhabi (UAE), working under Prof. Keith Ross.

Email  /  CV  /  Scholar  /  Github


News

Feb 2026 Two new papers on Layer Pruning and Failure-Prefix Conditioning are out on arXiv!
Nov 2025 Attended EMNLP 2025 to present our paper "Warm Up Before You Train".

Research

I am interested in LLM reasoning, the generalizability of LLMs, and the interpretability of reasoning and generalization in LLMs.

On the Limits of Layer Pruning for Generative Reasoning in LLMs
Safal Shrestha, Anubhav Shrestha, Aadim Nepal, Minwu Kim, Keith Ross
arXiv, 2026
arXiv

Models compressed with existing layer pruning techniques often suffer severe degradation on generative reasoning tasks. Through a systematic study, we find that tasks requiring multi-step reasoning are particularly sensitive to depth reduction, exhibiting degradation in critical algorithmic capabilities such as arithmetic and code synthesis.

Training Reasoning Models on Saturated Problems via Failure-Prefix Conditioning
Minwu Kim, Safal Shrestha, Keith Ross
arXiv, 2026
arXiv

We identify that training reasoning models stalls on saturated problems because informative failures are rarely encountered. We propose failure-prefix conditioning, which reallocates exploration by conditioning training on prefixes from rare incorrect reasoning trajectories, matching the performance gains obtained from medium-difficulty problems.

Warm Up Before You Train: Unlocking General Reasoning in Resource-Constrained Settings
Safal Shrestha, Minwu Kim, Aadim Nepal, Anubhav Shrestha, Keith Ross
EMNLP, 2025
arXiv

We find that distilling (warming up) an LLM with non-domain-specific reasoning traces (e.g., from a logic game) yields general improvements across multiple reasoning-intensive tasks like math and coding. Reinforcement learning on top of this warmup leads to better sample efficiency, generalizability, and final performance.

Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
Aadim Nepal, Safal Shrestha, Anubhav Shrestha, Minwu Kim, Keith Ross
BlackboxNLP Workshop, EMNLP, 2025
MATH-AI Workshop, NeurIPS, 2025
arXiv

We find that LLMs form critical layers during pretraining whose removal completely destroys performance. Furthermore, the importance of such layers remains unchanged after post-training regimes like reinforcement learning, distillation, and instruction tuning.

Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim*, Anubhav Shrestha*, Safal Shrestha, Aadim Nepal, Keith Ross
MATH-AI Workshop, NeurIPS, 2025
arXiv

We investigate why RL with verifiable rewards boosts accuracy but not capability, revealing that it improves performance on easy questions at the cost of hard ones, while distillation improves both only when new knowledge is introduced.

Mathematical reasoning in large language models: Assessing logical and arithmetic errors across wide numerical ranges
Safal Shrestha*, Minwu Kim*, Keith Ross
arXiv, 2025
arXiv

We find that as the magnitude of the numbers in simple math problems increases, LLMs commit more logical errors, in addition to the expected arithmetic errors.


Website template adapted from Jon Barron's source code.