Publications

You can also find my articles on my Google Scholar profile.

Conference Papers


Preprint


Model Merging in Pre-training of Large Language Models

Published on arXiv, 2025

This paper comprehensively investigates model merging during pre-training. It shows that merging checkpoints trained with a constant learning rate, across dense and MoE architectures ranging from millions to over 100B parameters, improves performance, predicts annealing behavior, and reduces training costs, and it provides ablation-driven insights into effective merging strategies.

Download Paper