Posts by Collection

portfolio

publications

Model Merging in Pre-training of Large Language Models

Published on arXiv, 2025

This paper comprehensively investigates model merging during pre-training. Merging checkpoints saved along a constant-learning-rate schedule improves performance on both dense and MoE architectures (from millions to over 100B parameters), helps predict annealing behavior, improves training efficiency, and reduces costs, with ablation studies providing insight into how and why merging works.

Download Paper
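
For readers unfamiliar with the idea, the sketch below shows the simplest form of checkpoint merging: element-wise (weighted) averaging of parameters across checkpoints saved along one training run. The helper name and the PyTorch-style state dicts are illustrative assumptions, not code from the paper.

```python
import torch

def average_checkpoints(state_dicts, weights=None):
    """Merge checkpoints by (weighted) element-wise parameter averaging.

    state_dicts: list of model state dicts with identical keys and shapes.
    weights: optional per-checkpoint weights; defaults to a uniform average.
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        # Weighted sum of the same parameter tensor across all checkpoints.
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Toy usage with two tiny "checkpoints" sharing the same parameter layout.
ckpt_a = {"layer.weight": torch.ones(2, 2), "layer.bias": torch.zeros(2)}
ckpt_b = {"layer.weight": torch.full((2, 2), 3.0), "layer.bias": torch.ones(2)}
merged = average_checkpoints([ckpt_a, ckpt_b])
print(merged["layer.weight"])  # tensor of 2.0s: the element-wise mean
```

In practice the checkpoints would be loaded from disk (e.g., via torch.load) and the merged state dict loaded back into the model; the paper studies when and how such merging helps during pre-training.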

talks

teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.