Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
Published in arXiv, 2024
This paper proposes Fira, a training framework for large language models that achieves full-rank training performance while retaining the memory efficiency of low-rank methods, outperforming existing approaches in both pre-training and fine-tuning experiments.
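For a rough sense of the mechanics, here is a minimal NumPy sketch of the general pattern behind memory-efficient low-rank training: compress the gradient into a rank-r subspace (which is what saves optimizer memory), then rescale the update so its magnitude tracks the full-rank gradient. The function name, the SVD-based projector, and the scaling rule are illustrative assumptions for this sketch, not Fira's actual algorithm.

```python
import numpy as np

def low_rank_update(weight, grad, rank=4, lr=1e-3):
    """Illustrative low-rank gradient step: project the gradient onto a
    rank-r subspace, lift it back, and rescale the update toward the
    full-rank gradient's norm. A sketch of the general idea only."""
    # Rank-r projection basis from the gradient's top singular directions.
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    p = u[:, :rank]                      # (m, r) projection basis
    low_rank_grad = p.T @ grad           # (r, n) compressed gradient
    update = p @ low_rank_grad           # lift back to full shape (m, n)
    # Norm-based scaling: compensate for the magnitude lost to the
    # rank truncation.
    scale = np.linalg.norm(grad) / (np.linalg.norm(update) + 1e-8)
    return weight - lr * scale * update

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
g = rng.normal(size=(64, 32))
print(low_rank_update(w, g, rank=4).shape)  # (64, 32)
```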
Published in ICLR, 2025
This paper introduces FlexPrefill, a flexible sparse pre-filling mechanism for large language models that dynamically adjusts attention patterns in real-time, improving speed and accuracy in long-sequence inference compared to prior sparse attention methods.
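As an illustration of the general pattern behind sparse prefilling (not FlexPrefill's actual selection mechanism), the sketch below keeps only the top-scoring keys per query and runs softmax attention over that subset. For clarity it computes all scores first; a real implementation would estimate them cheaply on blocks or subsamples, which is where the speedup comes from. All names here are hypothetical.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=8):
    """Illustrative dynamic sparse attention: per query, keep the
    `keep` highest-scoring keys and attend only over that subset."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (Lq, Lk)
    idx = np.argsort(scores, axis=-1)[:, -keep:]   # (Lq, keep) kept keys
    out = np.empty_like(q)
    for i in range(q.shape[0]):
        s = scores[i, idx[i]]
        w = np.exp(s - s.max())                    # stable softmax
        w /= w.sum()
        out[i] = w @ v[idx[i]]                     # weighted sum of values
    return out

rng = np.random.default_rng(0)
q = rng.normal(size=(16, 32))
k = rng.normal(size=(128, 32))
v = rng.normal(size=(128, 32))
print(topk_sparse_attention(q, k, v).shape)  # (16, 32)
```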
Published in arXiv, 2025
This paper presents a comprehensive study of model merging in pre-training, showing that merging checkpoints trained with a constant learning rate, across dense and MoE architectures ranging from millions to over 100B parameters, improves performance, predicts annealed performance, and reduces training costs, with ablation studies providing practical insights.
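In its simplest form, checkpoint merging is a (weighted) average of parameters saved along one training run. The sketch below shows plain weight averaging over parameter dicts; it is a minimal illustration of that idea, not the paper's exact merging recipe or weighting scheme.

```python
import numpy as np

def merge_checkpoints(checkpoints, weights=None):
    """Illustrative checkpoint merging: weighted average of parameter
    dicts saved along one pre-training run. Uniform weights by default."""
    if weights is None:
        weights = [1.0 / len(checkpoints)] * len(checkpoints)
    merged = {}
    for name in checkpoints[0]:
        merged[name] = sum(w * ckpt[name]
                           for w, ckpt in zip(weights, checkpoints))
    return merged

rng = np.random.default_rng(0)
ckpts = [{"layer.weight": rng.normal(size=(4, 4))} for _ in range(3)]
print(merge_checkpoints(ckpts)["layer.weight"].shape)  # (4, 4)
```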
Published:
This is a description of your talk. It is a markdown file and supports full markdown formatting like any other post. Yay markdown!
Published:
This is a description of your conference proceedings talk; note the different value in the `type` field. You can put anything in this field.
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.