About me

Hi 👋! My name is Xunhao Lai (赖勋豪). I am a Master’s student at the School of Intelligence Science and Technology, Peking University, supervised by Professor Tong Lin. Before that, I was an undergraduate at Yuanpei College, Peking University.

My research focuses on natural language processing and large language models, with an emphasis on long-context models: designing efficient attention mechanisms and improving the efficiency of model training and inference.

Publications

[ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Xunhao Lai, Jianqiao Lu, Yao Luo, Yiyuan Ma, Xun Zhou

arXiv GitHub

FlexPrefill is a flexible sparse pre-filling mechanism for LLMs that dynamically adjusts sparse attention patterns to each input in real time, improving both speed and accuracy in long-sequence inference.
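As a rough numpy sketch of the core idea (per-input selection of key blocks under a cumulative-attention budget), not the paper's actual algorithm or kernels; the pooling heuristic and all names here are illustrative:

```python
import numpy as np

def select_key_blocks(q, k, block_size=4, coverage=0.9):
    """Toy block selection: estimate how much attention mass each key
    block would receive, then keep the smallest set of blocks whose
    cumulative mass reaches `coverage`. Illustrative only."""
    q_pool = q.mean(axis=0)                            # crude query summary, (d,)
    n_blocks = k.shape[0] // block_size
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, -1)
    scores = np.einsum("d,nbd->nb", q_pool, k_blocks).max(axis=1)  # (n_blocks,)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                               # softmax over blocks
    order = np.argsort(probs)[::-1]                    # most-attended blocks first
    csum = np.cumsum(probs[order])
    n_keep = np.searchsorted(csum, coverage) + 1       # blocks needed for coverage
    return np.sort(order[:n_keep])                     # block indices to attend to

rng = np.random.default_rng(0)
q, k = rng.normal(size=(16, 64)), rng.normal(size=(128, 64))
print(select_key_blocks(q, k))                         # small subset of the 32 blocks
```

Because the budget is met per input rather than with a fixed sparsity ratio, easy contexts keep few blocks and hard ones keep more, which is what lets a dynamic scheme preserve accuracy while staying sparse.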

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang

arXiv GitHub

Fira is a training framework for LLMs that achieves full-rank training performance while retaining the memory efficiency of low-rank methods, in both pre-training and fine-tuning.
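A heavily hedged conceptual sketch of the flavor of such a method, assuming a GaLore-style rank-r projection; the optimizer stand-in and the norm-based scaling rule below are illustrative, not Fira's exact algorithm:

```python
import numpy as np

def fira_style_update(W, G, r=4, lr=1e-2, eps=1e-8):
    """Conceptual sketch only: keep optimizer state in a rank-r subspace
    but still apply the full-rank gradient, scaling the out-of-subspace
    residual by the norm change the low-rank part received."""
    U, _, _ = np.linalg.svd(G, full_matrices=False)
    P = U[:, :r]                          # (m, r) projection basis
    G_low = P @ (P.T @ G)                 # in-subspace gradient component
    G_res = G - G_low                     # residual a pure low-rank method drops
    step_low = P @ np.sign(P.T @ G)       # stand-in for an Adam step on (r, n) state
    phi = np.linalg.norm(step_low) / (np.linalg.norm(G_low) + eps)
    return W - lr * (step_low + phi * G_res)

rng = np.random.default_rng(0)
W, G = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
print(fira_style_update(W, G).shape)      # (32, 16)
```

The point of the sketch is the memory accounting: only the (r, n) projected statistics would need to persist across steps, yet the weight update remains full-rank.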

Open-Source Projects

native-sparse-attention-triton

GitHub

Implemented DeepSeek's Native Sparse Attention (NSA) kernel in Triton, providing flexible and efficient sparse attention training code.
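Per the DeepSeek NSA paper, the architecture runs three attention branches (compressed, selected, and sliding-window) and mixes them with learned sigmoid gates. A minimal PyTorch sketch of that output-stage combination, with branch outputs assumed precomputed and `gate_proj` a hypothetical gating layer, not this repo's API:

```python
import torch

def nsa_style_combine(q, out_cmp, out_slc, out_win, gate_proj):
    """Sketch of NSA's output stage: mix three precomputed attention
    branch outputs (compression, selection, sliding window) with
    per-token sigmoid gates derived from the query states."""
    g = torch.sigmoid(gate_proj(q))        # (batch, seq, 3), gates in [0, 1]
    g_cmp, g_slc, g_win = g.unbind(dim=-1)
    return (g_cmp.unsqueeze(-1) * out_cmp
            + g_slc.unsqueeze(-1) * out_slc
            + g_win.unsqueeze(-1) * out_win)

b, t, d = 2, 8, 64
q = torch.randn(b, t, d)
out_cmp, out_slc, out_win = (torch.randn(b, t, d) for _ in range(3))
y = nsa_style_combine(q, out_cmp, out_slc, out_win, torch.nn.Linear(d, 3))
print(y.shape)                             # torch.Size([2, 8, 64])
```

The Triton work in the repo lives inside the branches themselves (block-sparse attention kernels); the gated sum above is just the cheap final mixing step.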

FlexPrefill

GitHub

Implemented the FlexPrefill long-context inference acceleration algorithm, offering a flexible and efficient acceleration solution for long-context LLMs.

Contact

E-mail: laixunhao@pku.edu.cn

Address: Peking University, Beijing, China