πŸ‘‹ Welcome to MLC LLM

Discord | GitHub

πŸ‘‰ Get started by trying out MLC Chat.

Machine Learning Compilation for LLM (MLC LLM) is a high-performance universal deployment solution for large language models.
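To make "universal deployment" concrete, the sketch below chats with a prebuilt model through the project's Python package. It is a minimal example, not the definitive API: it assumes a recent mlc_llm wheel that exposes MLCEngine with an OpenAI-style chat interface, and the model string is only an illustrative placeholder for any prebuilt MLC weights.

# Minimal chat sketch with MLC LLM's Python API (assumes a recent mlc_llm
# install; the exact API surface and model id may differ in your version).
from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"  # placeholder prebuilt weights
engine = MLCEngine(model)

# OpenAI-style chat completion, streamed token by token.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is machine learning compilation?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content or "", end="", flush=True)
print()

engine.terminate()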

MLC LLM: A universal deployment solution for large language models

|              | AMD GPU         | NVIDIA GPU      | Apple M1/M2 GPU | Intel GPU |
|--------------|-----------------|-----------------|-----------------|-----------|
| Linux / Win  | βœ… Vulkan, ROCm | βœ… Vulkan, CUDA | N/A             | βœ… Vulkan |
| macOS        | βœ… Metal        | N/A             | βœ… Metal        | βœ… Metal  |
| Web Browser  | βœ… WebGPU       | βœ… WebGPU       | βœ… WebGPU       | βœ… WebGPU |

iOS / iPadOS: βœ… Metal on Apple M1/M2 GPU
Android: βœ… OpenCL on Adreno GPU / βœ… OpenCL on Mali GPU
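Each row of the matrix corresponds to a runtime device you can request explicitly when constructing an engine. The snippet below is a sketch under the assumption that MLCEngine accepts a device argument whose values mirror the backends listed above; consult your installed version's documentation for the exact accepted strings.

# Requesting a specific backend from the support matrix above.
# The `device` argument and its accepted strings are assumptions based on
# the table; "auto" (the usual default) probes local hardware instead.
from mlc_llm import MLCEngine

engine = MLCEngine(
    "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",  # placeholder model id
    device="vulkan",  # e.g. "cuda", "rocm", "metal", "vulkan", "opencl"
)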


If you find MLC LLM useful in your work, please consider citing the project using the following format:

@software{mlc-llm,
   author = {MLC team},
   title = {{MLC-LLM}},
   url = {https://github.com/mlc-ai/mlc-llm},
   year = {2023}
}

The underlying compiler techniques employed by MLC LLM are outlined in the following papers:

References
@inproceedings{tensorir,
   author = {Feng, Siyuan and Hou, Bohan and Jin, Hongyi and Lin, Wuwei and Shao, Junru and Lai, Ruihang and Ye, Zihao and Zheng, Lianmin and Yu, Cody Hao and Yu, Yong and Chen, Tianqi},
   title = {TensorIR: An Abstraction for Automatic Tensorized Program Optimization},
   year = {2023},
   isbn = {9781450399166},
   publisher = {Association for Computing Machinery},
   address = {New York, NY, USA},
   url = {https://doi.org/10.1145/3575693.3576933},
   doi = {10.1145/3575693.3576933},
   booktitle = {Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
   pages = {804--817},
   numpages = {14},
   keywords = {Tensor Computation, Machine Learning Compiler, Deep Neural Network},
   location = {Vancouver, BC, Canada},
   series = {ASPLOS 2023}
}

@inproceedings{metaschedule,
   author = {Shao, Junru and Zhou, Xiyou and Feng, Siyuan and Hou, Bohan and Lai, Ruihang and Jin, Hongyi and Lin, Wuwei and Masuda, Masahiro and Yu, Cody Hao and Chen, Tianqi},
   booktitle = {Advances in Neural Information Processing Systems},
   editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
   pages = {35783--35796},
   publisher = {Curran Associates, Inc.},
   title = {Tensor Program Optimization with Probabilistic Programs},
   url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/e894eafae43e68b4c8dfdacf742bcbf3-Paper-Conference.pdf},
   volume = {35},
   year = {2022}
}

@inproceedings{tvm,
   author = {Tianqi Chen and Thierry Moreau and Ziheng Jiang and Lianmin Zheng and Eddie Yan and Haichen Shen and Meghan Cowan and Leyuan Wang and Yuwei Hu and Luis Ceze and Carlos Guestrin and Arvind Krishnamurthy},
   title = {{TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning},
   booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)},
   year = {2018},
   isbn = {978-1-939133-08-3},
   address = {Carlsbad, CA},
   pages = {578--594},
   url = {https://www.usenix.org/conference/osdi18/presentation/chen},
   publisher = {USENIX Association},
   month = oct
}

If you are interested in using Machine Learning Compilation in practice, we highly recommend the following course: