MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and in native applications.
Please visit Getting Started for detailed instructions.
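For a quick taste of what deployment looks like, here is a minimal sketch of chatting with a prebuilt model through the Python package. It assumes the mlc_chat package is installed and a quantized model has been downloaded; the model id "Llama-2-7b-chat-hf-q4f16_1" is illustrative, and exact package and class names may differ across releases, so treat the Getting Started page as authoritative.

```python
# Minimal sketch of the MLC LLM Python API; names follow the docs at the
# time of writing and may differ in your installed version.
from mlc_chat import ChatModule

# "Llama-2-7b-chat-hf-q4f16_1" is an illustrative prebuilt model id;
# substitute any model you have compiled or downloaded.
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1")

# Generate a reply to a single prompt.
output = cm.generate(prompt="What is the meaning of life?")
print(output)

# Print runtime statistics such as prefill/decode speed.
print(cm.stats())
```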
Our iOS app, MLCChat, is available on the App Store for iPhone and iPad. The app has been tested on iPhone 15 Pro Max, iPhone 14 Pro Max, iPhone 14 Pro, and iPhone 12 Pro. Besides the Getting Started page, documentation is also available for building iOS apps with MLC LLM.
Note: Llama-7B takes 4 GB of RAM to run and RedPajama-3B takes 2.2 GB. We recommend a recent device with at least 6 GB of RAM for Llama-7B, or 4 GB of RAM for RedPajama-3B. Text generation speed can vary over time; for example, it may be slow at first and then recover to normal speed.
The demo APK is available for download. The demo has been tested on the Samsung S23 with the Snapdragon 8 Gen 2 chip, the Redmi Note 12 Pro with the Snapdragon 685, and Google Pixel phones. Besides the Getting Started page, documentation is also available for building Android apps with MLC LLM.
Our C++ interface runs on AMD, Intel, Apple, and NVIDIA GPUs. Besides the Getting Started page, documentation is also available for building C++ apps with MLC LLM.
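Because the same runtime backs the Python package, the hardware backend can typically be selected there as well. Below is a minimal sketch assuming the ChatModule constructor accepts a device string; the accepted values (and the "RedPajama-INCITE-Chat-3B-v1-q4f16_1" model id) are assumptions based on the docs at the time of writing and depend on how the runtime was built.

```python
from mlc_chat import ChatModule

# The device string selects the GPU backend, e.g. "cuda" for NVIDIA,
# "vulkan" for AMD/Intel, or "metal" for Apple. Accepted values are an
# assumption here; check the documentation for your installed version.
cm = ChatModule(model="RedPajama-INCITE-Chat-3B-v1-q4f16_1", device="vulkan")
print(cm.generate(prompt="Hello!"))
```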
WebLLM is our companion project that deploys MLC LLM natively to browsers using WebGPU and WebAssembly. Everything runs inside the browser with no server resources and is accelerated by the local GPU (e.g., AMD, Intel, Apple, or NVIDIA).
The pre-packaged demos are subject to the model's license.