Install MLC LLM Python Package

The MLC LLM Python package can be installed directly from a prebuilt developer package or built from source.

Option 1. Prebuilt Package

We provide nightly-built pip wheels for MLC-LLM. Select your operating system/compute platform and run the command in your terminal:

Note

❗ Whenever using Python, it is highly recommended to use conda to manage an isolated Python environment to avoid missing dependencies, incompatible versions, and package conflicts.

conda activate your-environment
python3 -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly mlc-ai-nightly

Note

conda install -c conda-forge gcc libvulkan-loader

If you encounter a GLIBC not found error, please install the latest glibc in conda:

conda install -c conda-forge libgcc-ng

We also recommend Python 3.11. If you are creating a new environment, you can use the following command:

conda create --name mlc-prebuilt python=3.11

You can then verify the installation from the command line:

python -c "import mlc_llm; print(mlc_llm)"
# Prints out: <module 'mlc_llm' from '/path-to-env/lib/python3.11/site-packages/mlc_llm/__init__.py'>
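
Beyond the import check, it can help to confirm which nightly builds pip actually installed. The following is a minimal sketch using only the Python standard library; the distribution names are copied from the pip command above, and the helper name installed_versions is ours for illustration, not part of MLC LLM.

```python
from importlib import metadata

# Distribution names taken from the pip install command above.
PACKAGES = ("mlc-llm-nightly", "mlc-ai-nightly")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Run it inside the activated conda environment so that it inspects the same site-packages directory that pip installed into.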

Option 2. Build from Source

We also provide the option to build the mlc runtime libraries (mlc_llm) from source. This is useful when you want to make modifications or obtain a specific version of the mlc runtime.

Step 1. Set up build dependencies. To build from source, you need to ensure that the following build dependencies are satisfied:

  • CMake >= 3.24

  • Git

  • Rust and Cargo, required by Hugging Face’s tokenizer

  • One of the GPU runtimes:

    • CUDA >= 11.8 (NVIDIA GPUs)

    • Metal (Apple GPUs)

    • Vulkan (NVIDIA, AMD, Intel GPUs)

Set up build dependencies in Conda
# make sure to start with a fresh environment
conda env remove -n mlc-chat-venv
# create the conda environment with build dependency
conda create -n mlc-chat-venv -c conda-forge \
    "cmake>=3.24" \
    rust \
    git \
    python=3.11
# enter the build environment
conda activate mlc-chat-venv
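
Once the environment is active, the requirements listed in Step 1 can be checked before starting a long build. The sketch below is only an illustration: the tool list and the 3.24 minimum come from the list above, while the helper names are hypothetical and the GPU runtimes are not checked.

```python
import re
import shutil
import subprocess

# Executables implied by the dependency list (Rust ships both cargo and rustc).
REQUIRED_TOOLS = ("cmake", "git", "cargo", "rustc")
MIN_CMAKE = (3, 24)


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def cmake_is_new_enough(version_output, minimum=MIN_CMAKE):
    """Compare the major.minor part of `cmake --version` output to the minimum."""
    return parse_version(version_output)[:2] >= minimum


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    if shutil.which("cmake"):
        result = subprocess.run(
            ["cmake", "--version"], capture_output=True, text=True
        )
        print("cmake >= 3.24:", cmake_is_new_enough(result.stdout))
```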

Note

At runtime, the TVM Unity compiler is not a dependency of the MLCChat CLI or the Python API. Only TVM’s runtime is required, and it is included automatically in 3rdparty/tvm. However, if you would like to compile your own models, you need to install the compiler by following the TVM Unity instructions.

Step 2. Configure and build. A standard git-based workflow is recommended to download MLC LLM, after which you can specify build requirements with our lightweight config generation tool:

Configure and build
# clone from GitHub
git clone --recursive https://github.com/mlc-ai/mlc-llm.git && cd mlc-llm/
# create build directory
mkdir -p build && cd build
# generate build configuration
python3 ../cmake/gen_cmake_config.py
# build mlc_llm libraries
cmake .. && cmake --build . --parallel $(nproc) && cd ..
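
The nproc command used above is not available on every platform. As a hedged alternative, the job count can be taken from Python's os.cpu_count(). The sketch below only assembles and prints the two cmake commands from the build step; build_commands is a hypothetical helper name, and the commands are meant to be run from the build directory.

```python
import os
import shlex


def build_commands(jobs=None):
    """Return the configure and build commands, defaulting jobs to the CPU count."""
    jobs = jobs or os.cpu_count() or 1
    return [
        ["cmake", ".."],
        ["cmake", "--build", ".", "--parallel", str(jobs)],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed or pasted into a shell.
    for command in build_commands():
        print(shlex.join(command))
```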

Note

If you are using CUDA and your compute capability is above 80, you are required to build with set(USE_FLASHINFER ON). Otherwise, you may run into a Cannot find PackedFunc error at runtime.

To check your CUDA compute capability, you can use nvidia-smi --query-gpu=compute_cap --format=csv.
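
nvidia-smi reports compute capability in dotted form (for example 8.6), whereas the note above uses the two-digit form (86). The sketch below converts between the two and applies the threshold. The helper names are ours, and the strict comparison mirrors the wording "above 80", which is an assumption about whether exactly 8.0 is included.

```python
import re
import shutil
import subprocess


def parse_compute_caps(csv_text):
    """Convert nvidia-smi CSV output such as 'compute_cap\\n8.6' into [86]."""
    caps = []
    for line in csv_text.splitlines():
        # The header and any non-numeric lines are skipped.
        match = re.fullmatch(r"\s*(\d+)\.(\d+)\s*", line)
        if match:
            caps.append(int(match.group(1)) * 10 + int(match.group(2)))
    return caps


def needs_flashinfer(caps, threshold=80):
    """True if any GPU is above the threshold (assumption: strictly above)."""
    return any(cap > threshold for cap in caps)


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv"],
            capture_output=True,
            text=True,
        )
        caps = parse_compute_caps(result.stdout)
        print("compute capabilities:", caps)
        print("build with USE_FLASHINFER ON:", needs_flashinfer(caps))
    else:
        print("nvidia-smi not found on PATH")
```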

Step 3. Install via Python. We recommend that you install mlc_llm as a Python package, giving you access to mlc_llm.compile, mlc_llm.ChatModule, and the CLI. There are two ways to do so; the environment-variable approach is shown here:

export MLC_LLM_HOME=/path-to-mlc-llm
export PYTHONPATH=$MLC_LLM_HOME/python:$PYTHONPATH
alias mlc_llm="python -m mlc_llm"

Step 4. Validate installation. You can check that the MLC libraries and the mlc_llm CLI were compiled successfully using the following commands:

Validate installation
# expected to see `libmlc_llm.so` and `libtvm_runtime.so`
ls -l ./build/
# expected to see help message
mlc_llm chat -h
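
The directory listing can also be checked programmatically, for example in a setup script. This is a minimal sketch: the two library names are taken from the comment above, and missing_libraries is a hypothetical helper. The suffix may differ from .so on other platforms, in which case the names need adjusting.

```python
from pathlib import Path

# Library names taken from the validation comment above.
EXPECTED_LIBS = ("libmlc_llm.so", "libtvm_runtime.so")


def missing_libraries(build_dir, names=EXPECTED_LIBS):
    """Return the expected library files that are not present in build_dir."""
    build = Path(build_dir)
    return [name for name in names if not (build / name).is_file()]


if __name__ == "__main__":
    missing = missing_libraries("./build")
    print("missing libraries:", missing or "none")
```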

Finally, verify the installation from the command line. The printed path should point to the source tree you built from:

python -c "import mlc_llm; print(mlc_llm)"
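
If a prebuilt wheel is also installed in the same environment, Python may pick it up instead of the source build. The sketch below checks whether the module that would be imported lives under MLC_LLM_HOME, the variable exported in Step 3. The helper names are hypothetical, and only the standard library is used.

```python
import importlib.util
import os
from pathlib import Path


def module_origin(name):
    """Return the resolved file a top-level module would load from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_inside(path, root):
    """True if path is located under root after resolving both."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    origin = module_origin("mlc_llm")
    home = os.environ.get("MLC_LLM_HOME")
    if origin is None:
        print("mlc_llm is not importable")
    elif home is None:
        print("MLC_LLM_HOME is not set; mlc_llm resolves to", origin)
    else:
        print("mlc_llm loads from the source tree:", is_inside(origin, home))
```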