Package Libraries and Weights¶
When we want to build LLM applications with MLC LLM (e.g., iOS/Android apps),
usually we need to build static model libraries and app binding libraries,
and sometimes bundle model weights into the app.
MLC LLM provides a tool for fast model library and weight packaging: mlc_llm package.
This page briefly introduces how to use mlc_llm package for packaging.
The iOS Swift SDK and Android SDK tutorials contain detailed examples and instructions
on using this packaging tool for iOS and Android deployment.
Introduction¶
To use mlc_llm package, we must clone the MLC LLM source code
and install the MLC LLM and TVM packages.
Depending on the app being built, there may be other dependencies, which are described in the
corresponding iOS and Android tutorials.
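As a minimal setup sketch (the repository URL matches the one referenced later on this page; the pip package names below are an assumption, so consult the installation docs for the wheels matching your platform):

# Clone MLC LLM together with its submodules.
git clone --recursive https://github.com/mlc-ai/mlc-llm.git
# Install prebuilt MLC LLM and TVM packages (assumed nightly wheel names;
# platform-specific variants exist, e.g. CUDA builds).
python -m pip install --pre -U -f https://mlc.ai/wheels mlc-llm-nightly mlc-ai-nightly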
After cloning, the basic usage of mlc_llm package is as follows:
export MLC_LLM_SOURCE_DIR=/path/to/mlc-llm
cd /path/to/app # The app root directory which contains "mlc-package-config.json".
# E.g., "ios/MLCChat" or "android/MLCChat"
mlc_llm package
The package command reads from the JSON file mlc-package-config.json under the current directory.
The output of this command is a directory dist/,
which contains the packaged model libraries (under dist/lib/) and weights (under dist/bundle/).
This directory contains all necessary data for the app build.
Depending on the app we build, the internal structure of dist/lib/ may be different.
dist
├── lib
│ └── ...
└── bundle
└── ...
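For illustration only, an iPhone build might produce a layout roughly like the following; the concrete file names are an assumption about the iOS target and can vary across MLC LLM versions and devices:

dist
├── lib
│   ├── libmlc_llm.a           # MLC LLM runtime static library (assumed name)
│   ├── libmodel_iphone.a      # packaged model libraries for the iPhone target (assumed name)
│   └── ...                    # other supporting static libraries
└── bundle
    ├── mlc-app-config.json    # generated app config listing the packaged models (assumed name)
    └── ...                    # weights copied here for each model that bundles weights into the app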
The input mlc-package-config.json file specifies
the device (e.g., iPhone or Android) to package model libraries and weights for,
and the list of models to package.
Below is an example mlc-package-config.json file:
{
"device": "iphone",
"model_list": [
{
"model": "HF://mlc-ai/Mistral-7B-Instruct-v0.2-q3f16_1-MLC",
"model_id": "Mistral-7B-Instruct-v0.2-q3f16_1",
"estimated_vram_bytes": 3316000000,
"bundle_weight": true,
"overrides": {
"context_window_size": 512
}
},
{
"model": "HF://mlc-ai/gemma-2b-it-q4f16_1-MLC",
"model_id": "gemma-2b-q4f16_1",
"estimated_vram_bytes": 3000000000,
"overrides": {
"prefill_chunk_size": 128
}
}
]
}
This example mlc-package-config.json specifies “iphone” as the target device.
In the model_list,

- model points to the Hugging Face repository that contains the pre-converted model weights; the app will download the model weights from this Hugging Face URL.
- model_id is a unique model identifier.
- estimated_vram_bytes is an estimate of the vRAM the model takes at runtime.
- "bundle_weight": true means the weights of this model will be bundled into the app when building.
- overrides specifies overrides for some model config parameters.
Below is a more detailed specification of the mlc-package-config.json file.
Each entry in "model_list" of the JSON file has the following fields:
- model (Required) The path to the MLC-converted model to be built into the app.
  Usually it is a Hugging Face URL (e.g., "model": "HF://mlc-ai/phi-2-q4f16_1-MLC") that contains the pre-converted model weights.
  For iOS, it can also be the path to a local model directory which contains converted model weights (e.g., "model": "../dist/gemma-2b-q4f16_1").
  Please check out Convert Model Weights if you want to build a local model into the app.
- model_id (Required) A unique local identifier for the model. It can be an arbitrary name.
- estimated_vram_bytes (Required) The estimated vRAM required to run the model.
- bundle_weight (Optional) A boolean flag indicating whether to bundle the model weights into the app. If this field is set to true, the mlc_llm package command will copy the model weights to dist/bundle/$model_id.
- overrides (Optional) A dictionary to override the default model context window size (to limit the KV cache size) and prefill chunk size (to limit the model's temporary execution memory). Example:

  {
    "device": "iphone",
    "model_list": [
      {
        "model": "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
        "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
        "estimated_vram_bytes": 2960000000,
        "overrides": {
          "context_window_size": 512,
          "prefill_chunk_size": 128
        }
      }
    ]
  }

- model_lib (Optional) A string specifying the system library prefix to use for the model. Usually this is used when you want to build multiple model variants with the same architecture into the app. This field does not affect any app functionality. The "model_lib_path_for_prepare_libs" introduced below is also related. Example:

  {
    "device": "iphone",
    "model_list": [
      {
        "model": "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
        "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
        "estimated_vram_bytes": 2960000000,
        "model_lib": "gpt_neox_q4f16_1"
      }
    ]
  }
Besides model_list in MLCChat/mlc-package-config.json,
you can also optionally specify a dictionary of "model_lib_path_for_prepare_libs",
if you want to use model libraries that are manually compiled.
The keys of this dictionary should be the model_lib values specified in the model list,
and the values are the paths (absolute or relative) to the manually compiled model libraries.
The model libraries specified in "model_lib_path_for_prepare_libs" will be built into the app when running mlc_llm package.
Example:
{
"device": "iphone",
"model_list": [
{
"model": "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
"model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
"estimated_vram_bytes": 2960000000,
"model_lib": "gpt_neox_q4f16_1"
}
],
"model_lib_path_for_prepare_libs": {
"gpt_neox_q4f16_1": "../../dist/lib/RedPajama-INCITE-Chat-3B-v1-q4f16_1-iphone.tar"
}
}
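The model library referenced above (the .tar file) must be compiled ahead of time. As a rough sketch, assuming the model weights have already been converted locally to ../../dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC (see Convert Model Weights), such a library could be produced with mlc_llm compile; consult the model compilation documentation for the exact flags for your setup:

# Compile the model into an iPhone model library archive (paths are illustrative).
mlc_llm compile ../../dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json \
    --device iphone \
    -o ../../dist/lib/RedPajama-INCITE-Chat-3B-v1-q4f16_1-iphone.tar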
Compilation Cache¶
mlc_llm package leverages a local JIT cache to avoid repetitive compilation of the same input.
It also leverages a local cache for weights downloaded from remote. These caches
are shared across the entire project. Sometimes it is helpful to force a rebuild, for example when
there is a new compiler update or when something goes wrong with the cached library.
You can do so by setting the environment variable MLC_JIT_POLICY=REDO:
MLC_JIT_POLICY=REDO mlc_llm package
Arguments of mlc_llm package¶
Command mlc_llm package can optionally take the arguments below:
- --package-config A path to mlc-package-config.json which contains the device and model specification. By default, it is the mlc-package-config.json under the current directory.
- --mlc-llm-source-dir The path to the MLC LLM source code (cloned from https://github.com/mlc-ai/mlc-llm). By default, it is the $MLC_LLM_SOURCE_DIR environment variable. If neither $MLC_LLM_SOURCE_DIR nor --mlc-llm-source-dir is specified, an error will be reported.
- --output / -o The output directory of the mlc_llm package command. By default, it is dist/ under the current directory.
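For example, an invocation that sets all three options explicitly (the paths here are placeholders for illustration):

mlc_llm package \
    --package-config ./mlc-package-config.json \
    --mlc-llm-source-dir /path/to/mlc-llm \
    --output ./dist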
Summary and What to Do Next¶
On this page, we introduced the mlc_llm package command for fast model library and weight packaging.
- It takes as input the file mlc-package-config.json, which contains the device and model specification for packaging.
- It outputs the directory dist/, which contains the packaged libraries under dist/lib/ and model weights under dist/bundle/.
Next, please feel free to check out the iOS and Android tutorials for detailed examples of using mlc_llm package.