Android SDK

Demo App

The demo APK is built for the Samsung S23 with the Snapdragon 8 Gen 2 chip.

Prerequisites

Rust (install) is needed to cross-compile HuggingFace tokenizers to Android. Make sure rustc, cargo, and rustup are available in $PATH.

Android Studio (install) with NDK and CMake. To install NDK and CMake, open “Projects → SDK Manager → SDK Tools” from the Android Studio welcome page. Then set up the following environment variables:

  • ANDROID_NDK so that $ANDROID_NDK/build/cmake/android.toolchain.cmake is available.

  • TVM_NDK_CC that points to NDK’s clang compiler.

# Example on macOS
export ANDROID_NDK=$HOME/Library/Android/sdk/ndk/25.2.9519653
export TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/darwin-x86_64/bin/aarch64-linux-android24-clang
# Example on Linux
export ANDROID_NDK=$HOME/Android/Sdk/ndk/25.2.9519653
export TVM_NDK_CC=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android24-clang

JDK, such as OpenJDK >= 17, to compile the Java bindings of the TVM Unity runtime. We strongly recommend setting JAVA_HOME to the JDK bundled with Android Studio. For example, on macOS: export JAVA_HOME=/Applications/Android\ Studio.app/Contents/jbr/Contents/Home. Using Android Studio’s JBR bundle, as recommended at https://developer.android.com/build/jdks, reduces the chance of errors during JNI compilation. Set up the following environment variable:

  • export JAVA_HOME=/path/to/java_home. You can then double-check that $JAVA_HOME/bin/java exists.

Please ensure that the JDK versions for Android Studio and JAVA_HOME are the same.

The TVM Unity runtime is located under 3rdparty/tvm in MLC LLM, so there is no need to install anything extra. Set up the following environment variable:

  • export TVM_SOURCE_DIR=/path/to/mlc-llm/3rdparty/tvm.

(Optional) TVM Unity compiler Python package (install or build from source). It is NOT required if the models are prebuilt, but it is required to compile PyTorch models from HuggingFace in the following section.

Note

❗ Whenever using Python, it is highly recommended to use conda to manage an isolated Python environment to avoid missing dependencies, incompatible versions, and package conflicts.

As a final check, make sure the environment variables are properly set. One way to ensure this is to place them in $HOME/.zshrc, $HOME/.bashrc, or an environment management tool.

source $HOME/.cargo/env # Rust
export ANDROID_NDK=...  # Android NDK toolchain
export TVM_NDK_CC=...   # Android NDK clang
export JAVA_HOME=...    # Java
export TVM_SOURCE_DIR=...     # TVM Unity runtime
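A small script can make that final check less error-prone. It confirms that each variable is set and points at a path that exists. The sketch below is a hypothetical helper, not part of MLC LLM. The name check_android_env is ours, and the paths it probes simply mirror the prerequisites listed above.

```python
import os
import shutil
from pathlib import Path


def check_android_env(env=None):
    """Return a list of problems with the build environment (empty if all OK).

    Hypothetical helper: the checks mirror the prerequisites listed above.
    """
    env = os.environ if env is None else env
    problems = []

    # The Rust toolchain must be on PATH to cross-compile the tokenizers.
    for tool in ("rustc", "cargo", "rustup"):
        if shutil.which(tool, path=env.get("PATH")) is None:
            problems.append(f"{tool} not found in PATH")

    # Each variable must be set; the suffix is the file expected beneath it.
    required = {
        "ANDROID_NDK": "build/cmake/android.toolchain.cmake",
        "TVM_NDK_CC": "",
        "JAVA_HOME": "bin/java",
        "TVM_SOURCE_DIR": "",
    }
    for name, suffix in required.items():
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
            continue
        target = Path(value) / suffix if suffix else Path(value)
        if not target.exists():
            problems.append(f"{name}: {target} does not exist")
    return problems


if __name__ == "__main__":
    # Prints one line per problem; prints nothing if everything is found.
    for issue in check_android_env():
        print("MISSING:", issue)
```

To use it, save it under any name, for example check_env.py (our placeholder). Run it with python after sourcing your shell profile.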

Build Android App from Source

This section shows how to build the app from source.

Step 1. Install Build Dependencies

First and foremost, please clone the MLC LLM GitHub repository. After cloning, go to the android/ directory.

git clone https://github.com/mlc-ai/mlc-llm.git
cd mlc-llm
git submodule update --init --recursive
cd android

Step 2. Build Runtime and Model Libraries

The models to be built for the Android app are specified in MLCChat/mlc-package-config.json. In the model_list:

  • model points to the Hugging Face repository which contains the pre-converted model weights. The Android app will download model weights from the Hugging Face URL.

  • model_id is a unique model identifier.

  • estimated_vram_bytes is an estimate of the VRAM, in bytes, that the model uses at runtime.

  • "bundle_weight": true means the model weights of the model will be bundled into the app when building.

  • overrides specifies some model config parameter overrides.
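You can catch mistakes in MLCChat/mlc-package-config.json before running the build. The sketch below checks each model_list entry for the three required fields and for duplicate model_id values. The function names and the exact checks are our assumptions, based on the field descriptions above. They are not the validation that mlc_llm package itself performs.

```python
import json

# Fields documented as required for every model_list entry.
REQUIRED_FIELDS = ("model", "model_id", "estimated_vram_bytes")


def validate_package_config(config):
    """Return human-readable errors for an mlc-package-config dict (hypothetical)."""
    errors = []
    seen_ids = set()
    for i, entry in enumerate(config.get("model_list", [])):
        for field in REQUIRED_FIELDS:
            if field not in entry:
                errors.append(f"model_list[{i}]: missing '{field}'")
        model_id = entry.get("model_id")
        if model_id is not None and model_id in seen_ids:
            errors.append(f"model_list[{i}]: duplicate model_id '{model_id}'")
        seen_ids.add(model_id)
        vram = entry.get("estimated_vram_bytes")
        # bool is a subclass of int in Python, so reject it explicitly.
        if vram is not None and (
            not isinstance(vram, int) or isinstance(vram, bool) or vram <= 0
        ):
            errors.append(f"model_list[{i}]: estimated_vram_bytes must be a positive int")
        if "bundle_weight" in entry and not isinstance(entry["bundle_weight"], bool):
            errors.append(f"model_list[{i}]: bundle_weight must be true or false")
    return errors


def validate_package_config_file(path):
    """Load a JSON config file from disk and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_package_config(json.load(f))
```

For example, validate_package_config_file("MLCChat/mlc-package-config.json") returns an empty list when none of these checks fail.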

We have a one-line command to build and prepare all the model libraries:

cd /path/to/MLCChat  # e.g., "android/MLCChat"
export MLC_LLM_SOURCE_DIR=/path/to/mlc-llm  # e.g., "../.."
mlc_llm package

This command mainly executes the following two steps:

  1. Compile models. We compile each model in model_list of MLCChat/mlc-package-config.json into a binary model library.

  2. Build runtime and tokenizer. In addition to the model itself, a lightweight runtime and tokenizer are required to actually run the LLM.

The command creates a ./dist/ directory that contains the runtime and model build output. Please make sure all the following files exist in ./dist/.

dist
└── lib
    └── mlc4j
        ├── build.gradle
        ├── output
        │   ├── arm64-v8a
        │   │   └── libtvm4j_runtime_packed.so
        │   └── tvm4j_core.jar
        └── src
            ├── cpp
            │   └── tvm_runtime.h
            └── main
                ├── AndroidManifest.xml
                ├── assets
                │   └── mlc-app-config.json
                └── java
                    └── ...
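Checking the tree above by hand is tedious, so a short script can do it. This is a hypothetical helper; missing_mlc4j_files is our own name. The file list is copied from the tree above, and the java/ sources are skipped because their contents are elided there.

```python
from pathlib import Path

# Files listed under dist/lib/mlc4j in the tree above (java/ sources omitted).
EXPECTED_FILES = (
    "build.gradle",
    "output/arm64-v8a/libtvm4j_runtime_packed.so",
    "output/tvm4j_core.jar",
    "src/cpp/tvm_runtime.h",
    "src/main/AndroidManifest.xml",
    "src/main/assets/mlc-app-config.json",
)


def missing_mlc4j_files(dist_dir="dist"):
    """Return the expected mlc4j files that are absent under dist_dir."""
    root = Path(dist_dir) / "lib" / "mlc4j"
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Run it from the MLCChat directory, for example print(missing_mlc4j_files("dist")). An empty list means every listed file was produced.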

The model execution logic for mobile GPUs is incorporated into libtvm4j_runtime_packed.so, while tvm4j_core.jar is a lightweight (~60 KB) Java binding to it. dist/lib/mlc4j is a Gradle subproject that you should include in your app so that the Android project can reference mlc4j (the MLC LLM Java library). This library packages the dependent model libraries and the runtime needed to execute the model. To include it, add the following to your settings.gradle:

include ':mlc4j'
project(':mlc4j').projectDir = file('dist/lib/mlc4j')
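You may prefer to script this step. The sketch below appends the two lines to a settings.gradle file only when they are missing, so running it repeatedly is harmless. The helper name is ours. It also assumes a Groovy settings.gradle; a Kotlin settings.gradle.kts needs different syntax.

```python
from pathlib import Path

# The two Groovy lines shown above, verbatim.
MLC4J_LINES = (
    "include ':mlc4j'",
    "project(':mlc4j').projectDir = file('dist/lib/mlc4j')",
)


def ensure_mlc4j_included(settings_path):
    """Append any missing mlc4j lines; return True if the file was changed."""
    path = Path(settings_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [line for line in MLC4J_LINES if line not in existing]
    if not missing:
        return False
    # Start the appended lines on a fresh line.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + "\n".join(missing) + "\n", encoding="utf-8")
    return True
```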

Note

We leverage a local JIT cache to avoid repeated compilation of the same input. However, it is sometimes helpful to force a rebuild, for example after a compiler update or when something goes wrong with the cached library. You can do so by setting the environment variable MLC_JIT_POLICY=REDO:

MLC_JIT_POLICY=REDO mlc_llm package

Step 3. Build Android App

Open the folder ./android/MLCChat as an Android Studio project. Connect your Android device to your machine. In the Android Studio menu bar, click “Build → Make Project”. Once the build finishes, click “Run → Run ‘app’”, and the app will launch on your phone.

Note

❗ This app cannot run in an emulator; a physical phone is required, because MLC LLM needs an actual mobile GPU to run at a meaningfully accelerated speed.

Customize the App

We can customize the models built in the Android app by customizing MLCChat/mlc-package-config.json. We introduce each field of the JSON file here.

Each entry in "model_list" of the JSON file has the following fields:

model

(Required) The path to the MLC-converted model to be built into the app. It is a Hugging Face URL (e.g., "model": "HF://mlc-ai/phi-2-q4f16_1-MLC") that contains the pre-converted model weights.

model_id

(Required) A unique local identifier for the model. It can be any string, as long as it is unique.

estimated_vram_bytes

(Required) The estimated VRAM, in bytes, required to run the model.

bundle_weight

(Optional) A boolean flag indicating whether to bundle model weights into the app. See Bundle Model Weights below.

overrides

(Optional) A dictionary to override the default model context window size (to limit the KV cache size) and prefill chunk size (to limit the model temporary execution memory). Example:

{
   "device": "android",
   "model_list": [
      {
            "model": "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
            "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
            "estimated_vram_bytes": 1948348579,
            "overrides": {
               "context_window_size": 512,
               "prefill_chunk_size": 128
            }
      }
   ]
}
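The overrides above trade maximum context length for memory. The sketch below is a rough, illustrative estimate of that trade-off, not MLC LLM's actual allocator. It assumes a standard multi-head-attention KV cache that stores one key vector and one value vector of hidden_size elements per layer per token, in fp16 (2 bytes each). The dimensions used (32 layers, hidden size 2560) are our assumption for a ~3B GPT-NeoX-style model such as RedPajama-INCITE-Chat-3B. They are not values read from its config.

```python
def kv_cache_bytes(num_layers, hidden_size, context_window_size, bytes_per_elem=2):
    """Rough KV cache size: one K and one V vector per layer per token (fp16 default)."""
    return 2 * num_layers * context_window_size * hidden_size * bytes_per_elem


# Assumed dimensions for a ~3B GPT-NeoX-style model (illustrative only).
LAYERS, HIDDEN = 32, 2560
for ctx in (2048, 512):
    mib = kv_cache_bytes(LAYERS, HIDDEN, ctx) / 2**20
    print(f"context_window_size={ctx}: ~{mib:.0f} MiB")
```

Under these assumptions, a 2048-token window needs about 640 MiB of KV cache, while the 512-token window in the example needs about 160 MiB. The estimate scales linearly with context_window_size.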

model_lib

(Optional) A string specifying the system library prefix to use for the model. Usually this is used when you want to build multiple model variants with the same architecture into the app. This field does not affect any app functionality. The "model_lib_path_for_prepare_libs" introduced below is also related. Example:

{
   "device": "android",
   "model_list": [
      {
            "model": "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
            "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
            "estimated_vram_bytes": 1948348579,
            "model_lib": "gpt_neox_q4f16_1"
      }
   ]
}

Besides model_list in MLCChat/mlc-package-config.json, you can optionally specify a "model_lib_path_for_prepare_libs" dictionary if you want to use model libraries that you compiled manually. The keys of this dictionary are the model_lib values specified in model_list, and the values are the paths (absolute or relative) to the manually compiled model libraries. The model libraries specified in "model_lib_path_for_prepare_libs" are built into the app when running mlc_llm package. Example:

{
   "device": "android",
   "model_list": [
      {
            "model": "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC",
            "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
            "estimated_vram_bytes": 1948348579,
            "model_lib": "gpt_neox_q4f16_1"
      }
   ],
   "model_lib_path_for_prepare_libs": {
      "gpt_neox_q4f16_1": "../../dist/lib/RedPajama-INCITE-Chat-3B-v1-q4f16_1-android.tar"
   }
}
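A mismatch between the keys of "model_lib_path_for_prepare_libs" and the model_lib values in model_list is easy to make. The hypothetical check below reports keys with no matching model_lib, and library paths that do not exist. It assumes relative paths resolve against the directory holding the config. That directory is also where mlc_llm package is run in the steps above, but the tool's exact resolution rule is not stated here.

```python
from pathlib import Path


def check_prepared_libs(config, config_dir="."):
    """Return problems with model_lib_path_for_prepare_libs (hypothetical check)."""
    problems = []
    declared = {e["model_lib"] for e in config.get("model_list", []) if "model_lib" in e}
    for model_lib, lib_path in config.get("model_lib_path_for_prepare_libs", {}).items():
        if model_lib not in declared:
            problems.append(f"'{model_lib}' is not a model_lib in model_list")
        # Assumption: relative paths are resolved against config_dir.
        path = Path(lib_path)
        if not path.is_absolute():
            path = Path(config_dir) / path
        if not path.exists():
            problems.append(f"'{model_lib}': {path} does not exist")
    return problems
```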

Bundle Model Weights

The previous sections describe how to build an Android app with MLC LLM, but that app downloads model weights from HuggingFace at run time, as configured in MLCChat/mlc-package-config.json. However, it can be desirable to bundle the weights into the app to avoid downloading them over the network. This section provides a simple ADB-based walkthrough that hopefully helps with further development.

Enable weight bundling. In MLCChat/mlc-package-config.json, set "bundle_weight": true for each model whose weights you want to bundle, and run mlc_llm package again. Below is an example:

{
   "device": "android",
   "model_list": [
      {
         "model": "HF://mlc-ai/gemma-2b-it-q4f16_1-MLC",
         "model_id": "gemma-2b-q4f16_1",
         "estimated_vram_bytes": 3000000000,
         "bundle_weight": true
      }
   ]
}

The outcome of running mlc_llm package should be as follows:

dist
├── bundle
│   ├── gemma-2b-q4f16_1   # The model weights that will be bundled into the app.
│   └── mlc-app-config.json
└── ...
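In the example above, the bundle directory is named after the model_id (gemma-2b-q4f16_1). Assuming that naming holds in general, a short hypothetical check can list every model marked "bundle_weight": true whose weights did not land in dist/bundle.

```python
import json
from pathlib import Path


def missing_bundles(config_path, dist_dir="dist"):
    """List model_ids with bundle_weight=true that have no dist/bundle/<model_id> dir."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    bundle_root = Path(dist_dir) / "bundle"
    return [
        entry["model_id"]
        for entry in config.get("model_list", [])
        if entry.get("bundle_weight") is True
        and not (bundle_root / entry["model_id"]).is_dir()
    ]
```

For example, missing_bundles("mlc-package-config.json") run from the MLCChat directory should return an empty list after a successful mlc_llm package.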

Generate the APK. In Android Studio, click “Build → Generate Signed Bundle/APK” to build a release APK. If this is the first time you are generating an APK, you will need to create a key following the official Android guide. The APK will be placed at android/MLCChat/app/release/app-release.apk.

Install ADB and enable USB debugging. Enable “USB debugging” in the Developer options of your phone settings. In “SDK Manager → SDK Tools”, install Android SDK Platform-Tools. Add the platform-tools directory to your PATH environment variable (on macOS, it is $HOME/Library/Android/sdk/platform-tools). Run the following command; if ADB is installed correctly, your phone will appear as a device:

adb devices
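Each line after the "List of devices attached" header shows a device serial and its state. Only devices in the "device" state are ready. "unauthorized" usually means the USB debugging prompt on the phone has not been accepted yet. The hypothetical sketch below extracts the ready serials so a script can fail early before installing anything. It assumes adb is on PATH.

```python
import subprocess


def parse_adb_devices(output):
    """Return serials in the 'device' (ready) state from `adb devices` output."""
    serials = []
    for line in output.splitlines():
        line = line.strip()
        # Skip the header, blank lines, and "* daemon ..." startup messages.
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices():
    """Run `adb devices` (adb must be on PATH) and return ready device serials."""
    result = subprocess.run(["adb", "devices"], capture_output=True, text=True, check=True)
    return parse_adb_devices(result.stdout)
```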

Install the APK and weights on your phone. Run the commands below to install the app and push the local weights to the app data directory on your device. Once this finishes, you can start the MLCChat app on your device. Models with bundle_weight set to true will already have their weights on the device.

cd /path/to/MLCChat  # e.g., "android/MLCChat"
python bundle_weight.py --apk-path app/release/app-release.apk