Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll

This is a CoreML model converted using ANEMLL for Apple Neural Engine inference.

Thanks to the ANEMLL project for the conversion tools.

Available Distributions

Standard Distribution

  • Contains zipped MLMODELC files
  • Suitable for macOS and development

iOS Distribution

  • Contains unzipped MLMODELC files
  • Ready for iOS deployment
  • Includes offline tokenizer support

Model Information

  • Context Length: 4096
  • Embedding: LUT 4-bit
  • LM Head: LUT 4-bit
  • FFN and Prefill: LUT 4-bit
  • Batch Size: 64
  • Number of Chunks: 1
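
The same configuration is recorded in meta.yaml, which the chat scripts read via --meta. Below is a minimal Python sketch for inspecting it, assuming only that the file is standard YAML (the exact key names depend on the ANEMLL version):

# inspect_meta.py -- print the configuration shipped with this repo
# Assumes meta.yaml is plain YAML; key names vary by ANEMLL version.
import yaml  # pip install pyyaml

with open("meta.yaml") as f:
    meta = yaml.safe_load(f)

# Confirm context length, LUT bit widths, batch size and chunk count
# before running inference.
print(yaml.safe_dump(meta, sort_keys=False))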

Requirements

  • macOS Sequoia with an Apple Neural Engine and 8 GB of RAM or more
  • coremltools and Hugging Face Transformers libraries
  • Python 3.9

chat.py provides a sample inference script.
chat_full.py provides a sample inference script with history and conversation management.
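
If you want to load one of the converted model parts directly rather than through the sample scripts, the sketch below shows the general coremltools pattern. It is illustrative only: the file name and the input/output tensor names are placeholders, not this model's actual interface; check meta.yaml and the chat.py source for the real ones.

# Minimal sketch of loading a compiled .mlmodelc part with coremltools
# (CompiledMLModel requires a recent coremltools release).
import numpy as np
import coremltools as ct

model = ct.models.CompiledMLModel(
    "Qwen3_FFN_PF_lut4_chunk_01of01.mlmodelc",   # placeholder file name
    compute_units=ct.ComputeUnit.CPU_AND_NE,     # prefer the Neural Engine
)

# predict() takes a dict keyed by the model's declared input names;
# "input_ids" here is an assumption, not the real name.
outputs = model.predict({"input_ids": np.zeros((1, 64), dtype=np.int32)})
print(list(outputs.keys()))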

Installation

  1. Download the model from Hugging Face:
# Install required tools
pip install huggingface_hub

# Install Git LFS (Large File Storage)
# macOS with Homebrew:
brew install git-lfs

# Initialize Git LFS
git lfs install

# Clone the repository with model files
git clone https://huggingface.co/dungnvt/Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll
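
As an alternative to git clone, the huggingface_hub package installed above can fetch the same files without Git LFS; a minimal sketch:

# Python alternative to the git clone above
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="dungnvt/Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll",
    local_dir="Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll",
)

Either way, continue with the extraction step below.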
  2. Extract model files:
# Navigate to cloned directory
cd Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll

# Pull LFS files (model weights)
git lfs pull

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
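
If unzip is not available, the same extraction can be done from Python with only the standard library:

# Extract every *.zip in the cloned directory (Python stdlib only)
import pathlib
import zipfile

for archive in pathlib.Path(".").rglob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)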
  3. Install dependencies:
pip install coremltools transformers

Coremltools:

See the coremltools installation guide at https://coremltools.readme.io/v4.0/docs/installation
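
A quick sanity check that the Python dependencies are importable before running the chat scripts:

# Verify the environment
import coremltools as ct
import transformers

print("coremltools", ct.__version__)
print("transformers", transformers.__version__)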

How to Run

  1. Basic chat interface:
python chat.py --meta ./meta.yaml
  2. Full conversation mode with history:
python chat_full.py --meta ./meta.yaml

Note: The first time the model loads, macOS takes some time to place it on the device; subsequent loads are much faster. Use Ctrl-D to exit and Ctrl-C to interrupt inference.

Quick Start

Test in iOS/macOS App

Try our sample Chat-Bot app on TestFlight:

  1. Install TestFlight from the App Store
  2. Join the beta test: TestFlight Link
  3. The app includes a small pre-installed demo model
  4. You can add custom models via Hugging Face URLs

  • The TestFlight app works on both iOS and macOS
  • Demonstrates proper model integration and provides a reference implementation
  • iOS requires unzipped MLMODELC files and config.json for offline tokenizer
  • macOS supports both zipped and unzipped model formats
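
The offline tokenizer mentioned above is the standard Hugging Face tokenizer shipped with this repository. A minimal sketch of loading it from the local clone without network access, assuming the tokenizer files (e.g. tokenizer.json, config.json) sit in the repo root:

# Load the tokenizer from the local clone (no network needed)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".", local_files_only=True)
print(tokenizer("Hello from the Apple Neural Engine"))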