Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll

This is a CoreML model converted using ANEMLL for Apple Neural Engine inference.

Thanks to the ANEMLL project for the conversion tools.

Available Distributions

Standard Distribution

  • Contains zipped MLMODELC files
  • Suitable for macOS and development

iOS Distribution

  • Contains unzipped MLMODELC files
  • Ready for iOS deployment
  • Includes offline tokenizer support

Model Information

  • Context Length: 4096
  • Embedding: LUT 4-bit
  • LM Head: LUT 4-bit
  • FFN and Prefill: LUT 4-bit
  • Batch Size: 64
  • Number of Chunks: 1
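
The same configuration is recorded in meta.yaml, which the chat scripts read via --meta. Below is a minimal Python sketch for inspecting it, assuming only that the file is standard YAML (the exact key names depend on the ANEMLL version):

# inspect_meta.py -- print the configuration shipped with this repo
# Assumes meta.yaml is plain YAML; key names vary by ANEMLL version.
import yaml  # pip install pyyaml

with open("meta.yaml") as f:
    meta = yaml.safe_load(f)

# Confirm context length, LUT bit widths, batch size and chunk count
# before running inference.
print(yaml.safe_dump(meta, sort_keys=False))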

Requirements

  • macOS Sequoia with an Apple Neural Engine and 8 GB of RAM or more
  • coremltools and Hugging Face Transformers libraries
  • Python 3.9

chat.py provides a sample inference script.
chat_full.py provides a sample inference script with history and conversation management.
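
If you want to load one of the converted model parts directly rather than through the sample scripts, the sketch below shows the general coremltools pattern. It is illustrative only: the file name and the input/output tensor names are placeholders, not this model's actual interface; check meta.yaml and the chat.py source for the real ones.

# Minimal sketch of loading a compiled .mlmodelc part with coremltools
# (CompiledMLModel requires a recent coremltools release).
import numpy as np
import coremltools as ct

model = ct.models.CompiledMLModel(
    "Qwen3_FFN_PF_lut4_chunk_01of01.mlmodelc",   # placeholder file name
    compute_units=ct.ComputeUnit.CPU_AND_NE,     # prefer the Neural Engine
)

# predict() takes a dict keyed by the model's declared input names;
# "input_ids" here is an assumption, not the real name.
outputs = model.predict({"input_ids": np.zeros((1, 64), dtype=np.int32)})
print(list(outputs.keys()))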

Installation

  1. Download the model from Hugging Face:
# Install required tools
pip install huggingface_hub

# Install Git LFS (Large File Storage)
# macOS with Homebrew:
brew install git-lfs

# Initialize Git LFS
git lfs install

# Clone the repository with model files
git clone https://huggingface.co/dungnvt/Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll
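
As an alternative to git clone, the huggingface_hub package installed above can fetch the same files without Git LFS; a minimal sketch:

# Python alternative to the git clone above
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="dungnvt/Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll",
    local_dir="Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll",
)

Either way, continue with the extraction step below.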
  2. Extract model files:
# Navigate to cloned directory
cd Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll

# Pull LFS files (model weights)
git lfs pull

# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
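
If unzip is not available, the same extraction can be done from Python with only the standard library:

# Extract every *.zip in the cloned directory (Python stdlib only)
import pathlib
import zipfile

for archive in pathlib.Path(".").rglob("*.zip"):
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(archive.parent)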
  3. Install dependencies:
pip install coremltools transformers

Coremltools:

See the coremltools installation guide at https://coremltools.readme.io/v4.0/docs/installation
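
A quick sanity check that the Python dependencies are importable before running the chat scripts:

# Verify the environment
import coremltools as ct
import transformers

print("coremltools", ct.__version__)
print("transformers", transformers.__version__)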

How to Run

  1. Basic chat interface:
python chat.py --meta ./meta.yaml
  2. Full conversation mode with history:
python chat_full.py --meta ./meta.yaml

Note: The first time the model loads, macOS takes some time to place it on the device; subsequent loads are much faster. Use Ctrl-D to exit and Ctrl-C to interrupt inference.

Quick Start

Test in iOS/macOS App

Try our sample Chat-Bot app on TestFlight:

  1. Install TestFlight from the App Store
  2. Join the beta test: TestFlight Link
  3. The app includes a small pre-installed demo model
  4. You can add custom models via Hugging Face URLs

  • The TestFlight app works on both iOS and macOS
  • Demonstrates proper model integration and provides a reference implementation
  • iOS requires unzipped MLMODELC files and config.json for offline tokenizer
  • macOS supports both zipped and unzipped model formats
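
The offline tokenizer mentioned above is the standard Hugging Face tokenizer shipped with this repository. A minimal sketch of loading it from the local clone without network access, assuming the tokenizer files (e.g. tokenizer.json, config.json) sit in the repo root:

# Load the tokenizer from the local clone (no network needed)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(".", local_files_only=True)
print(tokenizer("Hello from the Apple Neural Engine"))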