Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll
This is a CoreML model converted using ANEMLL for Apple Neural Engine inference.
Thanks to ANEMLL.
Available Distributions
Standard Distribution
- Contains zipped MLMODELC files
- Suitable for macOS and development
iOS Distribution
- Contains unzipped MLMODELC files
- Ready for iOS deployment
- Includes offline tokenizer support
Model Information
- Context Length: 4096
- Embedding: LUT 4-bit
- LM Head: LUT 4-bit
- FFN and Prefill: LUT 4-bit
- Batch Size: 64
- Number of Chunks: 1
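As a back-of-the-envelope illustration of what 4-bit LUT quantization means for memory, the sketch below estimates the raw weight footprint. The parameter count of roughly 4 billion is an assumption read off the "4B" in the model name, and the estimate ignores the lookup tables themselves, the KV cache, and runtime overhead, so treat it as a rough lower bound rather than a measurement.

```python
def estimate_weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Return the raw storage needed for num_params weights at the given bit width."""
    return num_params * bits_per_weight // 8


# Assumption: about 4e9 parameters, inferred from the "4B" in the model name.
ASSUMED_PARAMS = 4_000_000_000

if __name__ == "__main__":
    lut4 = estimate_weight_bytes(ASSUMED_PARAMS, 4)    # LUT 4-bit, as listed above
    fp16 = estimate_weight_bytes(ASSUMED_PARAMS, 16)   # unquantized FP16, for comparison
    print(f"4-bit weights: ~{lut4 / 1e9:.1f} GB")
    print(f"FP16 weights:  ~{fp16 / 1e9:.1f} GB")
```

Under these assumptions the quantized weights come to about 2 GB, compared with about 8 GB at FP16.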
Requirements
- macOS Sequoia with Apple Neural Engine and 8GB RAM or more
- CoreML Tools and HuggingFace Transformers libraries
- Python 3.9
chat.py provides a sample inference script.
chat_full.py provides a sample inference script with history and conversation management.
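As a rough pre-flight check of the requirements listed above, the following sketch reports whether the Python version, operating system, and libraries look suitable. It is not part of the repository; the function name `check_environment` is hypothetical, and treating Python 3.9 as a minimum version is an assumption. It does not verify the macOS release, the presence of an Apple Neural Engine, or the amount of RAM.

```python
import importlib.util
import platform
import sys


def check_environment() -> dict:
    """Collect simple pass/fail checks for the listed requirements."""
    return {
        # Assumption: 3.9 is treated as the minimum supported version.
        "python>=3.9": sys.version_info >= (3, 9),
        # platform.system() returns "Darwin" on macOS.
        "macOS": platform.system() == "Darwin",
        # find_spec returns None when a top-level package is not installed.
        "coremltools": importlib.util.find_spec("coremltools") is not None,
        "transformers": importlib.util.find_spec("transformers") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```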
Installation
- Download the model from Hugging Face:
# Install required tools
pip install huggingface_hub
# Install Git LFS (Large File Support)
# macOS with Homebrew:
brew install git-lfs
# Initialize Git LFS
git lfs install
# Clone the repository with model files
git clone https://huggingface.co/dungnvt/Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll
- Extract model files:
# Navigate to cloned directory
cd Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll
# Pull LFS files (model weights)
git lfs pull
# Extract CoreML model files
find . -type f -name "*.zip" -exec unzip {} \;
- Install dependencies:
pip install coremltools transformers
Coremltools:
See coremltools installation guide at https://coremltools.readme.io/v4.0/docs/installation
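The download and extraction steps above can also be scripted in Python. The sketch below is one possible alternative to the Git LFS route, not the author's documented method: it uses `snapshot_download` from `huggingface_hub` and the standard `zipfile` module. The helper names `download_model` and `extract_archives` are hypothetical, and unlike the `find ... -exec unzip` command, each archive is extracted next to itself rather than into the current directory.

```python
import zipfile
from pathlib import Path

REPO_ID = "dungnvt/Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll"


def download_model(local_dir: str) -> Path:
    """Download the repository snapshot from Hugging Face into local_dir."""
    # Imported here so extract_archives works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))


def extract_archives(model_dir: Path) -> list:
    """Unzip every *.zip under model_dir into the archive's own folder."""
    extracted = []
    for archive in sorted(model_dir.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)
        extracted.append(archive)
    return extracted


# Example usage (a large download, so it is not run automatically;
# the target directory name is only an illustration):
#   target = download_model("./Qwen3-4B-Thinking-2507-LUT444-ctx4096_0.3.4-anemll")
#   extract_archives(target)
```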
How to Run
- Basic chat interface:
python chat.py --meta ./meta.yaml
- Full conversation mode with history:
python chat_full.py --meta ./meta.yaml
Note: The first time the model loads, macOS takes some time to place it on the device; subsequent loads are much faster. Press Ctrl-D to exit or Ctrl-C to interrupt inference.
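Before launching either script, it can help to confirm that the model directory looks complete. The sketch below is a hypothetical helper, not something shipped with the repository. It assumes that `meta.yaml` and the extracted `.mlmodelc` directories sit together in the directory you run from, and it does not inspect the contents of `meta.yaml`.

```python
from pathlib import Path


def preflight(model_dir: Path, meta_name: str = "meta.yaml") -> list:
    """Return a list of problems; an empty list means the directory looks ready."""
    problems = []
    if not (model_dir / meta_name).is_file():
        problems.append(f"{meta_name} not found in {model_dir}")
    # Compiled CoreML models are directories whose names end in .mlmodelc.
    if not any(p.is_dir() for p in model_dir.glob("*.mlmodelc")):
        problems.append("no extracted .mlmodelc directories found")
    return problems


if __name__ == "__main__":
    issues = preflight(Path("."))
    if issues:
        print("\n".join(issues))
    else:
        print("ready: python chat.py --meta ./meta.yaml")
```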
Quick Start
Test in iOS/macOS App
Try our sample Chat-Bot app on TestFlight:
- Install TestFlight from App Store
- Join beta test: TestFlight Link
- App includes a small demo model pre-installed
- You can add custom models via HuggingFace URLs
- The TestFlight app works on both iOS and macOS
- Demonstrates proper model integration and provides a reference implementation
- iOS requires unzipped MLMODELC files and config.json for offline tokenizer
- macOS supports both zipped and unzipped model formats
Base model: Qwen/Qwen3-4B-Thinking-2507