About
Mozilla's llamafile is an open-source project designed to simplify the deployment and execution of large language models (LLMs) by packaging them into single-file executables. It combines llama.cpp — a high-performance C++ implementation for running LLMs — with Cosmopolitan Libc, enabling cross-platform compatibility without requiring installation or complex dependencies.
This llamafile was created for granite-4.0-micro, a flexible 3B-parameter model well suited to command-line usage.
Installation Guide (Linux, BSD, Mac)
Download the file:
wget https://huggingface.co/Otilde/granite-4.0-micro-llamafile/resolve/main/granite-4.0-micro-Q8_0.llamafile
Alternatively, you can download the Q4 or Q6 quantization for a smaller file.
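If you use curl rather than wget, the equivalent download is below. The second command sketches fetching a smaller quantization; the Q6 filename is an assumption based on common GGUF naming, so check the repo's file list for the exact name.
curl -L -O https://huggingface.co/Otilde/granite-4.0-micro-llamafile/resolve/main/granite-4.0-micro-Q8_0.llamafile
curl -L -O https://huggingface.co/Otilde/granite-4.0-micro-llamafile/resolve/main/granite-4.0-micro-Q6_K.llamafile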
Make it executable:
chmod +x granite-4.0-micro-Q8_0.llamafile
Run it!
./granite-4.0-micro-Q8_0.llamafile
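Running it with no arguments starts an interactive session (whether that's a terminal chat or the local web UI varies by llamafile version). For a quick non-interactive sanity check, you can pass a one-shot prompt; the flags here are standard llamafile/llama.cpp options:
./granite-4.0-micro-Q8_0.llamafile -p "Say hello in one sentence." -n 32 --temp 0.4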
Note for Windows:
After downloading the llamafile, you will need to rename the file so it ends in '.exe' before Windows will run it. This can be done directly through File Explorer or in the Windows Terminal:
Rename-Item -Path "granite-4.0-micro-Q8_0.llamafile" -NewName "granite-4.0-micro-Q8_0.llamafile.exe"
.\granite-4.0-micro-Q8_0.llamafile.exe
Since this file is under 4GB there should be no problem running it natively; Windows cannot execute files larger than 4GB, so bigger llamafiles may require WSL (https://mozilla-ai.github.io/llamafile/troubleshooting/).
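One related tip: because every llamafile is also a valid shell script (a property of the Cosmopolitan APE format), you can fall back to running it through sh on Linux or WSL if direct execution fails, for example due to binfmt_misc interference:
sh ./granite-4.0-micro-Q8_0.llamafile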
Examples
You can run ./granite-4.0-micro-Q8_0.llamafile --help to see all available options. Here are a few example prompts that may be useful.
ls -lh | ./granite-4.0-micro-Q8_0.llamafile --no-display-prompt --chat -p "In this large directory, please tell me the name of the earliest file I created." --nologo --interactive-first --color -e --temp 0.4 --chat-template vicuna
man vim | ./granite-4.0-micro-Q8_0.llamafile --no-display-prompt --chat -p "What are some interesting Vim command line options found in the man pages?" --nologo --interactive-first --color -e --temp 0.6 --chat-template vicuna -n -2 -c 2048
--no-display-prompt and --nologo reduce the amount of screen space taken up by boilerplate, presenting a cleaner chat interface.
--temp 0.4: for this model I found 0.4 works best at reducing potential hallucinations on tasks that need to be as straightforward as possible. For more creativity, I go with 0.8.
--chat-template vicuna: interestingly, I've found that this chat template makes Granite more coherent when given lots of context, though it's not necessary.
--n-predict N (also -n N): sets the number of tokens to predict when generating text. A higher value produces longer output, a lower value shorter output; -2, as in the example above, means generate until the context is full.
-c N: sets the context size in tokens; increase it to fit longer inputs like the man page above, or decrease it to save memory. A combined example follows this list.
If the model does not stop producing tokens, stop it manually with Ctrl+C (Ctrl+Z merely suspends the process rather than killing it).
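Putting several of these flags together, here is one more sketch in the same style as the examples above; the piped file and prompt are just illustrative:
cat README.md | ./granite-4.0-micro-Q8_0.llamafile --no-display-prompt --nologo --color -e --temp 0.4 -n 256 -c 4096 -p "Summarize this README in three bullet points."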
That's it! The benefit of llamafiles is that they're incredibly quick and easy to run, but offer lots of power inside. For future llamafiles, I'll further customize the embedded .args file to include more of these optimizations, so there's less for users to type out each time.
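For reference, that customization works by packing a .args file into the llamafile's zip archive with the zipalign tool that ships with the llamafile project. A minimal sketch, assuming zipalign is on your PATH and reusing flags from the examples above (the trailing '...' line lets users still pass their own extra arguments, per the llamafile docs' convention):
# One argument per line; '...' passes through any additional CLI args
cat << 'EOF' > .args
--temp
0.4
--nologo
--no-display-prompt
...
EOF
zipalign -j0 granite-4.0-micro-Q8_0.llamafile .args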
Base model: ibm-granite/granite-4.0-micro