LIMI-Air-qx86-hi-mlx
I thought I'd apply the Deckard Formula (qx), a combination of layers quantized at different precisions and group sizes that has the effect of increasing the model's creativity.
The GLM Air is known for its lack of a sense of humour, so I thought it would be the best fit.
I created this formula inspired by the Nikon Noct Z 58mm F/0.95, a lens remarkable for its ability to render a scene with a human-like feeling: it provides a thin depth of field to isolate the subject from the background, and pleasantly blurs out the rest, creating memorable images.
As a photographer I know how lenses work; I applied the same principles to cognition.
Here are some observations on how this formula performs, from initial tests on a q5 quant series.
The q5-hi quant
The q5-hi quant was quantized with group size 32, meaning high precision.
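To illustrate why a smaller group size means higher precision, here is a minimal sketch of per-group affine quantization, not mlx-lm's actual implementation; the function name and the Gaussian test weights are made up for the demonstration. Each group of weights shares one scale, so smaller groups track local value ranges more tightly and reconstruct with less error.

```python
import random

def quantize_dequantize(weights, bits, group_size):
    """Quantize each group of `group_size` values to `bits` bits,
    reconstruct them, and return the mean absolute error.
    Illustrative sketch only, not mlx-lm internals."""
    levels = 2 ** bits - 1
    total_err = 0.0
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels if hi > lo else 1.0
        for w in group:
            q = round((w - lo) / scale)              # integer code in 0..levels
            total_err += abs(w - (lo + q * scale))   # reconstruction error
    return total_err / len(weights)

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(4096)]

err32 = quantize_dequantize(weights, bits=5, group_size=32)
err64 = quantize_dequantize(weights, bits=5, group_size=64)
print(f"5-bit, group 32: {err32:.5f}")
print(f"5-bit, group 64: {err64:.5f}")
# Smaller groups have smaller local ranges, so err32 comes out below err64.
```

Same 5 bits per weight either way; the group-32 variant just spends more metadata (one scale per 32 weights) to buy lower quantization error.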
Given a standard task, the LIMI-Air-q5-hi was done quickly.
Thinks for 42s, drafts a few things in a hurry
29.96 tok/sec
4002 tokens
4.74s to first token
Very basic output, boring.
The qx5 quant
In the LIMI-Air-qx5 quant, I use 5 bits for content and context in high precision, 6 bits for attention, and 8 bits for the head.
Thinks for 2min 28s, more engaged in the discovery process in the think tag
29.22 tok/sec
7067 tokens
6.92s to first token
The qx5 provided a bit more detail; it seems happier.
The qx65-hi quant
In the LIMI-Air-qx65-hi quant, I use 5 bits for content, 6 bits for attention and context, and 8 bits for the head, all in high precision.
Thinks for 5min 30s:
Given the complexity, we might not be able to complete the entire code in one response. We'll provide a skeleton for each module and then fill in key functions. /think
...writes quite complete code, tests and all.
25.21 tok/sec
15111 tokens
7.04s to first token
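As a quick sanity check on the reported figures, total wall time per run can be estimated as time-to-first-token plus decode time. This is a back-of-the-envelope sketch using only the numbers above; the helper name is mine.

```python
def gen_time_s(tokens, tok_per_sec, ttft_s):
    """Estimated wall time: time-to-first-token plus decode time."""
    return ttft_s + tokens / tok_per_sec

# (tokens, tok/sec, seconds to first token) from the test runs above
runs = {
    "q5-hi":   (4002, 29.96, 4.74),
    "qx5":     (7067, 29.22, 6.92),
    "qx65-hi": (15111, 25.21, 7.04),
}
for name, (toks, tps, ttft) in runs.items():
    print(f"{name}: ~{gen_time_s(toks, tps, ttft) / 60:.1f} min")
```

So the qx65-hi run spent roughly ten minutes total, about half of it in the think tag, to produce nearly four times the output of the q5-hi at a modest throughput cost.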
Summary of Deckard Formula performance
The qx65-hi delivered twice as many tokens, in the form of very solid reasoning in the think tag and a competent assembly of the requested software, within the limits of a single response. Subsequent queries revealed the model to be very cooperative, enjoying the process of debugging and refinement.
The qx86-hi is the same formula, one step up: 6 bits instead of 5 for content, 8 bits otherwise, all in high precision.
This quant formula makes the model happier and more eager to explore and innovate
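The per-layer allocation of the qx86-hi recipe can be sketched as a mapping from layer name to precision. This is an illustrative sketch of the recipe as described above, not the actual conversion script; the layer-name patterns (`self_attn`, `lm_head`) are assumptions based on typical transformer naming.

```python
# Hypothetical sketch of the qx86-hi per-layer precision map.
# Layer-name patterns are assumptions, not the real conversion config.

def qx86_hi_precision(layer_path):
    """Return (bits, group_size) per the qx86-hi recipe:
    8 bits for attention and the output head, 6 bits for the rest
    (content/context), everything at group size 32 ("hi")."""
    group_size = 32  # "hi" = high-precision grouping throughout
    if "self_attn" in layer_path or "lm_head" in layer_path:
        return 8, group_size
    return 6, group_size  # content / context layers

examples = [
    "model.layers.0.self_attn.q_proj",  # attention -> 8 bits
    "model.layers.0.mlp.gate_proj",     # content   -> 6 bits
    "lm_head",                          # head      -> 8 bits
]
for path in examples:
    print(path, "->", qx86_hi_precision(path))
```

A per-layer rule like this is how mixed-precision quants are typically driven during conversion, with the attention path kept closest to full fidelity.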
-G
This model LIMI-Air-qx86-hi-mlx was converted to MLX format from GAIR/LIMI-Air using mlx-lm version 0.27.1.
Use with mlx

```shell
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("LIMI-Air-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```