LIMI-Air-qx86-hi-mlx

I thought I'd apply the Deckard Formula (qx): a combination of layers quantized at different precisions and strengths that has the effect of increasing the model's creativity.

The GLM Air is known for its lack of a sense of humour, so I thought it would be the best fit.

I created this formula inspired by the Nikon Noct Z 58mm F/0.95, a lens remarkable for its ability to render a scene with a human-like feeling: it provides a thin depth of field to isolate the subject from the background and pleasantly blurs out the rest, creating memorable images.

As a photographer I know how lenses work, so I applied the same principles to cognition.

Here are a few observations on how this formula performs, from initial tests of the q5 quant series.

The q5-hi quant

The q5-hi quant was quantized with group size 32, meaning high precision.
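For reference, a plain fixed-precision conversion like q5-hi can be expressed with mlx-lm's convert API. This is a minimal sketch, not the exact command used for this release; the output directory name is illustrative.

from mlx_lm import convert

# Uniform 5-bit quantization; group size 32 means smaller quantization
# groups per scale, i.e. higher precision than the default of 64.
convert(
    "GAIR/LIMI-Air",
    mlx_path="LIMI-Air-q5-hi-mlx",  # illustrative output directory
    quantize=True,
    q_bits=5,
    q_group_size=32,
)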

Given a standard task, the LIMI-Air-q5-hi finished quickly.

Thinks for 42s, drafts a few things in a hurry

29.96 tok/sec
4002 tokens
4.74s to first token

Very basic output, boring.

The qx5 quant

In the LIMI-Air-qx5 quant, I use 5 bits for content and context in high precision, 6 bits for attention, and 8 bits for the head.
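To make the recipe concrete: mlx-lm supports per-layer quantization through a quant predicate, so a mixed formula like qx5 could be sketched as below. The layer-name patterns, the group sizes for the 6- and 8-bit parts, and the predicate usage are my assumptions here, not the published Deckard Formula script.

from mlx_lm import convert

def qx5_predicate(path, module, config):
    # Assumed layer naming; adjust to the model's actual module paths.
    if "lm_head" in path:
        return {"bits": 8, "group_size": 64}   # head: 8 bits
    if "self_attn" in path:
        return {"bits": 6, "group_size": 64}   # attention: 6 bits
    return {"bits": 5, "group_size": 32}       # content/context: 5 bits, high precision

convert(
    "GAIR/LIMI-Air",
    mlx_path="LIMI-Air-qx5-mlx",  # illustrative output directory
    quantize=True,
    quant_predicate=qx5_predicate,
)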

Thinks for 2min 28s, more engaged in the discovery process in the think tag

29.22 tok/sec
7067 tokens
6.92s to first token

The qx5 provided a bit more detail and seems happier.

The qx65-hi quant

In the LIMI-Air-qx65-hi quant, I use 5 bits for content, 6 bits for attention and context, and 8 bits for the head, all in high precision.

Thinks for 5min 30s:

Given the complexity, we might not be able to complete the entire code in one response. We'll provide a skeleton for each module and then fill in key functions. /think

...writes quite complete code, tests and all.

25.21 tok/sec
15111 tokens
7.04s to first token

Summary of Deckard Formula performance

The qx65-hi delivered twice as many tokens, in the form of very solid reasoning in the think tag and a competent assembly of the requested software, within the limits of a single response. Subsequent queries showed the model being very cooperative and enjoying the process of debugging and refinement.

The qx86-hi is the same formula one step up: 6 bits instead of 5 for content, 8 bits otherwise, all in high precision.

This quant formula makes the model happier and more eager to explore and innovate.
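In the same sketch as the qx5 predicate above, and with the same caveats, the qx86-hi recipe would only change the bit assignments, with group size 32 everywhere for the -hi variant:

def qx86_hi_predicate(path, module, config):
    if "lm_head" in path or "self_attn" in path:
        return {"bits": 8, "group_size": 32}   # head and attention: 8 bits, hi
    return {"bits": 6, "group_size": 32}       # content: 6 bits, hi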

-G

This model LIMI-Air-qx86-hi-mlx was converted to MLX format from GAIR/LIMI-Air using mlx-lm version 0.27.1.

Use with mlx

pip install mlx-lm

from mlx_lm import load, generate

model, tokenizer = load("nightmedia/LIMI-Air-qx86-hi-mlx")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
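Note that the long think-tag runs above produce thousands of tokens, while generate() stops at its default max_tokens; for tasks like the coding test you would raise the limit. The value below is just an example:

response = generate(
    model, tokenizer, prompt=prompt, max_tokens=16384, verbose=True
)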
Model size: 107B params
Tensor types: BF16, U32, F32 (Safetensors)

Model tree for nightmedia/LIMI-Air-qx86-hi-mlx

Base model: GAIR/LIMI-Air
This model is one of 4 quantized versions of the base model.