Update README.md

A collection of GGUF models using mixed quantization (different layers quantized to different precisions to optimise the trade-off between fidelity and memory).

They were created using the [convert.py script](https://github.com/chrisgoringe/mixed-gguf-converter).

They can be loaded in ComfyUI using the [ComfyUI GGUF Nodes](https://github.com/city96/ComfyUI-GGUF). Just put the gguf files in your models/unet directory.
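Placing a downloaded file can also be scripted; a minimal sketch (the `install_gguf` helper below is illustrative, not part of either repository):

```python
import shutil
from pathlib import Path

def install_gguf(gguf_path: str, comfy_root: str) -> Path:
    """Copy a downloaded .gguf file into ComfyUI's models/unet directory."""
    dest_dir = Path(comfy_root).expanduser() / "models" / "unet"
    dest_dir.mkdir(parents=True, exist_ok=True)  # create models/unet if missing
    dest = dest_dir / Path(gguf_path).name
    shutil.copy2(gguf_path, dest)  # copy2 also preserves file timestamps
    return dest
```

For example, `install_gguf("model.gguf", "~/ComfyUI")` copies the file into `~/ComfyUI/models/unet/`.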

## Naming convention (mx for 'mixed')

…where NN_N is the approximate reduction in VRAM usage compared to the full 16 bit version.

The process for optimisation is as follows:

- 240 prompts used for flux images popular at civit.ai were run through the full Flux.1-dev model with randomised resolution and step count.
- For a randomly selected step in the inference, the hidden states before and after the layer stack were captured.
- For each layer in turn, and for each of the Q8_0, Q5_1 and Q4_1 quantizations:
  - A single layer was quantized
  - The initial hidden states were processed by the modified layer stack
  - The error (MSE) in the final hidden state was calculated
- This gives a 'cost' for each possible layer quantization
- An optimised quantization is one that gives the desired reduction in size for the smallest total cost
- A series of recipes for optimization have been created from the calculated costs
- The various 'in' blocks, the final layer blocks, and all normalization scale parameters are stored in float32
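This README does not spell out how a recipe is derived from the measured costs, but one way to sketch the idea is a greedy pass that repeatedly applies the quantization with the lowest extra cost per bit saved until the target size reduction is reached. Everything below is an illustrative assumption (the bits-per-weight figures are rough averages for the GGUF block formats; `choose_recipe`, the layer names and the cost numbers are hypothetical, not taken from the converter):

```python
# Approximate bits per weight for each storage format (illustrative values).
BITS = {"F16": 16.0, "Q8_0": 8.5, "Q5_1": 6.0, "Q4_1": 5.0}

def choose_recipe(layer_params, costs, target_reduction):
    """Greedily build a mixed-quantization recipe.

    layer_params:     {layer_name: parameter count}
    costs:            {(layer_name, qtype): measured MSE cost of quantizing
                       that layer to qtype}
    target_reduction: fraction of the full F16 size to save, e.g. 0.5
    """
    full_bits = sum(n * BITS["F16"] for n in layer_params.values())
    target_bits = target_reduction * full_bits
    recipe = {layer: "F16" for layer in layer_params}  # start unquantized
    saved = 0.0
    while saved < target_bits:
        best = None
        for (layer, qtype), cost in costs.items():
            # Extra bits saved by moving this layer from its current format.
            gain = layer_params[layer] * (BITS[recipe[layer]] - BITS[qtype])
            if gain <= 0:
                continue  # not a further reduction for this layer
            extra_cost = cost - costs.get((layer, recipe[layer]), 0.0)
            score = extra_cost / gain  # cost per bit saved; smaller is better
            if best is None or score < best[0]:
                best = (score, layer, qtype, gain)
        if best is None:
            break  # no remaining option can shrink the model further
        _, layer, qtype, gain = best
        recipe[layer] = qtype
        saved += gain
    return recipe
```

A greedy pass is only a baseline: choosing the cheapest set of per-layer quantizations for a given size budget is a knapsack-style problem, so an exact recipe could differ.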