Where to put this model?
Where do I put it?
see the updated description to run it; thanks
these are the worst instructions ever
Hi. Could you please add examples with workflows?
It is downloading the ema safetensor as two 14G files. How? In the repo there is only one 14G file. Also, I want it to download a specific gguf, like q4 only; how do I make it do that?
No, I also checked the download.py and download2.py files in the "bagel2" package and the b2.py file in the "gguf-connector" package. The repo is different; it's not your repo. It is downloading from this one: https://huggingface.co/callgg/bagel-fp8
downloader.py is for bf16 and downloader2.py is for fp8; if your machine doesn't support fp8, e.g., mac, you could opt to run app.py instead of app2.py
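If you only want a single quant rather than the full set the downloader scripts fetch, you can pull one file yourself with huggingface_hub. A minimal sketch; the q4 filename below is an assumption, so check the repo's file listing for the real name:

```python
# Minimal sketch: download one specific file instead of the whole snapshot.
# The q4 filename is hypothetical -- look up the actual name in the repo.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="callgg/bagel-fp8",     # the repo the tool downloads from
    filename="bagel-q4_0.gguf",     # hypothetical Q4 file name
    local_dir=".",                  # drop it in the working directory
)
print("downloaded to", path)
```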
I give up, it's fine. Maybe 100 years later, when this is as smooth as using a gguf in ComfyUI. I don't blame you; leave it.
still cooking the engine; flash-attn is somewhat troublesome for comfyui to take as a dependency; not yet ready
Got you, but still, I tried: I ran the "ggc b2" command, but it was downloading files that were not in your repo. That's why I give up; I don't understand how it's doing it, and I don't see anywhere that it's specified to download the gguf file from your repo. I don't know.
the most important thing is to run the model first; test the quality; see if it is really worth such effort; other files will eventually work
see
"(bagenv_py11) PS C:\Users\Yiki\Documents\Bag\BAGEL> ggc b2
pynvml (NVIDIA GPU monitoring library) initialized successfully.
Starting model loading and device map configuration...
Using max_memory_config: {0: '24GiB', 'cpu': '16GiB'}
Device map after infer_auto_device_map (with CPU budget):
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
......
language_model.model.norm: disk
language_model.model.norm_moe_gen: disk
language_model.model.rotary_emb: disk
language_model.lm_head: disk
time_embedder: disk
vae2llm: disk
llm2vae: disk
latent_pos_embed: disk
vit_model: disk
connector: disk
vit_pos_embed: disk
Target device for same_device_modules (based on language_model.model.embed_tokens): 0
Moving time_embedder from disk to 0 (same_device_modules)
Moving latent_pos_embed from disk to 0 (same_device_modules)
Moving vae2llm from disk to 0 (same_device_modules)
Moving llm2vae from disk to 0 (same_device_modules)
Moving connector from disk to 0 (same_device_modules)
Moving vit_pos_embed from disk to 0 (same_device_modules)
Device map after same_device_modules logic:
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
language_model.model.layers.5: 0
language_model.model.layers.6: 0
.....
language_model.model.norm: disk
language_model.model.norm_moe_gen: disk
language_model.model.rotary_emb: disk
language_model.lm_head: disk
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: disk
connector: 0
vit_pos_embed: 0
Manually changing the following layers from 'disk' to 'cpu': ['language_model.model.layers.19', 'language_model.model.layers.20', 'language_model.model.layers.21', 'language_model.model.layers.22', 'language_model.model.layers.23', 'language_model.model.layers.24', 'language_model.model.layers.25', 'language_model.model.layers.26', 'language_model.model.layers.27', 'language_model.model.norm', 'language_model.model.norm_moe_gen', 'language_model.model.rotary_emb', 'language_model.lm_head', 'vit_model']
Final device_map before loading checkpoint (after disk override):
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
.....
language_model.model.norm_moe_gen: cpu
language_model.model.rotary_emb: cpu
language_model.lm_head: cpu
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: cpu
connector: 0
vit_pos_embed: 0
Starting custom device_map modifications to maximize GPU utilization...
Device map state BEFORE custom modifications:
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
language_model.model.layers.5: 0
language_model.model.layers.6: 0
language_model.model.layers.7: 0
language_model.model.layers.8: 0
language_model.model.layers.9: 0
....
language_model.model.norm: cpu
language_model.model.norm_moe_gen: cpu
language_model.model.rotary_emb: cpu
language_model.lm_head: cpu
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: cpu
connector: 0
vit_pos_embed: 0
Attempting to move up to 5 LLM layers (11 to 15) to GPU 0...
Promoting LLM layer 'language_model.model.layers.11' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.12' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.13' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.14' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.15' from 'cpu' to GPU 0.
Successfully promoted 5 LLM layers to GPU 0.
Attempting to move LLM 'norm' and 'lm_head' to GPU 0 (if on CPU)...
Promoting 'language_model.model.norm' from 'cpu' to GPU 0.
Warning: Module 'language_model.model.lm_head' not found in device_map. Cannot promote.
Skipping promotion of 'vit_model' based on TRY_MOVE_VIT_MODEL_TO_GPU setting.
Final device_map after all custom modifications:
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
language_model.model.layers.5: 0
language_model.model.layers.6: 0
language_model.model.layers.7: 0
language_model.model.layers.8: 0
language_model.model.layers.9: 0
language_model.model.layers.10: 0
language_model.model.layers.11: 0
language_model.model.layers.12: 0
.....
language_model.model.norm: 0
language_model.model.norm_moe_gen: cpu
language_model.model.rotary_emb: cpu
language_model.lm_head: cpu
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: cpu
connector: 0
vit_pos_embed: 0
--- End of custom device_map modifications ---
The safetensors archive passed at C:\Users\Yiki\Documents\Bag\bagenv_py11\Lib\site-packages\gguf_connector\models/bagel\ema_fp8_e4m3fn.safetensors does not contain metadata. Make sure to save your model with the save_pretrained
method. Defaulting to 'pt' metadata.
language_model.model.embed_tokens.weight: 0%| | 0/1223 [00:01<?, ?w/s, dev=0]"
too much just to run and test; so frustrating
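For context, the device map in the log above is the kind produced by accelerate's infer_auto_device_map, and the "Promoting ... to GPU 0" lines are plain edits to that map before the checkpoint is dispatched. A minimal sketch with a toy model and deliberately tiny memory budgets so the split is visible (the real run uses {0: '24GiB', 'cpu': '16GiB'}):

```python
# Sketch of how a device map like the one in the log is produced.
# The model here is a stand-in, not BAGEL; small budgets force some
# submodules onto cpu and the rest to "disk", as seen above.
import torch.nn as nn
from accelerate import infer_auto_device_map

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(28)])

device_map = infer_auto_device_map(
    model,
    max_memory={0: "512MiB", "cpu": "512MiB"},
)

# The "Promoting ..." steps are just dict edits on this map, e.g.:
device_map["14"] = 0  # move submodule "14" (one layer) onto GPU 0
print(device_map)
```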
you should see the dequantizing process for the gguf file; run it again with the reproduced safetensors in the same directory and it should work; upgrade your gguf-connector to the latest version, as we dropped the model gguf dequantization until a better solution comes out
you could still use the pure safetensors version with ggc b1
I'm already using gguf-connector 1.7.8, see. And not only that, I tried to modify your code; I thought it needed to locate that, so I made the change below.
I'm still confused about what you mean by the dequantizing process for the gguf file. Are you saying it converts from gguf to safetensors? And is the vae I downloaded right? This one: "pig_ae_fp32-f16.gguf"
And one more thing: I got this error after the changes
oh, you commented out the convert_gguf_to_safetensors(input_path, vae_path, use_bf16); without this you can't get the reproduced vae for load_ae(vae_path); you don't need to modify the code unless you know how it renders the data and processes your request; the original code works
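For anyone wondering what the dequantizing step is: conceptually it expands the gguf's quantized blocks back to floats and writes them out as a safetensors file, which is the "reproduced" vae mentioned above. A rough sketch of that idea using the gguf-py and safetensors packages; this is not gguf-connector's actual implementation:

```python
# Conceptual sketch of a gguf -> safetensors dequantization, roughly what
# a convert step like convert_gguf_to_safetensors() has to do.
import numpy as np
import torch
from gguf import GGUFReader
from gguf.quants import dequantize
from safetensors.torch import save_file

def gguf_to_safetensors(input_path: str, output_path: str) -> None:
    reader = GGUFReader(input_path)
    state_dict = {}
    for tensor in reader.tensors:
        # dequantize() expands quantized blocks (Q4, Q8, ...) to float32;
        # F32/F16 tensors pass through essentially unchanged.
        data = dequantize(tensor.data, tensor.tensor_type)
        # GGUF stores shapes in reverse order relative to torch conventions.
        shape = tuple(int(d) for d in reversed(tensor.shape))
        array = np.ascontiguousarray(np.asarray(data).reshape(shape))
        state_dict[tensor.name] = torch.from_numpy(array)
    save_file(state_dict, output_path)

gguf_to_safetensors("pig_ae_fp32-f16.gguf", "pig_ae_reproduced.safetensors")
```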
the checkpoint cannot be the gguf file; please point it back to the safetensors file; simply use the original version and it should work
if you really want to change this, you might need to crack open the accelerate and flash-attn libraries to make those changes work at the tensor layer level
Alright, got it: you want me to download and use safetensors files, not gguf. How about fp8? Will it work? Because I'm getting this:
"(bagenv_py11) PS C:\Users\Yiki\Documents\Bag\BAGEL> ggc b2
pynvml (NVIDIA GPU monitoring library) initialized successfully!
Detecting GGUF/Safetensors...
No GGUF/Safetensors are available in the current directory.
--- Press ENTER To Exit ---"
I do have the fp8 safetensors file in the same directory, the current one
put both the gguf vae and the model safetensors files in the same directory and execute ggc b2; if all are safetensors files then you'd better use ggc b1; you need both vae and model files, since they work together
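A hypothetical sketch of what the "Detecting GGUF/Safetensors..." step amounts to: it only scans the current working directory, which is why launching ggc b2 from a folder without the files reports nothing. This is a guess at the behavior, not the actual gguf-connector source:

```python
# Hypothetical sketch of the detection step: scan only the current
# working directory for model/vae files, as the log message suggests.
from pathlib import Path

cwd = Path(".")
ggufs = sorted(cwd.glob("*.gguf"))
safetensors = sorted(cwd.glob("*.safetensors"))

if not ggufs and not safetensors:
    print("No GGUF/Safetensors are available in the current directory.")
else:
    # ggc b2 expects the gguf vae plus the model safetensors side by side;
    # if everything is safetensors, ggc b1 is the intended entry point.
    for i, f in enumerate(ggufs + safetensors, start=1):
        print(f"{i}. {f.name}")
```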
hey, for the model (2nd) selection, why did you input 3 (pig_ae_fp32-f16-f32.safetensors) instead of 2 (ema_fp8_e4m3fn.safetensors)?
please check it carefully; always troubleshoot it yourself first
use the original code; your console shows that the dequantizing process was done twice, and we don't know what you changed in the code or what exactly your modified version is doing; better to troubleshoot it yourself, or switch back to the original code and see whether it works
according to the error message, most likely those came from the vae_path you changed in the code; please check it carefully
allocate some to system ram and cpu; you can still run it with 4gb vram, just a bit slow
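The ram/cpu allocation described here can be sketched with accelerate's load_checkpoint_and_dispatch; the toy model, budgets, and paths below are illustrative, not gguf-connector's exact code:

```python
# Sketch of the ram/cpu offload idea for a 4 GiB card: cap the GPU budget,
# keep the overflow in system RAM, and spill anything left over to disk.
import torch.nn as nn
from accelerate import load_checkpoint_and_dispatch
from safetensors.torch import save_file

# Toy stand-in checkpoint so the example is self-contained.
model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))
save_file(model.state_dict(), "toy.safetensors")

model = load_checkpoint_and_dispatch(
    model,
    checkpoint="toy.safetensors",
    device_map="auto",
    max_memory={0: "4GiB", "cpu": "16GiB"},  # small GPU budget, rest in RAM
    offload_folder="offload",                # anything left over goes to disk
)
```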