Where to put this model?
Where do I put it?
see the updated description to run it; thanks
these are the worst instructions ever
Hi. Could you please add examples with workflows?
It is downloading the ema safetensor as two 14G files. How? In the repo there is only one 14G file. Also, I want it to download a specific gguf, like q4 only; how do I make it do that?
No, I also checked the download.py and download2.py files in the "bagel2" package and the b2.py file in the "gguf-connector" package. The repo is different; it's not your repo. It is downloading from this one: https://huggingface.co/callgg/bagel-fp8
downloader.py is for bf16 and downloader2.py is for fp8; if your machine doesn't support fp8, e.g., mac, you could opt to run app.py instead of app2.py
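If you only want a single quant rather than the full set the downloader scripts fetch, you can pull one file yourself with huggingface_hub. A minimal sketch; the q4 filename below is an assumption, so check the repo's file listing for the real name:

```python
# Minimal sketch: download one specific file instead of the whole snapshot.
# The q4 filename is hypothetical -- look up the actual name in the repo.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="callgg/bagel-fp8",     # the repo the tool downloads from
    filename="bagel-q4_0.gguf",     # hypothetical Q4 file name
    local_dir=".",                  # drop it in the working directory
)
print("downloaded to", path)
```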
I give up, it's fine. Maybe 100 years later, when this is as smooth as using a gguf in ComfyUI. I don't blame you; leave it.
still cooking the engine; flash-attn is somewhat troublesome for comfyui to take as a dependency; not yet ready
Got you, but still, I tried: I ran the "ggc b2" command, but it was downloading files that were not in your repo. That's why I give up; I don't understand how it's doing it, and I don't see anywhere that it's specified to download the gguf file from your repo. I don't know.
the most important thing is to run the model first; test the quality; see if it is really worth such effort; other files will eventually work
see
"(bagenv_py11) PS C:\Users\Yiki\Documents\Bag\BAGEL> ggc b2
pynvml (NVIDIA GPU monitoring library) initialized successfully.
Starting model loading and device map configuration...
Using max_memory_config: {0: '24GiB', 'cpu': '16GiB'}
Device map after infer_auto_device_map (with CPU budget):
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
......
language_model.model.norm: disk
language_model.model.norm_moe_gen: disk
language_model.model.rotary_emb: disk
language_model.lm_head: disk
time_embedder: disk
vae2llm: disk
llm2vae: disk
latent_pos_embed: disk
vit_model: disk
connector: disk
vit_pos_embed: disk
Target device for same_device_modules (based on language_model.model.embed_tokens): 0
Moving time_embedder from disk to 0 (same_device_modules)
Moving latent_pos_embed from disk to 0 (same_device_modules)
Moving vae2llm from disk to 0 (same_device_modules)
Moving llm2vae from disk to 0 (same_device_modules)
Moving connector from disk to 0 (same_device_modules)
Moving vit_pos_embed from disk to 0 (same_device_modules)
Device map after same_device_modules logic:
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
language_model.model.layers.5: 0
language_model.model.layers.6: 0
.....
language_model.model.norm: disk
language_model.model.norm_moe_gen: disk
language_model.model.rotary_emb: disk
language_model.lm_head: disk
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: disk
connector: 0
vit_pos_embed: 0
Manually changing the following layers from 'disk' to 'cpu': ['language_model.model.layers.19', 'language_model.model.layers.20', 'language_model.model.layers.21', 'language_model.model.layers.22', 'language_model.model.layers.23', 'language_model.model.layers.24', 'language_model.model.layers.25', 'language_model.model.layers.26', 'language_model.model.layers.27', 'language_model.model.norm', 'language_model.model.norm_moe_gen', 'language_model.model.rotary_emb', 'language_model.lm_head', 'vit_model']
Final device_map before loading checkpoint (after disk override):
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
.....
language_model.model.norm_moe_gen: cpu
language_model.model.rotary_emb: cpu
language_model.lm_head: cpu
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: cpu
connector: 0
vit_pos_embed: 0
Starting custom device_map modifications to maximize GPU utilization...
Device map state BEFORE custom modifications:
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
language_model.model.layers.5: 0
language_model.model.layers.6: 0
language_model.model.layers.7: 0
language_model.model.layers.8: 0
language_model.model.layers.9: 0
....
language_model.model.norm: cpu
language_model.model.norm_moe_gen: cpu
language_model.model.rotary_emb: cpu
language_model.lm_head: cpu
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: cpu
connector: 0
vit_pos_embed: 0
Attempting to move up to 5 LLM layers (11 to 15) to GPU 0...
Promoting LLM layer 'language_model.model.layers.11' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.12' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.13' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.14' from 'cpu' to GPU 0.
Promoting LLM layer 'language_model.model.layers.15' from 'cpu' to GPU 0.
Successfully promoted 5 LLM layers to GPU 0.
Attempting to move LLM 'norm' and 'lm_head' to GPU 0 (if on CPU)...
Promoting 'language_model.model.norm' from 'cpu' to GPU 0.
Warning: Module 'language_model.model.lm_head' not found in device_map. Cannot promote.
Skipping promotion of 'vit_model' based on TRY_MOVE_VIT_MODEL_TO_GPU setting.
Final device_map after all custom modifications:
language_model.model.embed_tokens: 0
language_model.model.layers.0: 0
language_model.model.layers.1: 0
language_model.model.layers.2: 0
language_model.model.layers.3: 0
language_model.model.layers.4: 0
language_model.model.layers.5: 0
language_model.model.layers.6: 0
language_model.model.layers.7: 0
language_model.model.layers.8: 0
language_model.model.layers.9: 0
language_model.model.layers.10: 0
language_model.model.layers.11: 0
language_model.model.layers.12: 0
.....
language_model.model.norm: 0
language_model.model.norm_moe_gen: cpu
language_model.model.rotary_emb: cpu
language_model.lm_head: cpu
time_embedder: 0
vae2llm: 0
llm2vae: 0
latent_pos_embed: 0
vit_model: cpu
connector: 0
vit_pos_embed: 0
--- End of custom device_map modifications ---
The safetensors archive passed at C:\Users\Yiki\Documents\Bag\bagenv_py11\Lib\site-packages\gguf_connector\models/bagel\ema_fp8_e4m3fn.safetensors does not contain metadata. Make sure to save your model with the save_pretrained
method. Defaulting to 'pt' metadata.
language_model.model.embed_tokens.weight: 0%| | 0/1223 [00:01<?, ?w/s, dev=0]"
too much just to run and test; so frustrating
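For context, the device map in the log above is the kind produced by accelerate's infer_auto_device_map, and the "Promoting ... to GPU 0" lines are plain edits to that map before the checkpoint is dispatched. A minimal sketch with a toy model and deliberately tiny memory budgets so the split is visible (the real run uses {0: '24GiB', 'cpu': '16GiB'}):

```python
# Sketch of how a device map like the one in the log is produced.
# The model here is a stand-in, not BAGEL; small budgets force some
# submodules onto cpu and the rest to "disk", as seen above.
import torch.nn as nn
from accelerate import infer_auto_device_map

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(28)])

device_map = infer_auto_device_map(
    model,
    max_memory={0: "512MiB", "cpu": "512MiB"},
)

# The "Promoting ..." steps are just dict edits on this map, e.g.:
device_map["14"] = 0  # move submodule "14" (one layer) onto GPU 0
print(device_map)
```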
you should see the dequantizing process for the gguf file; run it again with the reproduced safetensors in the same directory and it should work; upgrade your gguf-connector to the latest version, as we dropped the model gguf dequantization until a better solution comes out
you could still use the pure safetensors version with ggc b1
I'm already using gguf-connector 1.7.8, see. And not only that, I tried to modify your code; I thought it needed to locate that, so I made the change below.
I'm still confused about what you mean by the dequantizing process for the gguf file. Are you saying it converts from gguf to safetensors? And is the vae I downloaded right? This one: "pig_ae_fp32-f16.gguf"
And one more thing: I got this error after the changes
oh, you commented out the convert_gguf_to_safetensors(input_path, vae_path, use_bf16); without this you can't get the reproduced vae for load_ae(vae_path); you don't need to modify the code unless you know how it renders the data and processes your request; the original code works
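For anyone wondering what the dequantizing step is: conceptually it expands the gguf's quantized blocks back to floats and writes them out as a safetensors file, which is the "reproduced" vae mentioned above. A rough sketch of that idea using the gguf-py and safetensors packages; this is not gguf-connector's actual implementation:

```python
# Conceptual sketch of a gguf -> safetensors dequantization, roughly what
# a convert step like convert_gguf_to_safetensors() has to do.
import numpy as np
import torch
from gguf import GGUFReader
from gguf.quants import dequantize
from safetensors.torch import save_file

def gguf_to_safetensors(input_path: str, output_path: str) -> None:
    reader = GGUFReader(input_path)
    state_dict = {}
    for tensor in reader.tensors:
        # dequantize() expands quantized blocks (Q4, Q8, ...) to float32;
        # F32/F16 tensors pass through essentially unchanged.
        data = dequantize(tensor.data, tensor.tensor_type)
        # GGUF stores shapes in reverse order relative to torch conventions.
        shape = tuple(int(d) for d in reversed(tensor.shape))
        array = np.ascontiguousarray(np.asarray(data).reshape(shape))
        state_dict[tensor.name] = torch.from_numpy(array)
    save_file(state_dict, output_path)

gguf_to_safetensors("pig_ae_fp32-f16.gguf", "pig_ae_reproduced.safetensors")
```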
the checkpoint cannot be the gguf file; please point it back to the safetensors file; simply use the original version and it should work
if you really want to change this, you might need to crack open the accelerate and flash-attn libraries to make those changes work at the tensor layer level
Alright, got it: you want me to download and use safetensors files, not gguf. How about fp8? Will it work? Because I'm getting this:
"(bagenv_py11) PS C:\Users\Yiki\Documents\Bag\BAGEL> ggc b2
pynvml (NVIDIA GPU monitoring library) initialized successfully!
Detecting GGUF/Safetensors...
No GGUF/Safetensors are available in the current directory.
--- Press ENTER To Exit ---"
I do have the fp8 safetensors file in the same directory, the current one
put both the gguf vae and the model safetensors files in the same directory and execute ggc b2; if all are safetensors files then you'd better use ggc b1; you need both vae and model files, since they work together
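A hypothetical sketch of what the "Detecting GGUF/Safetensors..." step amounts to: it only scans the current working directory, which is why launching ggc b2 from a folder without the files reports nothing. This is a guess at the behavior, not the actual gguf-connector source:

```python
# Hypothetical sketch of the detection step: scan only the current
# working directory for model/vae files, as the log message suggests.
from pathlib import Path

cwd = Path(".")
ggufs = sorted(cwd.glob("*.gguf"))
safetensors = sorted(cwd.glob("*.safetensors"))

if not ggufs and not safetensors:
    print("No GGUF/Safetensors are available in the current directory.")
else:
    # ggc b2 expects the gguf vae plus the model safetensors side by side;
    # if everything is safetensors, ggc b1 is the intended entry point.
    for i, f in enumerate(ggufs + safetensors, start=1):
        print(f"{i}. {f.name}")
```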
hey, for the model (2nd) selection, why did you input 3 (pig_ae_fp32-f16-f32.safetensors) instead of 2 (ema_fp8_e4m3fn.safetensors)?
please check it carefully; always troubleshoot it yourself first
use the original code; your console shows that the dequantizing process was done twice, and we don't know what you changed in the code or what exactly your modified version is doing; better to troubleshoot it yourself, or switch back to the original code and see whether it works
according to the error message, most likely those came from the vae_path you changed in the code; please check it carefully
allocate some to system ram and cpu; you can still run it with 4gb vram, just a bit slow
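The ram/cpu allocation described here can be sketched with accelerate's load_checkpoint_and_dispatch; the toy model, budgets, and paths below are illustrative, not gguf-connector's exact code:

```python
# Sketch of the ram/cpu offload idea for a 4 GiB card: cap the GPU budget,
# keep the overflow in system RAM, and spill anything left over to disk.
import torch.nn as nn
from accelerate import load_checkpoint_and_dispatch
from safetensors.torch import save_file

# Toy stand-in checkpoint so the example is self-contained.
model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024))
save_file(model.state_dict(), "toy.safetensors")

model = load_checkpoint_and_dispatch(
    model,
    checkpoint="toy.safetensors",
    device_map="auto",
    max_memory={0: "4GiB", "cpu": "16GiB"},  # small GPU budget, rest in RAM
    offload_folder="offload",                # anything left over goes to disk
)
```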