llama.cpp fails to load the model:
llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
Same here.
I'll look into it.
main: build = 2239 (3a03541c)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed = 1708607934
llama_model_loader: loaded meta data with 24 key-value pairs and 254 tensors from gemma-7b-it.Q8_0.gguf (version GGUF V3 (latest))
...
llm_load_vocab: mismatch in special tokens definition ( 416/256000 vs 260/256000 ).
...
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = ?B
...
llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-7b-it.Q8_0.gguf'
"v2" files seems work fine
main: build = 2239 (3a03541c)
main: built with Apple clang version 15.0.0 (clang-1500.1.0.2.5) for arm64-apple-darwin23.2.0
main: seed = 1708610347
llama_model_loader: loaded meta data with 24 key-value pairs and 254 tensors from gemma-7b-it.Q8_0-v2.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = gemma
llama_model_loader: - kv 1: general.name str = gemma-7b-it
...
llm_load_vocab: mismatch in special tokens definition ( 416/256000 vs 260/256000 )
...
llama_print_timings: load time = 13195.25 ms
llama_print_timings: sample time = 56.71 ms / 118 runs ( 0.48 ms per token, 2080.65 tokens per second)
llama_print_timings: prompt eval time = 124.71 ms / 40 tokens ( 3.12 ms per token, 320.73 tokens per second)
llama_print_timings: eval time = 3367.45 ms / 117 runs ( 28.78 ms per token, 34.74 tokens per second)
llama_print_timings: total time = 3626.87 ms / 157 tokens
It's such a relief to see this. Due to my mistake (I used the wrong llama.cpp branch 🤦🏻♂️), many people couldn't use the model yesterday. I hope I can make up for it.
It's working now. Thanks!