nold committed
Commit 3046d85 · verified · 1 Parent(s): d425335

Upload folder using huggingface_hub (#2)


- 3b4a9cbf9d0ca13acd56d56fe53e6830626107aa642f90df68b23e18313abd1a (ec0712c76df7fdd573156cd870c17af1fcc31308)
- 98754c588f13b661712ecb31a98727159540ef1579a709eb47ddfc829f845c00 (9dbca330203643acebfe6ddf46145456430227b0)
- b91ebbb91a8507ab1b5e985de2b85d636d6cecf7c858cd9176e5885739066e44 (a545ce2725711020cba8335570cd50d40848e01a)
- ee5baabb48642b386b1542d25477ea2342780b8d000077100a614f82eb0a6fb0 (2cc6ddfa8fed021e65aa56b8429c28d9870649a6)
- 6743965b00f7fa8c34a1e4188803b5dbc3db45491c49f7f767d35a364e896c86 (cce850f49266537ce814ef45aa90eca86adaafe1)

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
34b-beta_Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc11a5d711785af05871c338f355fc2c65db9727ac0ba992b43bb9e3bce2e32c
+ size 20658710784
34b-beta_Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:871406cc8cf7bcd24eb2ecf55ca7e3a10e9c2afafc07822591d4d5006b909990
+ size 24321845504
34b-beta_Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0412bbac6ef05b4647a0f9a87ce63b3e2e27e7aa843ae112aa2eb15b114da9a2
+ size 28213926144
34b-beta_Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:984547b9ddd10be6c95b2ecf85c6ca3fa2716216b18f5b2f62ba293d0018a98c
+ size 36542281984
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ license: gpl-3.0
+ ---
+ # CausalLM 34B β
+
+ ## PROMPT FORMAT:
+ [chatml](https://github.com/openai/openai-python/blob/main/chatml.md)
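
For reference, a ChatML prompt for this model looks like the sketch below. The `<|im_start|>` / `<|im_end|>` markers match the `tokenizer.chat_template` embedded in the GGUF files (visible in main.log below); the system turn is optional:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
```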
+
+ There are some precision issues with the current model weights. In the next version update, we will roll back some progress and retrain to fix these issues as soon as possible.
+
+ **Please note:** Do not use "accelerated inference frameworks" like **vLLM** for now; use Transformers for inference instead. Otherwise, due to the precision issues, output quality will be significantly degraded. If you need faster inference, consider the q8_0 quantization with llama.cpp (faster and better than bf16 vLLM, for this model only) in the meantime, or wait for the official version. This will be fixed in the upcoming version update.
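
As a reference point, here is a minimal Transformers inference sketch along the lines recommended above; the model id points at the upstream full-precision weights, while the dtype, device placement, and generation settings are illustrative assumptions rather than values taken from this card:

```python
# Minimal sketch: full-precision inference with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/34b-beta"  # upstream weights that this repo quantizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is a Large Language Model?"}]
# apply_chat_template renders the ChatML format shown above
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# repetition_penalty is deliberately left at its 1.0 default (see the note below)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=1.0)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```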
+
+ **no repetition_penalty!**
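
For llama.cpp users, one way to honor both the q8_0 recommendation and the no-repetition-penalty note is via the llama-cpp-python binding, sketched below; the binding itself, the local file path, `n_ctx`, and the GPU offload setting are assumptions, not part of this card:

```python
# Minimal sketch: chat completion against the Q8_0 GGUF via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="34b-beta_Q8_0.gguf",  # quant recommended by the card
    n_ctx=4096,            # trained context is 200k; sized down here for memory
    n_gpu_layers=-1,       # offload all layers if built with GPU support
    chat_format="chatml",  # matches the prompt format above
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a Large Language Model?"}],
    max_tokens=512,
    temperature=1.0,
    repeat_penalty=1.0,  # i.e. no repetition penalty
)
print(out["choices"][0]["message"]["content"])
```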
+
+ Please do not use wikitext for quantization calibration, because all wikitext content has been re-aligned on a synthetic dataset and its distribution differs significantly from the original wikitext.
+
+ ## MT-Bench: 8.5
+
+ ![mt-bench](https://cdn-uploads.huggingface.co/production/uploads/63468a143ea42ee2cb49ddd1/2vv2_nGbfWuOM8jwy40dn.png)
+
+ ## Some contamination detection if you want to check:
+
+ | Models                    | MMLU (ref: llama7b) | TBA  |
+ | ------------------------- | ------------------- | ---- |
+ | microsoft/Orca-2-7b       | 0.77                |      |
+ | mistralai/Mistral-7B-v0.1 | 0.46                |      |
+ | **CausalLM/34b-beta**     | **0.38**            |      |
+ | 01-ai/Yi-6B-200K          | 0.3                 |      |
+
+ Data from https://huggingface.co/spaces/Yeyito/llm_contamination_detector
+
+ It should be *safe*: the model was not trained on the benchmark itself, but some contamination of the training dataset is unavoidable due to cost constraints.
+
+ ***
+
+ Quantization of the model [CausalLM/34b-beta](https://huggingface.co/CausalLM/34b-beta), created using the [llm-quantizer](https://github.com/Nold360/llm-quantizer) pipeline.
main.log ADDED
@@ -0,0 +1,123 @@
+ [1708254748] Log start
+ [1708254748] Cmd: /main -m 34b-beta_Q4_K_M.gguf -p "What is a Large Language Model?" -n 512 --temp 1
+ [1708254748] main: build = 0 (unknown)
+ [1708254748] main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
+ [1708254748] main: seed = 1708254748
+ [1708254748] main: llama backend init
+ [1708254748] main: load the model and apply lora adapter, if any
+ [1708254748] llama_model_loader: loaded meta data with 24 key-value pairs and 543 tensors from 34b-beta_Q4_K_M.gguf (version GGUF V3 (latest))
+ [1708254748] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ [1708254748] llama_model_loader: - kv 0: general.architecture str = llama
+ [1708254748] llama_model_loader: - kv 1: general.name str = workspace
+ [1708254748] llama_model_loader: - kv 2: llama.context_length u32 = 200000
+ [1708254748] llama_model_loader: - kv 3: llama.embedding_length u32 = 7168
+ [1708254748] llama_model_loader: - kv 4: llama.block_count u32 = 60
+ [1708254748] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 20480
+ [1708254748] llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
+ [1708254748] llama_model_loader: - kv 7: llama.attention.head_count u32 = 56
+ [1708254748] llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
+ [1708254748] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
+ [1708254748] llama_model_loader: - kv 10: llama.rope.freq_base f32 = 5000000.000000
+ [1708254748] llama_model_loader: - kv 11: general.file_type u32 = 15
+ [1708254748] llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
+ [1708254748] llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,64000] = ["<unk>", "<s>", "</s>", "<|Human|>",...
+ [1708254748] llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,64000] = [-1000.000000, -1000.000000, -1000.00...
+ [1708254748] llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,64000] = [3, 3, 3, 1, 1, 1, 3, 3, 3, 1, 1, 1, ...
+ [1708254748] llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
+ [1708254748] llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
+ [1708254748] llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
+ [1708254748] llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
+ [1708254748] llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = false
+ [1708254748] llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
+ [1708254748] llama_model_loader: - kv 22: tokenizer.chat_template str = {% for message in messages %}{{'<|im_...
+ [1708254748] llama_model_loader: - kv 23: general.quantization_version u32 = 2
+ [1708254748] llama_model_loader: - type f32: 121 tensors
+ [1708254748] llama_model_loader: - type q4_K: 361 tensors
+ [1708254748] llama_model_loader: - type q6_K: 61 tensors
+ [1708254748] llm_load_vocab: mismatch in special tokens definition ( 498/64000 vs 262/64000 ).
+ [1708254748] llm_load_print_meta: format = GGUF V3 (latest)
+ [1708254748] llm_load_print_meta: arch = llama
+ [1708254748] llm_load_print_meta: vocab type = SPM
+ [1708254748] llm_load_print_meta: n_vocab = 64000
+ [1708254748] llm_load_print_meta: n_merges = 0
+ [1708254748] llm_load_print_meta: n_ctx_train = 200000
+ [1708254748] llm_load_print_meta: n_embd = 7168
+ [1708254748] llm_load_print_meta: n_head = 56
+ [1708254748] llm_load_print_meta: n_head_kv = 8
+ [1708254748] llm_load_print_meta: n_layer = 60
+ [1708254748] llm_load_print_meta: n_rot = 128
+ [1708254748] llm_load_print_meta: n_embd_head_k = 128
+ [1708254748] llm_load_print_meta: n_embd_head_v = 128
+ [1708254748] llm_load_print_meta: n_gqa = 7
+ [1708254748] llm_load_print_meta: n_embd_k_gqa = 1024
+ [1708254748] llm_load_print_meta: n_embd_v_gqa = 1024
+ [1708254748] llm_load_print_meta: f_norm_eps = 0.0e+00
+ [1708254748] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
+ [1708254748] llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ [1708254748] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ [1708254748] llm_load_print_meta: n_ff = 20480
+ [1708254748] llm_load_print_meta: n_expert = 0
+ [1708254748] llm_load_print_meta: n_expert_used = 0
+ [1708254748] llm_load_print_meta: rope scaling = linear
+ [1708254748] llm_load_print_meta: freq_base_train = 5000000.0
+ [1708254748] llm_load_print_meta: freq_scale_train = 1
+ [1708254748] llm_load_print_meta: n_yarn_orig_ctx = 200000
+ [1708254748] llm_load_print_meta: rope_finetuned = unknown
+ [1708254748] llm_load_print_meta: model type = 30B
+ [1708254748] llm_load_print_meta: model ftype = Q4_K - Medium
+ [1708254748] llm_load_print_meta: model params = 34.39 B
+ [1708254748] llm_load_print_meta: model size = 19.24 GiB (4.81 BPW)
+ [1708254748] llm_load_print_meta: general.name = workspace
+ [1708254748] llm_load_print_meta: BOS token = 1 '<s>'
+ [1708254748] llm_load_print_meta: EOS token = 2 '</s>'
+ [1708254748] llm_load_print_meta: UNK token = 0 '<unk>'
+ [1708254748] llm_load_print_meta: PAD token = 0 '<unk>'
+ [1708254748] llm_load_print_meta: LF token = 315 '<0x0A>'
+ [1708254748] llm_load_tensors: ggml ctx size = 0.21 MiB
+ [1708254793] llm_load_tensors: CPU buffer size = 19700.24 MiB
+ [1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793]
+ [1708254793] llama_new_context_with_model: n_ctx = 512
+ [1708254793] llama_new_context_with_model: freq_base = 5000000.0
+ [1708254793] llama_new_context_with_model: freq_scale = 1
+ [1708254793] llama_kv_cache_init: CPU KV buffer size = 120.00 MiB
+ [1708254793] llama_new_context_with_model: KV self size = 120.00 MiB, K (f16): 60.00 MiB, V (f16): 60.00 MiB
+ [1708254793] llama_new_context_with_model: CPU input buffer size = 16.01 MiB
+ [1708254793] llama_new_context_with_model: CPU compute buffer size = 139.00 MiB
+ [1708254793] llama_new_context_with_model: graph splits (measure): 1
+ [1708254793] warming up the model with an empty run
+ [1708254841] n_ctx: 512
+ [1708254841]
+ [1708254841] system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
+ [1708254841] add_bos: 0
+ [1708254841] tokenize the prompt
+ [1708254841] prompt: "What is a Large Language Model?"
+ [1708254841] tokens: [ ' What':2371, ' is':620, ' a':562, ' Large':21356, ' Lang':29527, 'ua':8949, 'ge':671, ' Model':9627, '?':100 ]
+ [1708254841] recalculate the cached logits (check): embd_inp.empty() false, n_matching_session_tokens 0, embd_inp.size() 9, session_tokens.size() 0, embd_inp.size() 9
+ [1708254841] inp_pfx: [ ' ':59568, '':144, '':144, '###':8308, ' Inst':3335, 'ruction':3252, ':':59601, '':144, '':144 ]
+ [1708254841] inp_sfx: [ ' ':59568, '':144, '':144, '###':8308, ' Response':21278, ':':59601, '':144, '':144 ]
+ [1708254841] cml_pfx: [ ' ':59568, '':144, '':6, 'user':3903, '':144 ]
+ [1708254841] cml_sfx: [ '':7, '':144, '':6, 'assis':33509, 'tan':11064, 't':59570, '':144 ]
+ [1708254841] sampling:
+ repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 1.000
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ [1708254841] sampling order:
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+ [1708254841] generate: n_ctx = 512, n_batch = 512, n_predict = 512, n_keep = 0
+ [1708254841]
+
+ [1708254841] embd_inp.size(): 9, n_consumed: 0
+ [1708254841] eval: [ ' What':2371, ' is':620, ' a':562, ' Large':21356, ' Lang':29527, 'ua':8949, 'ge':671, ' Model':9627, '?':100 ]
+ [1708254888] n_past = 9
+ [1708254888] sampled token: 2: ''
+ [1708254888] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, ' What':2371, ' is':620, ' a':562, ' Large':21356, ' Lang':29527, 'ua':8949, 'ge':671, ' Model':9627, '?':100, '':2 ]
+ [1708254888] n_remain: 511
+ [1708254888] found EOS token
+ [1708254888] [end of text]
+ [1708254888]
+ [1708254888] llama_print_timings: load time = 92761.63 ms
+ [1708254888] llama_print_timings: sample time = 0.68 ms / 1 runs ( 0.68 ms per token, 1461.99 tokens per second)
+ [1708254888] llama_print_timings: prompt eval time = 47314.99 ms / 9 tokens ( 5257.22 ms per token, 0.19 tokens per second)
+ [1708254888] llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ [1708254888] llama_print_timings: total time = 47326.73 ms / 10 tokens
+ [1708254889] Log end