nold committed
Commit 3046d85 · verified · 1 Parent(s): d425335

Upload folder using huggingface_hub (#2)


- 3b4a9cbf9d0ca13acd56d56fe53e6830626107aa642f90df68b23e18313abd1a (ec0712c76df7fdd573156cd870c17af1fcc31308)
- 98754c588f13b661712ecb31a98727159540ef1579a709eb47ddfc829f845c00 (9dbca330203643acebfe6ddf46145456430227b0)
- b91ebbb91a8507ab1b5e985de2b85d636d6cecf7c858cd9176e5885739066e44 (a545ce2725711020cba8335570cd50d40848e01a)
- ee5baabb48642b386b1542d25477ea2342780b8d000077100a614f82eb0a6fb0 (2cc6ddfa8fed021e65aa56b8429c28d9870649a6)
- 6743965b00f7fa8c34a1e4188803b5dbc3db45491c49f7f767d35a364e896c86 (cce850f49266537ce814ef45aa90eca86adaafe1)

.gitattributes CHANGED
@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ 34b-beta_Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
34b-beta_Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dc11a5d711785af05871c338f355fc2c65db9727ac0ba992b43bb9e3bce2e32c
+ size 20658710784
34b-beta_Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:871406cc8cf7bcd24eb2ecf55ca7e3a10e9c2afafc07822591d4d5006b909990
+ size 24321845504
34b-beta_Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0412bbac6ef05b4647a0f9a87ce63b3e2e27e7aa843ae112aa2eb15b114da9a2
+ size 28213926144
34b-beta_Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:984547b9ddd10be6c95b2ecf85c6ca3fa2716216b18f5b2f62ba293d0018a98c
+ size 36542281984
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ license: gpl-3.0
+ ---
+ # CausalLM 34B β
+
+ ## PROMPT FORMAT:
+ [chatml](https://github.com/openai/openai-python/blob/main/chatml.md)
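
For reference, a ChatML prompt for this model looks like the sketch below. The `<|im_start|>` / `<|im_end|>` markers match the `tokenizer.chat_template` embedded in the GGUF files (visible in main.log below); the system turn is optional:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is a Large Language Model?<|im_end|>
<|im_start|>assistant
```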
+
+ There are some precision issues with the current model weights. In the next version update, we will roll back some progress and retrain to fix these issues as soon as possible.
+
+ **Please note:** Do not use "accelerated inference frameworks" like **vLLM** for now; use Transformers for inference instead. Otherwise, due to the precision issues, output quality will be significantly degraded. If you need faster inference, consider the q8_0 quantization with llama.cpp (faster and better than bf16 vLLM, for this model only) in the meantime, or wait for the official version. This will be fixed in the upcoming version update.
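
As a reference point, here is a minimal Transformers inference sketch along the lines recommended above; the model id points at the upstream full-precision weights, while the dtype, device placement, and generation settings are illustrative assumptions rather than values taken from this card:

```python
# Minimal sketch: full-precision inference with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CausalLM/34b-beta"  # upstream weights that this repo quantizes
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is a Large Language Model?"}]
# apply_chat_template renders the ChatML format shown above
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# repetition_penalty is deliberately left at its 1.0 default (see the note below)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=1.0)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```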
+
+ **no repetition_penalty!**
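
For llama.cpp users, one way to honor both the q8_0 recommendation and the no-repetition-penalty note is via the llama-cpp-python binding, sketched below; the binding itself, the local file path, `n_ctx`, and the GPU offload setting are assumptions, not part of this card:

```python
# Minimal sketch: chat completion against the Q8_0 GGUF via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="34b-beta_Q8_0.gguf",  # quant recommended by the card
    n_ctx=4096,            # trained context is 200k; sized down here for memory
    n_gpu_layers=-1,       # offload all layers if built with GPU support
    chat_format="chatml",  # matches the prompt format above
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is a Large Language Model?"}],
    max_tokens=512,
    temperature=1.0,
    repeat_penalty=1.0,  # i.e. no repetition penalty
)
print(out["choices"][0]["message"]["content"])
```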
+
+ Please do not use wikitext for quantization calibration, because all wikitext content has been re-aligned on a synthetic dataset and its distribution differs significantly from the original wikitext.
+
+ ## MT-Bench: 8.5
+
+ ![mt-bench](https://cdn-uploads.huggingface.co/production/uploads/63468a143ea42ee2cb49ddd1/2vv2_nGbfWuOM8jwy40dn.png)
+
+ ## Some contamination detection if you want to check:
+
+ | Models                    | MMLU (ref: llama7b) | TBA  |
+ | ------------------------- | ------------------- | ---- |
+ | microsoft/Orca-2-7b       | 0.77                |      |
+ | mistralai/Mistral-7B-v0.1 | 0.46                |      |
+ | **CausalLM/34b-beta**     | **0.38**            |      |
+ | 01-ai/Yi-6B-200K          | 0.3                 |      |
+
+ Data from https://huggingface.co/spaces/Yeyito/llm_contamination_detector
+
+ It should be *safe*: the model was not trained on the benchmark itself, but some contamination of the training dataset is unavoidable due to cost constraints.
+
+ ***
+
+ Quantization of the model [CausalLM/34b-beta](https://huggingface.co/CausalLM/34b-beta), created using the [llm-quantizer](https://github.com/Nold360/llm-quantizer) pipeline.
main.log ADDED
@@ -0,0 +1,123 @@
+ [1708254748] Log start
+ [1708254748] Cmd: /main -m 34b-beta_Q4_K_M.gguf -p "What is a Large Language Model?" -n 512 --temp 1
+ [1708254748] main: build = 0 (unknown)
+ [1708254748] main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
+ [1708254748] main: seed = 1708254748
+ [1708254748] main: llama backend init
+ [1708254748] main: load the model and apply lora adapter, if any
+ [1708254748] llama_model_loader: loaded meta data with 24 key-value pairs and 543 tensors from 34b-beta_Q4_K_M.gguf (version GGUF V3 (latest))
+ [1708254748] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
+ [1708254748] llama_model_loader: - kv 0: general.architecture str = llama
+ [1708254748] llama_model_loader: - kv 1: general.name str = workspace
+ [1708254748] llama_model_loader: - kv 2: llama.context_length u32 = 200000
+ [1708254748] llama_model_loader: - kv 3: llama.embedding_length u32 = 7168
+ [1708254748] llama_model_loader: - kv 4: llama.block_count u32 = 60
+ [1708254748] llama_model_loader: - kv 5: llama.feed_forward_length u32 = 20480
+ [1708254748] llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
+ [1708254748] llama_model_loader: - kv 7: llama.attention.head_count u32 = 56
+ [1708254748] llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
+ [1708254748] llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
+ [1708254748] llama_model_loader: - kv 10: llama.rope.freq_base f32 = 5000000.000000
+ [1708254748] llama_model_loader: - kv 11: general.file_type u32 = 15
+ [1708254748] llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
+ [1708254748] llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,64000] = ["<unk>", "<s>", "</s>", "<|Human|>",...
+ [1708254748] llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,64000] = [-1000.000000, -1000.000000, -1000.00...
+ [1708254748] llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,64000] = [3, 3, 3, 1, 1, 1, 3, 3, 3, 1, 1, 1, ...
+ [1708254748] llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
+ [1708254748] llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
+ [1708254748] llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
+ [1708254748] llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
+ [1708254748] llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = false
+ [1708254748] llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
+ [1708254748] llama_model_loader: - kv 22: tokenizer.chat_template str = {% for message in messages %}{{'<|im_...
+ [1708254748] llama_model_loader: - kv 23: general.quantization_version u32 = 2
+ [1708254748] llama_model_loader: - type f32: 121 tensors
+ [1708254748] llama_model_loader: - type q4_K: 361 tensors
+ [1708254748] llama_model_loader: - type q6_K: 61 tensors
+ [1708254748] llm_load_vocab: mismatch in special tokens definition ( 498/64000 vs 262/64000 ).
+ [1708254748] llm_load_print_meta: format = GGUF V3 (latest)
+ [1708254748] llm_load_print_meta: arch = llama
+ [1708254748] llm_load_print_meta: vocab type = SPM
+ [1708254748] llm_load_print_meta: n_vocab = 64000
+ [1708254748] llm_load_print_meta: n_merges = 0
+ [1708254748] llm_load_print_meta: n_ctx_train = 200000
+ [1708254748] llm_load_print_meta: n_embd = 7168
+ [1708254748] llm_load_print_meta: n_head = 56
+ [1708254748] llm_load_print_meta: n_head_kv = 8
+ [1708254748] llm_load_print_meta: n_layer = 60
+ [1708254748] llm_load_print_meta: n_rot = 128
+ [1708254748] llm_load_print_meta: n_embd_head_k = 128
+ [1708254748] llm_load_print_meta: n_embd_head_v = 128
+ [1708254748] llm_load_print_meta: n_gqa = 7
+ [1708254748] llm_load_print_meta: n_embd_k_gqa = 1024
+ [1708254748] llm_load_print_meta: n_embd_v_gqa = 1024
+ [1708254748] llm_load_print_meta: f_norm_eps = 0.0e+00
+ [1708254748] llm_load_print_meta: f_norm_rms_eps = 1.0e-05
+ [1708254748] llm_load_print_meta: f_clamp_kqv = 0.0e+00
+ [1708254748] llm_load_print_meta: f_max_alibi_bias = 0.0e+00
+ [1708254748] llm_load_print_meta: n_ff = 20480
+ [1708254748] llm_load_print_meta: n_expert = 0
+ [1708254748] llm_load_print_meta: n_expert_used = 0
+ [1708254748] llm_load_print_meta: rope scaling = linear
+ [1708254748] llm_load_print_meta: freq_base_train = 5000000.0
+ [1708254748] llm_load_print_meta: freq_scale_train = 1
+ [1708254748] llm_load_print_meta: n_yarn_orig_ctx = 200000
+ [1708254748] llm_load_print_meta: rope_finetuned = unknown
+ [1708254748] llm_load_print_meta: model type = 30B
+ [1708254748] llm_load_print_meta: model ftype = Q4_K - Medium
+ [1708254748] llm_load_print_meta: model params = 34.39 B
+ [1708254748] llm_load_print_meta: model size = 19.24 GiB (4.81 BPW)
+ [1708254748] llm_load_print_meta: general.name = workspace
+ [1708254748] llm_load_print_meta: BOS token = 1 '<s>'
+ [1708254748] llm_load_print_meta: EOS token = 2 '</s>'
+ [1708254748] llm_load_print_meta: UNK token = 0 '<unk>'
+ [1708254748] llm_load_print_meta: PAD token = 0 '<unk>'
+ [1708254748] llm_load_print_meta: LF token = 315 '<0x0A>'
+ [1708254748] llm_load_tensors: ggml ctx size = 0.21 MiB
+ [1708254793] llm_load_tensors: CPU buffer size = 19700.24 MiB
+ [1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793] .[1708254793]
+ [1708254793] llama_new_context_with_model: n_ctx = 512
+ [1708254793] llama_new_context_with_model: freq_base = 5000000.0
+ [1708254793] llama_new_context_with_model: freq_scale = 1
+ [1708254793] llama_kv_cache_init: CPU KV buffer size = 120.00 MiB
+ [1708254793] llama_new_context_with_model: KV self size = 120.00 MiB, K (f16): 60.00 MiB, V (f16): 60.00 MiB
+ [1708254793] llama_new_context_with_model: CPU input buffer size = 16.01 MiB
+ [1708254793] llama_new_context_with_model: CPU compute buffer size = 139.00 MiB
+ [1708254793] llama_new_context_with_model: graph splits (measure): 1
+ [1708254793] warming up the model with an empty run
+ [1708254841] n_ctx: 512
+ [1708254841]
+ [1708254841] system_info: n_threads = 16 / 32 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
+ [1708254841] add_bos: 0
+ [1708254841] tokenize the prompt
+ [1708254841] prompt: "What is a Large Language Model?"
+ [1708254841] tokens: [ ' What':2371, ' is':620, ' a':562, ' Large':21356, ' Lang':29527, 'ua':8949, 'ge':671, ' Model':9627, '?':100 ]
+ [1708254841] recalculate the cached logits (check): embd_inp.empty() false, n_matching_session_tokens 0, embd_inp.size() 9, session_tokens.size() 0, embd_inp.size() 9
+ [1708254841] inp_pfx: [ ' ':59568, '':144, '':144, '###':8308, ' Inst':3335, 'ruction':3252, ':':59601, '':144, '':144 ]
+ [1708254841] inp_sfx: [ ' ':59568, '':144, '':144, '###':8308, ' Response':21278, ':':59601, '':144, '':144 ]
+ [1708254841] cml_pfx: [ ' ':59568, '':144, '':6, 'user':3903, '':144 ]
+ [1708254841] cml_sfx: [ '':7, '':144, '':6, 'assis':33509, 'tan':11064, 't':59570, '':144 ]
+ [1708254841] sampling:
+ repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
+ top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 1.000
+ mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
+ [1708254841] sampling order:
+ CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
+ [1708254841] generate: n_ctx = 512, n_batch = 512, n_predict = 512, n_keep = 0
+ [1708254841]
+
+ [1708254841] embd_inp.size(): 9, n_consumed: 0
+ [1708254841] eval: [ ' What':2371, ' is':620, ' a':562, ' Large':21356, ' Lang':29527, 'ua':8949, 'ge':671, ' Model':9627, '?':100 ]
+ [1708254888] n_past = 9
+ [1708254888] sampled token: 2: ''
+ [1708254888] last: [ '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, '':0, ' What':2371, ' is':620, ' a':562, ' Large':21356, ' Lang':29527, 'ua':8949, 'ge':671, ' Model':9627, '?':100, '':2 ]
+ [1708254888] n_remain: 511
+ [1708254888] found EOS token
+ [1708254888] [end of text]
+ [1708254888]
+ [1708254888] llama_print_timings: load time = 92761.63 ms
+ [1708254888] llama_print_timings: sample time = 0.68 ms / 1 runs ( 0.68 ms per token, 1461.99 tokens per second)
+ [1708254888] llama_print_timings: prompt eval time = 47314.99 ms / 9 tokens ( 5257.22 ms per token, 0.19 tokens per second)
+ [1708254888] llama_print_timings: eval time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
+ [1708254888] llama_print_timings: total time = 47326.73 ms / 10 tokens
+ [1708254889] Log end