Prince-1 committed on
Commit f8eec27 · verified · 1 Parent(s): ca86013

Build the RKLLM format of the model Mellum-RKllm

Files changed (3):
  1. .gitattributes +1 -0
  2. Mellum-4B.rkllm +3 -0
  3. README.md +387 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Mellum-4B.rkllm filter=lfs diff=lfs merge=lfs -text
Mellum-4B.rkllm ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:97d908c4c06e99b6b5e5037b03ab0d6a04e1e7d995f7847935582f86e46fb993
+size 8065745118
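
The three lines above are the Git LFS pointer that stands in for the ~8 GB model file; without Git LFS installed, this pointer text is what gets checked out. A quick way to confirm a complete download is to check the file against the `oid` and `size` recorded in the pointer. A minimal sketch, using only the values shown above:

```python
# Verify a downloaded Mellum-4B.rkllm against the Git LFS pointer above.
import hashlib

EXPECTED_OID = "97d908c4c06e99b6b5e5037b03ab0d6a04e1e7d995f7847935582f86e46fb993"
EXPECTED_SIZE = 8065745118  # bytes, from the pointer's "size" line

sha = hashlib.sha256()
size = 0
with open("Mellum-4B.rkllm", "rb") as f:
    # Hash in 1 MiB chunks so the ~8 GB file never sits in memory at once.
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)
        size += len(chunk)

assert size == EXPECTED_SIZE, f"size mismatch: got {size}"
assert sha.hexdigest() == EXPECTED_OID, "sha256 mismatch"
print("Mellum-4B.rkllm matches the LFS pointer")
```
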
README.md ADDED
@@ -0,0 +1,387 @@
---
license: apache-2.0
datasets:
- bigcode/the-stack
- bigcode/the-stack-v2
- bigcode/starcoderdata
- bigcode/commitpack
library_name: rkllm
tags:
- code
- rkllm
- rockchip
- rk3588
model-index:
- name: Mellum-4b-base
  results:
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_python_v1.1
      name: RepoBench 1.1 (Python)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2591
      verified: false
    - name: EM ≤ 8k
      type: exact_match
      value: 0.2797
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_python_v1.1
      name: RepoBench 1.1 (Python, 2k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.282
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_python_v1.1
      name: RepoBench 1.1 (Python, 4k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2795
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_python_v1.1
      name: RepoBench 1.1 (Python, 8k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2777
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_python_v1.1
      name: RepoBench 1.1 (Python, 12k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2453
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_python_v1.1
      name: RepoBench 1.1 (Python, 16k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.211
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_java_v1.1
      name: RepoBench 1.1 (Java)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2858
      verified: false
    - name: EM ≤ 8k
      type: exact_match
      value: 0.3108
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_java_v1.1
      name: RepoBench 1.1 (Java, 2k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.3202
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_java_v1.1
      name: RepoBench 1.1 (Java, 4k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.3212
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_java_v1.1
      name: RepoBench 1.1 (Java, 8k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.291
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_java_v1.1
      name: RepoBench 1.1 (Java, 12k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2492
      verified: false
  - task:
      type: text-generation
    dataset:
      type: tianyang/repobench_java_v1.1
      name: RepoBench 1.1 (Java, 16k)
    metrics:
    - name: EM
      type: exact_match
      value: 0.2474
      verified: false
  - task:
      type: text-generation
    dataset:
      type: gonglinyuan/safim
      name: SAFIM
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.3811
      verified: false
  - task:
      type: text-generation
    dataset:
      type: gonglinyuan/safim
      name: SAFIM (Algorithmic)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.253
      verified: false
  - task:
      type: text-generation
    dataset:
      type: gonglinyuan/safim
      name: SAFIM (Control)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.3839
      verified: false
  - task:
      type: text-generation
    dataset:
      type: gonglinyuan/safim
      name: SAFIM (API)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.5065
      verified: false
  - task:
      type: text-generation
    dataset:
      type: loubnabnl/humaneval_infilling
      name: HumanEval Infilling (Single-Line)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.6621
      verified: false
  - task:
      type: text-generation
    dataset:
      type: loubnabnl/humaneval_infilling
      name: HumanEval Infilling (Multi-Line)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.3852
      verified: false
  - task:
      type: text-generation
    dataset:
      type: loubnabnl/humaneval_infilling
      name: HumanEval Infilling (Random Span)
    metrics:
    - name: pass@1
      type: pass@1
      value: 0.2969
      verified: false
base_model:
- JetBrains/Mellum-4b-base
---

# Model Description
Mellum-4b-base is JetBrains' first open-source large language model (LLM) optimized for code-related tasks.

Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, Mellum-4b-base is tailored specifically for code completion.
The model follows a LLaMA-style architecture with 4 billion parameters, making it efficient for both cloud inference (e.g., via vLLM) and local deployment (e.g., using llama.cpp or Ollama).

Mellum was trained using Automatic Mixed Precision (AMP) with bf16 precision.
The uploaded version on Hugging Face retains the bf16 format for public use.

Designed for integration into professional developer tooling (e.g., intelligent code suggestions in IDEs), AI-powered coding assistants, and research on code understanding and generation, Mellum is also well-suited for educational applications and fine-tuning experiments.

This release includes the base model as well as Python SFT models.
Models for other languages will be released soon.
Keep in mind that the base model is not fine-tuned for downstream tasks out of the box; however, it fully supports supervised fine-tuning (SFT) and reinforcement learning (RL) for adaptation to specific applications.

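This repository packages the model as `Mellum-4B.rkllm`, Rockchip's RKLLM format for NPU inference on boards such as the RK3588. Below is a minimal sketch of how such a file is typically produced with Rockchip's `rkllm-toolkit` (https://github.com/airockchip/rknn-llm); the arguments are assumptions for illustration, not the documented settings of this build. The ~8 GB file size suggests the weights were kept in 16-bit rather than quantized.

```python
# Hypothetical conversion sketch with Rockchip's rkllm-toolkit; the exact
# settings used to build Mellum-4B.rkllm are not documented in this repo.
from rkllm.api import RKLLM

llm = RKLLM()

# Load the original Hugging Face checkpoint.
ret = llm.load_huggingface(model="JetBrains/Mellum-4b-base")
assert ret == 0, "failed to load the Hugging Face checkpoint"

# Build for the RK3588 NPU. do_quantization=False keeps 16-bit weights,
# consistent with the ~8 GB artifact; w8a8 quantization is the common
# alternative when a smaller file is preferred.
ret = llm.build(do_quantization=False, optimization_level=1,
                target_platform="rk3588")
assert ret == 0, "failed to build the RKLLM model"

# Export the .rkllm artifact.
ret = llm.export_rkllm("./Mellum-4B.rkllm")
assert ret == 0, "failed to export the RKLLM model"
```
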
# Training Data
- Total Training Tokens: ~4.2 trillion tokens
- Corpus: The Stack, StarCoder Training Dataset, The Stack v2, CommitPack, English Wikipedia

# Training Details
- Context Window: 8,192 tokens
- Optimization: Standard language modeling objective
- Hardware: Cluster of 256 x H200 NVIDIA GPUs with InfiniBand
- Training Duration: ~20 days

# Benchmarks
In addition to the base model scores, we also provide scores for a Mellum fine-tuned for Python, to give users an estimate of the model's potential capabilities.

## RepoBench 1.1
- Type: single-line
- Languages: Python and Java
- Metric: Exact Match (EM), %

Since Mellum has a maximum context window of 8k tokens, we report both the average performance across all evaluated context lengths (2k, 4k, 8k, 12k, and 16k) and the average over context lengths within its supported range (≤ 8k).

### Python Subset
| Model                | 2k     | 4k     | 8k     | 12k    | 16k    | Avg    | Avg ≤ 8k |
|----------------------|--------|--------|--------|--------|--------|--------|----------|
| Mellum-4b-sft-python | 29.24% | 30.60% | 29.77% | 26.80% | 25.43% | 28.37% | 29.87%   |
| Mellum-4b-base       | 28.20% | 27.95% | 27.77% | 24.53% | 21.10% | 25.91% | 27.97%   |
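
Both reported averages are plain means over the per-context-length EM scores. For example, for Mellum-4b-base on the Python subset:

```python
# Reproduce the two reported averages for Mellum-4b-base (Python subset).
em = {"2k": 28.20, "4k": 27.95, "8k": 27.77, "12k": 24.53, "16k": 21.10}

avg_all = sum(em.values()) / len(em)
avg_short = sum(em[k] for k in ("2k", "4k", "8k")) / 3  # within the 8k window

print(f"Avg: {avg_all:.2f}%")          # 25.91%
print(f"Avg <= 8k: {avg_short:.2f}%")  # 27.97%
```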

### Java Subset
| Model          | 2k     | 4k     | 8k     | 12k    | 16k    | Avg    | Avg ≤ 8k |
|----------------|--------|--------|--------|--------|--------|--------|----------|
| Mellum-4b-base | 32.02% | 32.12% | 29.10% | 24.92% | 24.74% | 28.58% | 31.08%   |

## Syntax-Aware Fill-in-the-Middle (SAFIM)
- Type: mix of multi-line and single-line
- Languages: multi-language
- Metric: pass@1, %

| Model                | Algorithmic | Control | API    | Average |
|----------------------|-------------|---------|--------|---------|
| Mellum-4b-sft-python | 33.16%      | 36.11%  | 57.10% | 42.12%  |
| Mellum-4b-base       | 25.30%      | 38.39%  | 50.65% | 38.11%  |

## HumanEval Infilling
- Type: single-line and multi-line
- Languages: Python
- Metric: pass@1, %

| Model                | Single-Line | Multi-Line | Random Span |
|----------------------|-------------|------------|-------------|
| Mellum-4b-sft-python | 80.45%      | 48.19%     | 37.68%      |
| Mellum-4b-base       | 66.21%      | 38.52%     | 29.70%      |

We continue to work on model improvements and will share the next iteration soon.

# Limitations
- Biases: May reflect biases present in public codebases. For example, it will likely produce code similar in style to open-source repositories.
- Security: Code suggestions should not be assumed to be secure or free of vulnerabilities.

# Sample Usage
Here are examples of how to run and sample from the model. Note that the `transformers` examples below target the original `JetBrains/Mellum-4b-base` checkpoint rather than the `.rkllm` artifact in this repository.

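To fetch the RKLLM artifact itself, something like the following should work with `huggingface_hub` (the `repo_id` is an assumption inferred from the commit context):

```python
# Download Mellum-4B.rkllm from this repository (repo_id is an assumption).
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="Prince-1/Mellum-RKllm",
                       filename="Mellum-4B.rkllm")
print(f"Model downloaded to: {path}")
```

The downloaded file can then be loaded by an RKLLM runtime on the target board.
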
## Generic generation
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

example = """
import sys
import os
import time

sys.path.append(os.getcwd())

from cluster.prepare_data import get_headers_pairs_list, write_dist_matrix
from cluster.token_edit_distance import get_distance_matrix

if len(sys.argv) < 3:
    print(
        "Too few arguments. You should provide: \n1. dataset_filename" +
        "\n2. output_data_filename"
    )
    sys.exit()

start = time.perf_counter()
dataset_filename_ = sys.argv[1]
output_data_filename_ = sys.argv[2]

headers_pairs = get_headers_pairs_list(dataset_filename_, verbose=True)

dist_matrix, max_dist = get_distance_matrix(
    list(map(lambda x: x[1], headers_pairs)),
    verbose=True
)

write_dist_matrix(dist_matrix, max_dist, output_data_filename_, verbose=True)

end = time.perf_counter()
"""

tokenizer = AutoTokenizer.from_pretrained('JetBrains/Mellum-4b-base')
model = AutoModelForCausalLM.from_pretrained('JetBrains/Mellum-4b-base')
encoded_input = tokenizer(example, return_tensors='pt', return_token_type_ids=False)
input_len = len(encoded_input["input_ids"][0])
out = model.generate(
    **encoded_input,
    max_new_tokens=100,
)
print("### Context")
print(tokenizer.decode(out[0][:input_len]))
print("### Prediction")
print(tokenizer.decode(out[0][input_len:]))
```

## Fill-in-the-middle with additional files as context
The prompt follows the suffix-prefix-middle (SPM) layout shown in the template: extra files are introduced with `<filename>` markers, the `<fim_suffix>` part precedes `<fim_prefix>`, and the completion is generated after `<fim_middle>`.

```python
# Reuses the tokenizer and model loaded in the previous example.
example = """<filename>utils.py
def multiply(x, y):
    return x * y
<filename>config.py
DEBUG = True
MAX_VALUE = 100
<filename>example.py
<fim_suffix>

# Test the function
result = calculate_sum(5, 10)
print(result)<fim_prefix>def calculate_sum(a, b):
    <fim_middle>"""

encoded_input = tokenizer(example, return_tensors='pt', return_token_type_ids=False)
input_len = len(encoded_input["input_ids"][0])
out = model.generate(
    **encoded_input,
    max_new_tokens=100,
)
print(tokenizer.decode(out[0][input_len:]))
```

# Citation
If you use this model, please cite:

```bibtex
@misc{Mellum-4b-base,
    title = {Mellum-4b-base},
    author = {Pavlichenko, Nikita and Nazarov, Iurii and Dolgov, Ivan and Garanina, Ekaterina and Lasocki, Karol and Reshetnikova, Julia and Boitsov, Sergei and Bondyrev, Ivan and Karaeva, Dariia and Sheptyakov, Maksim and Ustalov, Dmitry and Mukhin, Artem and Proshev, Semyon and Abramov, Nikita and Kolomyttseva, Olga and Lysaniuk, Kseniia and Zavidnyi, Ilia and Semenkin, Anton and Tankov, Vladislav and Sazanovich, Uladzislau},
    year = {2025},
}
```

# Contact
For questions, collaborations, and requests, please reach out to us at [email protected].

# Conversion Notes
Tokenizer settings recorded in the conversion log:

```
INFO: Setting token_id of bos to 0
INFO: Setting token_id of eos to 0
INFO: Setting token_id of unk to 0
INFO: Setting token_id of pad to 0
INFO: Setting add_bos_token to False
```