The model is trained specifically for the code completion task on more than 3 trillion tokens, with an 8192-token context window, across N programming languages.
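
Since the context window is fixed at 8192 tokens, a completion request has to budget prompt length against the number of tokens it wants generated. A minimal sketch of that bookkeeping (`trim_prefix` is a hypothetical helper, not a Mellum API, and the token list is a stand-in for the output of the model's real tokenizer):

```python
# Sketch: keep a code-completion prompt within the 8192-token window,
# reserving room for the generated completion. The token list here is a
# stand-in; real usage would count tokens with the model's tokenizer.
CONTEXT_WINDOW = 8192

def trim_prefix(prefix_tokens, max_new_tokens, context_window=CONTEXT_WINDOW):
    """Drop the oldest tokens so prompt + completion fit in the window."""
    budget = context_window - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens exceeds the context window")
    return prefix_tokens[-budget:]

tokens = [f"tok{i}" for i in range(10_000)]
trimmed = trim_prefix(tokens, max_new_tokens=256)
print(len(trimmed))  # 8192 - 256 = 7936
```
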
We employed a LLaMA-like architecture with 4B parameters in total and no Grouped Query Attention, which makes it convenient both for efficient inference in the cloud (e.g. with vLLM) and for fast local inference (e.g. with llama.cpp or Ollama).
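
When served with vLLM, the model is reachable through vLLM's OpenAI-compatible `/v1/completions` endpoint. A minimal stdlib-only client sketch, where the server address and the checkpoint id `JetBrains/Mellum-4b-base` are assumptions for illustration, not values stated in this README:

```python
import json
import urllib.request

# Sketch of a client for vLLM's OpenAI-compatible /v1/completions endpoint.
# BASE_URL and MODEL_ID below are assumptions for illustration.
BASE_URL = "http://localhost:8000"     # assumed vLLM server address
MODEL_ID = "JetBrains/Mellum-4b-base"  # assumed checkpoint id

def build_completion_request(prompt, max_tokens=128, temperature=0.2):
    """Build the JSON payload for a /v1/completions call."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt):
    """POST the request and return the first completion's text."""
    req = urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(build_completion_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]

payload = build_completion_request("def fib(n):")
print(payload["model"])
```
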
Mellum was trained with automatic mixed precision (AMP) in bf16, and the same bf16 version is uploaded to HuggingFace for public use.
It is designed for professional developer tooling integration (e.g., intelligent suggestions in IDEs), AI code assistants, and research applications in code understanding and generation.
The published model is a base model, meaning it does not excel at downstream tasks; however, it is fully suitable for SFT/RL fine-tuning.

# Training Data

- Total Training Tokens: ~4.2 trillion tokens
- Corpus: StackV1, Starcoderdata, StackV2, CommitPack, English wiki
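
As a rough back-of-the-envelope check (not a figure from the training run), ~4.2 trillion tokens split into full 8192-token context windows comes to roughly 5×10⁸ windows:

```python
# Back-of-the-envelope: number of full 8192-token context windows
# in roughly 4.2 trillion training tokens.
total_tokens = 4.2e12
context_window = 8192
windows = total_tokens / context_window
print(f"{windows:.2e}")  # about 5.13e+08
```
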

# Training Details