Commit 0f5ed11 (verified) by topshik · 1 Parent(s): 86753ad

update tokens info

Files changed (1): README.md (+3 -2)
README.md CHANGED
@@ -7,10 +7,11 @@ Mellum-base-4B is the first open-source installation of LLMs for code-related ta
  The model is trained specifically for code completion task on >3 trillion tokens with 8192 context window on N programming languages.
  We employed LLaMA-like architecture in total with 4B parameters without using Grouped Query Attention, which makes it convenient for both efficient in inference in cloud (e.g. with vLLM) and fast local inference (e.g. with llama.cpp or Ollama).
  Mellum was trained with AMP using bf16 precision, and the same bf16 version is uploaded to HuggingFace for public usage.
- It is designed for professional developer tooling integration (e.g., intelligent suggestions in IDEs), AI code assistants, and research applications in code understanding and generation. Published model is a base model meaning that it does not excel in down-stream tasks, however it is fully suitable for SFT/RL fine-tuning.
+ It is designed for professional developer tooling integration (e.g., intelligent suggestions in IDEs), AI code assistants, and research applications in code understanding and generation.
+ Published model is a base model meaning that it does not excel in down-stream tasks, however it is fully suitable for SFT/RL fine-tuning.
 
  # Training Data
- - Total Training Tokens: 3 trillion tokens
+ - Total Training Tokens: ~4.2 trillion tokens
  - Corpus: StackV1, Starcoderdata, StackV2, CommitPack, English wiki
 
  # Training Details
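
The README text above notes that the bf16 checkpoint is published on HuggingFace and that the model is a plain base model intended for code completion. As a minimal sketch of what that implies for usage, the snippet below loads the model in bf16 with Hugging Face transformers and runs a raw code-completion prompt. The repository id `JetBrains/Mellum-4b-base` is an assumption on our part; the diff itself does not name the exact repo, so substitute the real one.

```python
# Minimal sketch (assumed repo id): load the bf16 base model and run plain
# left-to-right code completion with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "JetBrains/Mellum-4b-base"  # assumption; not stated in the diff

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Base model: prompt with raw code, not chat-style instructions.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base model, it is prompted with code context rather than instructions; per the README, instruction-style or task-specific behaviour is expected to come from SFT/RL fine-tuning, and serving can instead go through vLLM (cloud) or llama.cpp/Ollama (local) as mentioned in the diff.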