Commit 0f5ed11 (verified) by topshik · 1 Parent(s): 86753ad

update tokens info

Files changed (1): README.md (+3 -2)
README.md CHANGED
@@ -7,10 +7,11 @@ Mellum-base-4B is the first open-source installation of LLMs for code-related ta
  The model is trained specifically for code completion task on >3 trillion tokens with 8192 context window on N programming languages.
  We employed LLaMA-like architecture in total with 4B parameters without using Grouped Query Attention, which makes it convenient for both efficient in inference in cloud (e.g. with vLLM) and fast local inference (e.g. with llama.cpp or Ollama).
  Mellum was trained with AMP using bf16 precision, and the same bf16 version is uploaded to HuggingFace for public usage.
- It is designed for professional developer tooling integration (e.g., intelligent suggestions in IDEs), AI code assistants, and research applications in code understanding and generation. Published model is a base model meaning that it does not excel in down-stream tasks, however it is fully suitable for SFT/RL fine-tuning.
+ It is designed for professional developer tooling integration (e.g., intelligent suggestions in IDEs), AI code assistants, and research applications in code understanding and generation.
+ Published model is a base model meaning that it does not excel in down-stream tasks, however it is fully suitable for SFT/RL fine-tuning.
 
  # Training Data
- - Total Training Tokens: 3 trillion tokens
+ - Total Training Tokens: ~4.2 trillion tokens
  - Corpus: StackV1, Starcoderdata, StackV2, CommitPack, English wiki
 
  # Training Details
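
The README text above notes that the bf16 checkpoint is published on HuggingFace and that the model is a plain base model intended for code completion. As a minimal sketch of what that implies for usage, the snippet below loads the model in bf16 with Hugging Face transformers and runs a raw code-completion prompt. The repository id `JetBrains/Mellum-4b-base` is an assumption on our part; the diff itself does not name the exact repo, so substitute the real one.

```python
# Minimal sketch (assumed repo id): load the bf16 base model and run plain
# left-to-right code completion with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "JetBrains/Mellum-4b-base"  # assumption; not stated in the diff

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

# Base model: prompt with raw code, not chat-style instructions.
prompt = "def fibonacci(n):\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is a base model, it is prompted with code context rather than instructions; per the README, instruction-style or task-specific behaviour is expected to come from SFT/RL fine-tuning, and serving can instead go through vLLM (cloud) or llama.cpp/Ollama (local) as mentioned in the diff.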