topshik committed on
Commit fc1d5ff · verified · 1 Parent(s): 4179e39

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -212,10 +212,10 @@ model-index:
 ---
 
 # Model Description
-Mellum-base-4B is JetBrains' first open-source large language model (LLM) optimized for code-related tasks.
+Mellum-4b-base is JetBrains' first open-source large language model (LLM) optimized for code-related tasks.
 
-Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, Mellum-base-4B is tailored specifically for code completion.
-The model follows a LLaMA-style architecture with 4 billion parameters and does not use Grouped Query Attention (GQA), making it efficient for both cloud inference (e.g., via vLLM) and local deployment (e.g., using llama.cpp or Ollama).
+Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, Mellum-4b-base is tailored specifically for code completion.
+The model follows a LLaMA-style architecture with 4 billion parameters, making it efficient for both cloud inference (e.g., via vLLM) and local deployment (e.g., using llama.cpp or Ollama).
 
 Mellum was trained using Automatic Mixed Precision (AMP) with bf16 precision.
 The uploaded version on Hugging Face retains the bf16 format for public use.
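
For a rough sense of what the bf16 format means for deployment, the weight footprint can be estimated from the parameter count. The sketch below is only back-of-the-envelope arithmetic: it assumes the nominal figure of 4 billion parameters (the exact count is not stated here), counts weights only, and ignores the KV cache, activations, and runtime overhead. The helper name `weight_memory_gib` is hypothetical and not part of any library.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: nominal parameter count taken from the description ("4 billion").
NOMINAL_PARAMS = 4e9

bf16_gib = weight_memory_gib(NOMINAL_PARAMS, bytes_per_param=2)  # bf16 = 2 bytes
fp32_gib = weight_memory_gib(NOMINAL_PARAMS, bytes_per_param=4)  # fp32 = 4 bytes

print(f"bf16 weights: ~{bf16_gib:.2f} GiB")
print(f"fp32 weights: ~{fp32_gib:.2f} GiB")
```

Under these assumptions the bf16 checkpoint needs roughly half the weight memory of an fp32 copy, which is one reason a model of this size is practical for local deployment.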