JetBrains/Mellum-4b-base

Text Generation

text-generation-inference

topshik committed 6 days ago

Commit fc1d5ff · verified · 1 parent: 4179e39

Update README.md

Files changed (1)

README.md +3 -3

@@ -212,10 +212,10 @@ model-index:
 ---
 # Model Description
-Mellum-base-4B is JetBrains' first open-source large language model (LLM) optimized for code-related tasks.
-Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, Mellum-base-4B is tailored specifically for code completion.
-The model follows a LLaMA-style architecture with 4 billion parameters and does not use Grouped Query Attention (GQA), making it efficient for both cloud inference (e.g., via vLLM) and local deployment (e.g., using llama.cpp or Ollama).
+Mellum-4b-base is JetBrains' first open-source large language model (LLM) optimized for code-related tasks.
+Trained on over 4 trillion tokens with a context window of 8192 tokens across multiple programming languages, Mellum-4b-base is tailored specifically for code completion.
+The model follows a LLaMA-style architecture with 4 billion parameters, making it efficient for both cloud inference (e.g., via vLLM) and local deployment (e.g., using llama.cpp or Ollama).
 Mellum was trained using Automatic Mixed Precision (AMP) with bf16 precision.
 The uploaded version on Hugging Face retains the bf16 format for public use.
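
Review note: the description pairs "4 billion parameters" with bf16 weights, which implies a rough lower bound on the memory needed to hold the weights. The sketch below is only a back-of-the-envelope estimate. It assumes the parameter count is exactly 4,000,000,000 (the card gives only the rounded figure) and ignores activations, KV cache, and runtime overhead. The helper name `weight_memory_gib` is hypothetical and not part of any library.

```python
# Rough estimate of raw weight size from parameter count and dtype.
# Assumption: bf16 and fp16 use 2 bytes per parameter, fp32 uses 4.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gib(num_params: int, dtype: str = "bf16") -> float:
    """Return the approximate size of the raw weights in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Nominal figure from the model description, assumed exact for illustration.
    nominal_params = 4_000_000_000
    print(f"bf16 weights: {weight_memory_gib(nominal_params):.2f} GiB")
```

Under these assumptions the bf16 weights alone come to roughly 7.45 GiB; the real checkpoint size depends on the exact parameter count.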