Update README.md
README.md:

```diff
@@ -1,7 +1,7 @@
 ---
 language:
-- it
 - en
+- it
 library_name: transformers
 license: apache-2.0
 tags:
@@ -20,11 +20,14 @@ model-index:
 
 # Veronica — Custom Causal LM (decoder-only)
 
-**Veronica**
-
-Attention: **DuoAttention** (stream vs full window) + **SEAL** scaling on the retrieval heads. **RMSNorm** + **SwiGLU**.
+**Veronica** is a custom *decoder-only* large language model, designed to maximize **depth efficiency** and token-level reasoning quality under limited resources.
+It features **32 layers × 1024 hidden × 16 heads (GQA=4)**, extended context via **RoPE (θ=1e6) + YaRN scaling** up to **32k tokens**, and advanced attention routing with **DuoAttention** and **SEAL scaling**.
 
-> **
+> **Status:** prototype under pretraining.
+> This repository currently provides **code, config, and tokenizer** to load Veronica with `trust_remote_code=True`.
+> Model weights will be released in a future update.
+
+---
 
 ## Quickstart
 
@@ -40,6 +43,6 @@ model = AutoModelForCausalLM.from_pretrained(
     device_map="auto",
 )
 
-prompt = "
+prompt = "Explain in simple terms what Veronica is:"
 out = model.generate(**tok(prompt, return_tensors="pt").to(model.device))
-print(tok.decode(out[0], skip_special_tokens=True))
+print(tok.decode(out[0], skip_special_tokens=True))
```
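The updated description lists **16 heads (GQA=4)**. A minimal sketch of the head grouping this implies, assuming GQA=4 means 4 shared key/value heads so that each group of 4 query heads reads the same K/V projection (the repo's actual config may define the value differently):

```python
# Sketch of grouped-query attention (GQA) head mapping.
# Assumption: "16 heads (GQA=4)" = 16 query heads sharing 4 KV heads.
n_q_heads = 16
n_kv_heads = 4
group_size = n_q_heads // n_kv_heads  # 4 query heads per KV head

def kv_head_for(q_head: int) -> int:
    """Index of the KV head that query head `q_head` attends with."""
    return q_head // group_size

mapping = {q: kv_head_for(q) for q in range(n_q_heads)}
print(mapping)  # query heads 0-3 -> KV head 0, 4-7 -> KV head 1, ...
```

Compared with full multi-head attention, this shrinks the KV cache by the group size (4x here), which matters at the 32k-token contexts the card targets.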
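The update also mentions **RoPE (θ=1e6) + YaRN scaling** up to 32k tokens. A minimal sketch of the RoPE base frequencies under that θ, with head_dim = 1024 / 16 = 64 taken from the stated shape (the non-uniform YaRN rescaling of these frequencies is omitted here):

```python
# Sketch of RoPE inverse frequencies with base theta = 1e6.
# head_dim = hidden / heads = 1024 / 16 = 64, per the stated model shape.
theta = 1.0e6
head_dim = 64

# One inverse frequency per rotary pair of dimensions; a larger theta
# slows every pair's rotation, so positional phases stay distinguishable
# over longer contexts (here targeted at 32k tokens).
inv_freq = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

print(len(inv_freq), inv_freq[0], inv_freq[-1])
```

YaRN then interpolates these frequencies band by band rather than uniformly, which is what lets the extension reach 32k without retraining from scratch.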