Update README.md
README.md
CHANGED
@@ -128,7 +128,7 @@ Example 3:
 
 ## Model description
 
-The architecture is a modification of a standard decoder-only transformer.
+The architecture is a modification of a standard decoder-only transformer and was trained as a causal language model (clm).
 
 The llama-2-70b models have been modified from a standard transformer in the following ways:
 * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
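For readers unfamiliar with the SwiGLU activation named in the hunk above, here is a minimal sketch of a SwiGLU feed-forward layer as defined in the linked paper (a Swish-gated projection multiplied elementwise by a second projection, then projected back down, with no bias terms). It is not taken from this repository: the function names, the weight names `w_gate`, `w_up`, and `w_down`, and the toy dimensions are all assumptions made for illustration.

```python
import math


def swish(z):
    """Swish with beta = 1 (also called SiLU): z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: (swish(x @ w_gate) * (x @ w_up)) @ w_down."""
    gate = [swish(v) for v in matvec(x, w_gate)]   # gated branch
    up = matvec(x, w_up)                           # linear branch
    hidden = [g * u for g, u in zip(gate, up)]     # elementwise product
    return matvec(hidden, w_down)                  # project back to model width


# Toy weights (hypothetical values): model width 2, hidden width 3.
x = [1.0, -2.0]
w_gate = [[0.5, -1.0, 0.0], [0.25, 0.0, 1.0]]
w_up = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
w_down = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

out = swiglu_ffn(x, w_gate, w_up, w_down)
print(out)
```

The plain-list arithmetic keeps the sketch dependency-free; a real model would use batched tensor operations and learned weights.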