Update README.md
README.md CHANGED
```diff
@@ -96,11 +96,13 @@ Example 3:
 >
 > Note: You can also add chocolate chips, dried fruit, or other mix-ins to the batter for extra flavor and texture. Enjoy your vegan banana bread!
 
+<br>
+
 ## Model Description
 
 The architecture is a modification of a standard decoder-only transformer.
 
-The llama-2 models have been modified from a standard transformer in the following ways:
+The llama-2-70b models have been modified from a standard transformer in the following ways:
 * It uses [grouped-query attention](https://arxiv.org/pdf/2305.13245.pdf) (GQA), a generalization of multi-query attention which uses an intermediate number of key-value heads.
 * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
 * It uses [rotary positional embeddings](https://arxiv.org/abs/2104.09864) (RoPE)
```
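The first bullet in the diff, grouped-query attention, is easiest to see in code: several query heads share each key-value head, so the KV cache shrinks by the grouping factor. The sketch below is a minimal illustration of that idea, not the actual llama-2 implementation; the function name, tensor layout, and the `n_kv_heads` parameter are assumptions chosen for clarity, and causal masking is omitted for brevity.

```python
import torch

def grouped_query_attention(q, k, v, n_heads, n_kv_heads):
    # q: (batch, seq, n_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim)
    # n_kv_heads == n_heads recovers multi-head attention; n_kv_heads == 1 is multi-query.
    group = n_heads // n_kv_heads
    # Repeat each KV head so every group of `group` query heads shares it.
    k = k.repeat_interleave(group, dim=2)
    v = v.repeat_interleave(group, dim=2)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # -> (batch, heads, seq, head_dim)
    scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
    out = torch.softmax(scores, dim=-1) @ v
    return out.transpose(1, 2)  # back to (batch, seq, n_heads, head_dim)
```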
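Likewise, a minimal sketch of a SwiGLU feed-forward block, following the gated formulation from the linked paper, `W2(SiLU(W1 x) * W3 x)`. The class and weight names here are illustrative rather than taken from the model's source.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated FFN: the SiLU-activated gate branch modulates a linear value branch."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU(x) = x * sigmoid(x)
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```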
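Finally, a rough sketch of rotary positional embeddings: each consecutive pair of channels in a query or key head is rotated by an angle that grows with the token's position, so relative offsets are encoded directly in the dot product. The `base=10000.0` frequency base and the tensor layout are common conventions assumed here, not read from the llama-2 code.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    # x: (batch, seq, n_heads, head_dim); head_dim must be even.
    b, s, h, d = x.shape
    # One rotation frequency per channel pair, geometrically spaced.
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
    angles = torch.arange(s, dtype=torch.float32)[:, None] * inv_freq[None, :]  # (seq, d/2)
    cos = angles.cos()[None, :, None, :]
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]  # treat consecutive channel pairs as 2-D points
    # 2-D rotation of each pair by its position-dependent angle.
    rotated = torch.stack((x1 * cos - x2 * sin,
                           x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)  # re-interleave pairs -> (batch, seq, n_heads, head_dim)
```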