update readme
README.md
# LiteLLama: Reduced-Scale, Experimental Versions of Llama
In this series of repos, we present an open-source reproduction of Meta AI's [LLaMa 2](https://ai.meta.com/llama/), but at significantly reduced model sizes: [LiteLlama-460M-1T](https://huggingface.co/ahxt/LiteLlama-460M-1T) has 460M parameters and was trained with 1T tokens.
## Dataset and Tokenization
We train our models on part of [RedPajama](https://www.together.xyz/blog/redpajama).

The model was trained with ~1T tokens (0.98T): number of tokens = steps × sequence length × batch size = 499,679 × 1,024 × 192 = 98,240,888,832 ≈ 0.98T.
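As a quick cross-check of the product above, here is a one-line Python sketch using only the step count, sequence length, and batch size quoted in the formula:

```python
# Tokens seen during training = optimizer steps * sequence length * batch size,
# using the values quoted in the formula above.
steps, seq_len, batch_size = 499_679, 1_024, 192
total_tokens = steps * seq_len * batch_size
print(f"{total_tokens:,}")  # 98,240,888,832
```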
The training curve is available in this [WandB report](https://wandb.ai/ahxt/llama2_xs_460M_training_loss/reports/reduced_train_loss-23-09-05-20-25-43---Vmlldzo1MzIwNDUx?accessToken=x2ch3n30jo77p1x8y7q9js4h4d8zpjtz1tzot4xxullyefixp4jwt7au2q37k2q6).
### Using with HuggingFace Transformers
The experimental checkpoints can be loaded directly with the [Transformers](https://huggingface.co/transformers/) library. The following code snippet shows how to load our experimental model and generate text with it.
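A minimal sketch of such a snippet is shown below; only the final `print` line is preserved from the original README, while the checkpoint id `ahxt/LiteLlama-460M-1T`, the prompt, and the generation length are illustrative assumptions.

```python
# Minimal sketch: load the experimental checkpoint from the Hub and generate text.
# The checkpoint id, prompt, and max_length below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = 'ahxt/LiteLlama-460M-1T'  # assumed Hub id (linked in the introduction)

model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model.eval()

prompt = 'Q: What is the largest bird?\nA:'  # illustrative prompt
input_ids = tokenizer(prompt, return_tensors='pt').input_ids
tokens = model.generate(input_ids, max_length=64)
print( tokenizer.decode(tokens[0].tolist(), skip_special_tokens=True) )
```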
## Evaluation
### MMLU

We evaluate our models on the MMLU task.

| Models | #parameters | zero-shot | 5-shot |
| --- | --- | --- | --- |
| LiteLlama-460M-1T | 0.46B | 21.13 | 26.39 |

### [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_ahxt__llama2_xs_460M_experimental).

| Metric | Value |
## Contact
This model is developed by [Xiaotian Han](https://ahxt.github.io/) from Texas A&M University and is released under the MIT License.