### Overview

This model is a distilled version of LLaMA 2, containing approximately 80 million parameters.
It was trained on a mix of the OpenWebText and WikiText Raw V1 datasets.
Knowledge distillation was employed to transfer knowledge from a larger "teacher" model, Meta's 7B LLaMA 2, so that this smaller "student" model learns to mimic the behavior of the teacher.
This is the latest version of DistilLlama, trained for 5 days on two NVIDIA A100 80GB GPUs.
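
For readers new to the technique, below is a minimal sketch of the standard distillation objective: the student is trained against a blend of the teacher's softened output distribution and the ordinary next-token loss. The temperature, loss weighting, and function names are illustrative assumptions, not DistilLlama's actual training code (see the training repository linked below).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL loss (teacher) with a hard-target CE loss (labels).

    temperature and alpha are illustrative values, not the hyperparameters
    used to train DistilLlama.
    """
    # Soften both distributions; the KL term is scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures (Hinton et al., 2015).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2

    # Standard next-token cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )

    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```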

### Update

30 out of 300 checkpoints were examined, and the checkpoint with the best semantic and factual accuracy is now the version hosted in this repository.

### Model Architecture

The architecture is based on LLaMA 2, with the following parameters:

…

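Since the parameter table is elided in this view, the sketch below shows what a LLaMA-style configuration in the stated ~80M-parameter range looks like with Hugging Face `transformers`. Every dimension here is a guess chosen to land near that budget, not DistilLlama's published configuration.

```python
from transformers import LlamaConfig, LlamaForCausalLM

# Illustrative dimensions only: guesses that land near the stated ~80M
# parameter budget, not DistilLlama's actual configuration.
config = LlamaConfig(
    vocab_size=32000,        # LLaMA 2 tokenizer size
    hidden_size=512,
    intermediate_size=1376,
    num_hidden_layers=16,
    num_attention_heads=8,
)
model = LlamaForCausalLM(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```
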
*Note: CodeCarbon was used to track carbon emissions. The evaluation was allocated 80GB of memory and 32 cores on an Intel(R) Xeon(R) Gold 6448H.*
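
For reference, CodeCarbon tracking is typically wired in as in the minimal sketch below; the project name, output directory, and the stand-in workload are placeholders rather than the exact setup used for this evaluation.

```python
from codecarbon import EmissionsTracker

def run_evaluation():
    """Stand-in for the real evaluation loop (placeholder workload)."""
    sum(i * i for i in range(10_000_000))

# Project name and output path are placeholders; only the
# EmissionsTracker API itself is real.
tracker = EmissionsTracker(project_name="distilllama-eval", output_dir=".")
tracker.start()
try:
    run_evaluation()
finally:
    emissions = tracker.stop()  # estimated kg of CO2-equivalent
print(f"Estimated emissions: {emissions:.6f} kg CO2eq")
```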

### Example queries from our test set

| Query | Keyword | Response | Exact Match | Cosine Similarity | ROUGE Score |
|-------|---------|----------|-------------|-------------------|-------------|
| The capital of France is | Paris | The capital of France is the city of Paris, Paris is the capital and most populous city... | 1 | 0.757961 | 0.0625 |
| The currency of Japan is | Yen | The currency of Japan is the Japanese currency called Yen. | 1 | 0.774518 | 0.083333 |
| The largest ocean on Earth is | Pacific Ocean | The largest ocean on Earth is Pacific Ocean. | 0 | 0.721646 | 0.222222 |
| The continent known as the 'Dark Continent' is | Africa | The continent known as the 'Dark Continent' is Africa. | 1 | 0.725292 | 0.057143 |
| The theory of relativity was developed by | Albert Einstein | The theory of relativity was developed by Einstein, a famous physicist who... | 0 | 0.712056 | 0.055556 |
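
For context, the three metric columns can be reproduced roughly as follows. The card does not state which embedding model, text pairing, or ROUGE variant was used, so the sentence-transformers encoder, the query/response pairing, and ROUGE-1 below are all assumptions.

```python
from rouge_score import rouge_scorer
from sentence_transformers import SentenceTransformer, util

# Both choices below are assumptions; the card does not name an embedding
# model or a ROUGE variant.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def score_response(query: str, keyword: str, response: str):
    exact = int(keyword in response)                       # Exact Match
    emb = encoder.encode([query, response], convert_to_tensor=True)
    cosine = util.cos_sim(emb[0], emb[1]).item()           # Cosine Similarity
    rouge = scorer.score(keyword, response)["rouge1"].fmeasure  # ROUGE Score
    return exact, cosine, rouge

print(score_response("The capital of France is", "Paris",
                     "The capital of France is the city of Paris..."))
```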

### GitHub Repositories

- **Training Repo**: [DistilLlama Training Repository](https://github.com/HenryHuang2/DistilLlama)

…

url={https://arxiv.org/abs/2308.02019},
}

*Note: The repository will be updated as training progresses. Last update 2024-11-06*