### Evaluation

TeapotLLM is focused on in-context reasoning tasks, so most standard benchmarks are not suitable for evaluating it. We want TeapotLLM to be a practical tool for QnA and information extraction, so we have developed custom datasets to benchmark its performance.

[Evaluation Notebook Here](https://github.com/zakerytclarke/teapot/blob/main/docs/evals/TeapotLLM_Benchmark.ipynb)

#### Synthqa Evaluation

[Synthqa](https://huggingface.co/datasets/teapotai/synthqa) is a dataset focused on in-context QnA and information extraction tasks. We use its validation set to benchmark TeapotLLM against other models of similar size. All benchmarks were run in a Google Colab notebook on a CPU runtime with high RAM. TeapotLLM significantly outperforms models of similar size, with low-latency CPU inference and improved accuracy.
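For reference, the snippet below is a minimal sketch of how a benchmark run of this kind could be reproduced on CPU with the `transformers` and `datasets` libraries. The model id (`teapotai/teapotllm`), the Synthqa column names (`context`, `question`, `answer`), and the exact-match scoring are illustrative assumptions, not the official setup; the linked evaluation notebook is the authoritative reference.

```python
# Rough sketch of a Synthqa-style evaluation loop (not the official benchmark code).
# Assumptions: the model id "teapotai/teapotllm", the "validation" split name, and the
# column names ("context", "question", "answer") are illustrative guesses.
from datasets import load_dataset
from transformers import pipeline

# TeapotLLM is a flan-t5-base fine-tune, so it can be driven via text2text-generation on CPU.
qa = pipeline("text2text-generation", model="teapotai/teapotllm", device=-1)

synthqa = load_dataset("teapotai/synthqa", split="validation")

correct = 0
for example in synthqa:
    # Flan-T5-style prompt: pass the context and the question as a single input string.
    prompt = f"{example['context']}\n\nQuestion: {example['question']}"
    prediction = qa(prompt, max_new_tokens=64)[0]["generated_text"].strip()
    correct += int(prediction.lower() == example["answer"].strip().lower())

print(f"Exact-match accuracy: {correct / len(synthqa):.2%}")
```

Exact match is used here only as a simple stand-in metric; the evaluation notebook may score answers differently.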