Miaoran000 committed • Commit 066863b • Parent(s): 2f52d69
Update src/display/about.py

src/display/about.py CHANGED (+7 -55)
@@ -24,7 +24,7 @@ TITLE = """<h1 align="center" id="space-title">Hughes Hallucination Evaluation M
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
 This leaderboard (by [Vectara](https://vectara.com)) evaluates how often an LLM introduces hallucinations when summarizing a document. <br>
-The leaderboard utilizes [
+The leaderboard utilizes the HHEM-2.1 hallucination detection model. The open-source version of HHEM-2.1 can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).<br>

 """

@@ -38,9 +38,9 @@ Hallucinations refer to instances where a model introduces factually incorrect o

 ## How it works

-Using [Vectara](https://vectara.com)'s HHEM, we measure the occurrence of hallucinations in generated summaries.
+Using [Vectara](https://vectara.com)'s HHEM-2.1 hallucination evaluation model, we measure the occurrence of hallucinations in generated summaries.
 Given a source document and a summary generated by an LLM, HHEM outputs a hallucination score between 0 and 1, with 0 indicating complete hallucination and 1 representing perfect factual consistency.
-The model card for HHEM can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).
+The model card for HHEM-2.1-Open, the open-source version of HHEM-2.1, can be found [here](https://huggingface.co/vectara/hallucination_evaluation_model).

 ## Evaluation Dataset

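For reference, scoring with the open model looks roughly like the following. This is a minimal sketch based on the usage shown on the HHEM-2.1-Open model card (the hosted HHEM-2.1 that scores the leaderboard is not public, so results may differ); check the model card for the authoritative API.

```python
from transformers import AutoModelForSequenceClassification

# HHEM-2.1-Open ships its own prediction code, hence trust_remote_code=True.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

# Each pair is (source document, generated summary).
pairs = [
    ("The sky was cloudy and it rained all afternoon.",
     "It rained in the afternoon."),
    ("The sky was cloudy and it rained all afternoon.",
     "It was sunny and dry all day."),
]

# Returns one score per pair in [0, 1]:
# 0 = complete hallucination, 1 = perfect factual consistency.
scores = model.predict(pairs)
print(scores)  # the first pair should score high, the second low
```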
@@ -60,59 +60,11 @@ If you would like to submit your model that is not available on the Hugging Face
 ## Model Submissions and Reproducibility
 You can submit your model for evaluation, whether it's hosted on the Hugging Face model hub or not (though it is recommended to host your model on the Hugging Face Hub).

-###
-1)
-2)
-
+### Evaluation with HHEM-2.1-Open Locally
+1) You can access the generated summaries from models on the leaderboard [here](https://huggingface.co/datasets/vectara/leaderboard_results). The text generation prompt is available under the "Prompt Used" section of the repository's README.
+2) Check [here](https://huggingface.co/vectara/hallucination_evaluation_model) for more details on using HHEM-2.1-Open.
+Please note that our leaderboard is scored with the HHEM-2.1 model, which excels at hallucination detection. While we offer HHEM-2.1-Open as an open-source alternative, it may produce slightly different results.

-### For models available on the Hugging Face model hub:
-To replicate the evaluation result for a Hugging Face model:
-
-1) Clone the Repository
-```python
-git lfs install
-git clone https://huggingface.co/spaces/vectara/leaderboard
-```
-2) Install the Requirements
-```python
-pip install -r requirements.txt
-```
-3) Set Up Your Hugging Face Token
-```python
-export HF_TOKEN=your_token
-```
-4) Run the Evaluation Script
-```python
-python main_backend.py --model your_model_id --precision float16
-```
-5) Check Results
-After the evaluation, results are saved in "eval-results-bk/your_model_id/results.json".
-
-## Results Format
-The results are structured in JSON as follows:
-```python
-{
-    "config": {
-        "model_dtype": "float16",
-        "model_name": "your_model_id",
-        "model_sha": "main"
-    },
-    "results": {
-        "hallucination_rate": {
-            "hallucination_rate": ...
-        },
-        "factual_consistency_rate": {
-            "factual_consistency_rate": ...
-        },
-        "answer_rate": {
-            "answer_rate": ...
-        },
-        "average_summary_length": {
-            "average_summary_length": ...
-        }
-    }
-}
-```
 For additional queries or model submissions, please contact [email protected].
 """

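To make the reproduction steps above concrete, here is a hedged sketch that re-scores leaderboard summaries with HHEM-2.1-Open. The `train` split, the `source`/`summary` column names, and the 0.5 cutoff are illustrative assumptions, not the leaderboard's documented schema; inspect the dataset card before running.

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification

# Generated summaries from leaderboard models (step 1 above).
# ASSUMPTION: split and column names; check the dataset card for the real schema.
ds = load_dataset("vectara/leaderboard_results", split="train")

model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True
)

pairs = [(row["source"], row["summary"]) for row in ds]
scores = model.predict(pairs)  # consistency scores in [0, 1]

# ASSUMPTION: count a summary as hallucinated below 0.5 consistency;
# the leaderboard's own cutoff may differ.
hallucination_rate = float((scores < 0.5).float().mean())
print(f"Hallucination rate: {hallucination_rate:.3f}")
```

Because HHEM-2.1-Open and the hosted HHEM-2.1 are different checkpoints, numbers computed this way will approximate, not reproduce, the leaderboard's scores.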