Update README.md
README.md CHANGED
@@ -176,8 +176,10 @@ For more details, please refer to the [codes](https://github.com/llm-jp/llm-jp-j
 
 ### AnswerCarefully-Eval
 
-[AnswerCarefully-Eval](https://www.anlp.jp/proceedings/annual_meeting/2025/pdf_dir/Q4-19.pdf)
-We evaluated the models using `gpt-4-0613`.
+[AnswerCarefully-Eval](https://www.anlp.jp/proceedings/annual_meeting/2025/pdf_dir/Q4-19.pdf) assesses the safety of Japanese language model outputs using the LLM-as-a-Judge approach, based on the test set from [llm-jp/AnswerCarefully](https://huggingface.co/datasets/llm-jp/AnswerCarefully).
+We evaluated the models using `gpt-4-0613`.
+The scores represent the average values obtained from five rounds of inference and evaluation.
+
 
 | Model name | Acceptance rate (%, ↑) | Violation rate (%, ↓) |
 | :--- | ---: | ---: |
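The averaging step the updated README describes (five rounds of inference and judging, with the reported rates being the round averages) can be sketched as follows. This is a minimal illustration; the function name, sample numbers, and structure are hypothetical and not taken from the llm-jp evaluation code.

```python
def average_rates(rounds):
    """Average per-round (acceptance_rate, violation_rate) pairs.

    rounds: list of (acceptance_rate, violation_rate) tuples in percent,
    one tuple per inference/evaluation round (five rounds in the README).
    """
    n = len(rounds)
    acceptance = sum(r[0] for r in rounds) / n
    violation = sum(r[1] for r in rounds) / n
    return acceptance, violation


# Hypothetical judged results from five rounds (percentages).
rounds = [(75.0, 4.0), (74.0, 5.0), (76.0, 3.0), (75.5, 4.5), (74.5, 3.5)]
acc, vio = average_rates(rounds)
print(f"Acceptance rate: {acc:.1f}%, Violation rate: {vio:.1f}%")
# → Acceptance rate: 75.0%, Violation rate: 4.0%
```

Acceptance rate is reported higher-is-better (↑) and violation rate lower-is-better (↓), matching the arrows in the table header.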