juletxara commited on
Commit
474c5f9
·
verified ·
1 Parent(s): 96159a5

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +8 -7
README.md CHANGED
@@ -1,6 +1,7 @@
1
  ---
2
  license: gemma
3
  language:
 
4
  - es
5
  - ca
6
  - gl
@@ -26,14 +27,14 @@ This model card is for a judge model fine-tuned to evaluate truthfulness, based
26
 
27
  ### Model Description
28
 
29
- This model is an LLM-as-a-Judge, fine-tuned from `google/gemma-2-9b-it` to assess the truthfulness of text generated by other language models. The evaluation framework and findings are detailed in the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English." The primary goal of this work is to extend truthfulness evaluations beyond English, covering Basque, Catalan, Galician, and Spanish. This specific judge model evaluates truthfulness across multiple languages.
30
 
31
  - **Developed by:** Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria De Dios Flores, Rodrigo Agerri.
32
  - **Affiliations:** HiTZ Center - Ixa, University of the Basque Country, UPV/EHU; Elhuyar; Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela; Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra.
33
  - **Funded by:** MCIN/AEI/10.13039/501100011033 projects: DeepKnowledge (PID2021-127777OB-C21) and by FEDER, EU; Disargue (TED2021-130810B-C21) and European Union NextGenerationEU/PRTR; DeepMinor (CNS2023-144375) and European Union NextGenerationEU/PRTR; NÓS-ILENIA (2022/TL22/0021533). Xunta de Galicia: Centro de investigación de Galicia accreditation 2024-2027 ED431G-2023/04. UPV/EHU PIF22/84 predoc grant (Blanca Calvo Figueras). Basque Government PhD grant PRE_2024_2_0028 (Julen Etxaniz). Juan de la Cierva contract and project JDC2022-049433-I (Iria de Dios Flores), financed by the MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR.
34
  - **Shared by:** HiTZ Center
35
  - **Model type:** LLM-as-a-Judge, based on `Gemma2`
36
- - **Language(s) (NLP):** Fine-tuned to judge outputs in multiple languages (Basque, Catalan, Galician, Spanish). The underlying TruthfulQA-Multi benchmark, used for context, covers English, Basque, Catalan, Galician, and Spanish.
37
  - **License:** The base model `google/gemma-2-9b-it` is governed by the Gemma license. The fine-tuning code, this model's weights, and the TruthfulQA-Multi dataset are publicly available under Apache 2.0.
38
  - **Finetuned from model:** `google/gemma-2-9b-it`
39
 
@@ -47,7 +48,7 @@ This model is an LLM-as-a-Judge, fine-tuned from `google/gemma-2-9b-it` to asses
47
 
48
  ### Direct Use
49
 
50
- This model is intended for direct use as an LLM-as-a-Judge. It takes a question, a reference answer, and a model-generated answer as input, and outputs a judgment on the truthfulness of the model-generated answer. This is particularly relevant for evaluating models on the TruthfulQA benchmark, specifically for multiple languages (Basque, Catalan, Galician, Spanish).
51
 
52
  ### Downstream Use
53
 
@@ -102,7 +103,7 @@ Refer to the project repository (`https://github.com/hitz-zentroa/truthfulqa-mul
102
 
103
  The model was fine-tuned on a dataset derived from the TruthfulQA-Multi benchmark \cite{calvo-etal-2025-truthknowsnolanguage}.
104
  - **Dataset Link:** `https://huggingface.co/datasets/HiTZ/truthful_judge`
105
- - **Training Data Specifics:** Trained on data for multiple languages (Basque, Catalan, Galician, Spanish) for truth judging. This corresponds to the "MT data (all languages except English)" mentioned in the paper for Truth-Judges.
106
 
107
  ### Training Procedure
108
 
@@ -128,11 +129,11 @@ Inputs were formatted to present the judge model with a question, correct answer
128
 
129
  #### Testing Data
130
 
131
- The model's evaluation methodology is described in "Truth Knows No Language: Evaluating Truthfulness Beyond English," using questions from the TruthfulQA-Multi dataset (Basque, Catalan, Galician, Spanish portions).
132
 
133
  #### Factors
134
 
135
- - **Language:** Multiple languages (Basque, Catalan, Galician, Spanish).
136
  - **Model Type (of models being judged):** Base and instruction-tuned LLMs.
137
  - **Evaluation Metric:** Correlation of LLM-as-a-Judge scores with human judgments on truthfulness.
138
 
@@ -146,7 +147,7 @@ The model's evaluation methodology is described in "Truth Knows No Language: Eva
146
  #### Summary
147
 
148
  As reported in "Truth Knows No Language: Evaluating Truthfulness Beyond English" (specifically Table 4 for Truth-Judges):
149
- - This specific model (`multi_gemma9b_instruct_truth_judge`) is the Truth-Judge fine-tuned on `google/gemma-2-9b-it` using combined multilingual data (Basque, Catalan, Galician, Spanish).
150
  - Performance varies by language, with Kappa scores detailed in Table 4 of the paper.
151
 
152
  ## Technical Specifications
 
1
  ---
2
  license: gemma
3
  language:
4
+ - en
5
  - es
6
  - ca
7
  - gl
 
27
 
28
  ### Model Description
29
 
30
+ This model is an LLM-as-a-Judge, fine-tuned from `google/gemma-2-9b-it` to assess the truthfulness of text generated by other language models. The evaluation framework and findings are detailed in the paper "Truth Knows No Language: Evaluating Truthfulness Beyond English." The primary goal of this work is to extend truthfulness evaluations beyond English, covering English, Basque, Catalan, Galician, and Spanish. This specific judge model evaluates truthfulness across multiple languages.
31
 
32
  - **Developed by:** Blanca Calvo Figueras, Eneko Sagarzazu, Julen Etxaniz, Jeremy Barnes, Pablo Gamallo, Iria De Dios Flores, Rodrigo Agerri.
33
  - **Affiliations:** HiTZ Center - Ixa, University of the Basque Country, UPV/EHU; Elhuyar; Centro de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago de Compostela; Departament de Traducció i Ciències del Llenguatge, Universitat Pompeu Fabra.
34
  - **Funded by:** MCIN/AEI/10.13039/501100011033 projects: DeepKnowledge (PID2021-127777OB-C21) and by FEDER, EU; Disargue (TED2021-130810B-C21) and European Union NextGenerationEU/PRTR; DeepMinor (CNS2023-144375) and European Union NextGenerationEU/PRTR; NÓS-ILENIA (2022/TL22/0021533). Xunta de Galicia: Centro de investigación de Galicia accreditation 2024-2027 ED431G-2023/04. UPV/EHU PIF22/84 predoc grant (Blanca Calvo Figueras). Basque Government PhD grant PRE_2024_2_0028 (Julen Etxaniz). Juan de la Cierva contract and project JDC2022-049433-I (Iria de Dios Flores), financed by the MCIN/AEI/10.13039/501100011033 and the European Union “NextGenerationEU”/PRTR.
35
  - **Shared by:** HiTZ Center
36
  - **Model type:** LLM-as-a-Judge, based on `Gemma2`
37
+ - **Language(s) (NLP):** Fine-tuned to judge outputs in multiple languages (English, Basque, Catalan, Galician, Spanish). The underlying TruthfulQA-Multi benchmark, used for context, covers English, Basque, Catalan, Galician, and Spanish.
38
  - **License:** The base model `google/gemma-2-9b-it` is governed by the Gemma license. The fine-tuning code, this model's weights, and the TruthfulQA-Multi dataset are publicly available under Apache 2.0.
39
  - **Finetuned from model:** `google/gemma-2-9b-it`
40
 
 
48
 
49
  ### Direct Use
50
 
51
+ This model is intended for direct use as an LLM-as-a-Judge. It takes a question, a reference answer, and a model-generated answer as input, and outputs a judgment on the truthfulness of the model-generated answer. This is particularly relevant for evaluating models on the TruthfulQA benchmark, specifically for multiple languages (English, Basque, Catalan, Galician, Spanish).
52
 
53
  ### Downstream Use
54
 
 
103
 
104
  The model was fine-tuned on a dataset derived from the TruthfulQA-Multi benchmark \cite{calvo-etal-2025-truthknowsnolanguage}.
105
  - **Dataset Link:** `https://huggingface.co/datasets/HiTZ/truthful_judge`
106
+ - **Training Data Specifics:** Trained on data for multiple languages (English, Basque, Catalan, Galician, Spanish) for truth judging. This corresponds to the "MT data (all languages except English)" mentioned in the paper for Truth-Judges.
107
 
108
  ### Training Procedure
109
 
 
129
 
130
  #### Testing Data
131
 
132
+ The model's evaluation methodology is described in "Truth Knows No Language: Evaluating Truthfulness Beyond English," using questions from the TruthfulQA-Multi dataset (English, Basque, Catalan, Galician, Spanish portions).
133
 
134
  #### Factors
135
 
136
+ - **Language:** Multiple languages (English, Basque, Catalan, Galician, Spanish).
137
  - **Model Type (of models being judged):** Base and instruction-tuned LLMs.
138
  - **Evaluation Metric:** Correlation of LLM-as-a-Judge scores with human judgments on truthfulness.
139
 
 
147
  #### Summary
148
 
149
  As reported in "Truth Knows No Language: Evaluating Truthfulness Beyond English" (specifically Table 4 for Truth-Judges):
150
+ - This specific model (`multi_gemma9b_instruct_truth_judge`) is the Truth-Judge fine-tuned on `google/gemma-2-9b-it` using combined multilingual data (English, Basque, Catalan, Galician, Spanish).
151
  - Performance varies by language, with Kappa scores detailed in Table 4 of the paper.
152
 
153
  ## Technical Specifications