Improve model card: Update license & pipeline tag, add project page
This PR improves the model card by:
* Updating the `license` to `mit` for consistency with the associated GitHub repository.
* Changing the `pipeline_tag` to `text-generation` to better reflect the model's primary use case as a large language model and to align with the card's usage examples (see the sketch below).
* Adding a link to the project page (`https://itay1itzhak.github.io/planted-in-pretraining`) in the model card content for easier access to more project details.
Please review and merge this PR if everything looks good.
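For reference, the generation flow that the updated `pipeline_tag` is meant to reflect follows the card's existing usage example. A minimal sketch, assuming a placeholder repo id (`itay1itzhak/T5-Tulu` is illustrative, not confirmed by this PR):

```python
# Minimal usage sketch; the repo id below is a placeholder, not confirmed by this PR.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "itay1itzhak/T5-Tulu"  # hypothetical id; substitute the actual model repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)  # T5 is an encoder-decoder model

prompt = "Answer the following question: Why might an LLM exhibit cognitive biases?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

This mirrors the `print(tokenizer.decode(outputs[0]))` pattern already shown in the card's Uses section.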
README.md
CHANGED

@@ -1,18 +1,18 @@
 ---
-license: …
-tags:
-- language-modeling
-- causal-lm
-- bias-analysis
-- cognitive-bias
+base_model:
+- google/t5-v1_1-xxl
 datasets:
 - allenai/tulu-v2-sft-mixture
 language:
 - en
-base_model:
-- google/t5-v1_1-xxl
-pipeline_tag: text2text-generation
 library_name: transformers
+license: mit
+pipeline_tag: text-generation
+tags:
+- language-modeling
+- causal-lm
+- bias-analysis
+- cognitive-bias
 ---

 # Model Card for T5-Tulu
@@ -25,12 +25,13 @@ This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv pap
 We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
 This is one of 3 identical versions trained with different random seeds.

-
-
-
-
-
-
+- **Model type**: encoder-decoder based transformer
+- **Language(s)**: English
+- **License**: MIT
+- **Finetuned from**: `google/t5-v1_1-xxl`
+- **Paper**: https://arxiv.org/abs/2507.07186
+- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+- **Project Page**: https://itay1itzhak.github.io/planted-in-pretraining

 ## Uses

@@ -55,26 +56,26 @@ print(tokenizer.decode(outputs[0]))

 ## Training Details

-
-
-
-
-
-
-
+- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
+- Instruction data: Tulu-2
+- Seeds: 3 per setting to evaluate randomness effects
+- Batch size: 128 (OLMo) / 64 (T5)
+- Learning rate: 1e-6 to 1e-3
+- Steps: ~5.5k (OLMo) / ~16k (T5)
+- Mixed precision: fp16 (OLMo) / bf16 (T5)

 ## Evaluation

-
-
-
+- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
+- Metrics: mean bias score, PCA clustering, MMLU accuracy
+- Findings: Biases primarily originate in pretraining; randomness introduces moderate variation

 ## Environmental Impact

-
-
+- Hardware: 4× NVIDIA A40
+- Estimated time: ~120 GPU hours/model

 ## Technical Specifications

-
-
+- Architecture: T5-11B
+- Instruction dataset: Tulu-2