T5-Small Project Guide
=====================

Welcome to the T5-Small Project Guide by RemiAI3, a free educational resource for students to learn AI model fine-tuning using 
Hugging Face's T5-small model. This project enables students to build a question-answering system, such as answering questions 
about the Chola Empire, using open-source tools.



Objective
---------
Our goal is to provide accessible AI resources for students to experiment with and learn from, promoting RemiAI3’s mission of 
democratizing AI education. This project is designed to be lightweight, avoiding the high costs of deploying large AI models like 
text-to-image generators.



Prerequisites
-------------
- Python Version: Python 3.10.9 - MUST USE THIS VERSION ONLY
- Virtual Environment: Use `venv` to isolate dependencies
- Hugging Face Account: Sign up at https://huggingface.co to get an access token
   To get the access token:
  1. Click your profile on Hugging Face.
  2. Scroll to the bottom to find the section named Access Tokens.
  3. Click it and enter your Hugging Face password.
  4. Click the option to create a new token.
  5. On the page that opens, select write access.
  6. Click the Create Token button. If it is not visible at the top, scroll down to find it.
  7. Your Hugging Face token (HF-TOKEN) is then displayed.
- Dataset: A CSV or JSON file with question-answer pairs. Example JSON format:
  ```json
  [
    {"input": "Who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"},
    {"input": "What was the main military force of the Cholas?", "response": "Well-organized army and navy"},
    {"input": "What was a key administrative reform by the Cholas?", "response": "Efficient land revenue system"}
  ]
  ```
  CSV format (if used):
  ```csv
  input,response
  "Who was the founder of the Chola Empire?","Vijayalaya Chola"
  "What was the main military force of the Cholas?","Well-organized army and navy"
  ```
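
The CSV-to-JSON conversion can be sketched with the standard library alone. This is a minimal illustration, not the code in `t5_project_all_in_one.py` (which may well use pandas instead); the function name `csv_to_json` is hypothetical, and it assumes every row has both an `input` and a `response` column as in the example above.

```python
import csv
import json


def csv_to_json(csv_path: str, json_path: str) -> int:
    """Convert an input,response CSV into the JSON list format shown above.

    Returns the number of question-answer pairs written.
    """
    with open(csv_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        # Keep only the two expected columns and trim stray whitespace.
        records = [
            {"input": row["input"].strip(), "response": row["response"].strip()}
            for row in reader
        ]
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst, ensure_ascii=False, indent=2)
    return len(records)
```

Calling `csv_to_json("dataset.csv", "dataset.json")` would write the JSON file next to the CSV and return the number of pairs converted.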



Setup Instructions
------------------
1. Install Python: Download Python 3.10.9 from https://www.python.org/downloads/.
2. Clone the Repository:
   ```
   git clone https://huggingface.co/remiai3/t5-small-project-guide
   cd t5-small-project-guide
   ```
3. Create and Activate a Virtual Environment:
   ```
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
4. Install Dependencies:
   ```
   pip install -r requirements.txt
   ```
5. Prepare Your Dataset: Place your `dataset.csv` or `dataset.json` in the project folder.
6. Set Hugging Face Token: Open `t5_project_all_in_one.py` and replace "YOUR_HUGGING_FACE_TOKEN" with your Hugging Face token.



Running the Project
-------------------
1. Fine-Tune the Model:
   Run the all-in-one script to convert the dataset (if CSV), preprocess, download the model, and fine-tune:
   ```
   python t5_project_all_in_one.py
   ```
   This will:
   - Convert CSV to JSON (if needed)
   - Preprocess the dataset
   - Download T5-small weights
   - Fine-tune the model
   - Save the fine-tuned model to `./finetuned_t5`
   - Generate a plot of training and validation loss (`training_metrics.png`)
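
Once the model is saved, it can be loaded back to answer a question. The sketch below is one possible way to do this and is not part of the repository. It assumes the tokenizer was saved into `./finetuned_t5` together with the model, and that the training script added no task prefix to the questions; if it did, pass the same prefix here. The names `build_prompt` and `answer` are hypothetical.

```python
def build_prompt(question: str, prefix: str = "") -> str:
    """Collapse repeated whitespace and prepend the (assumed) task prefix."""
    return prefix + " ".join(question.split())


def answer(question: str, model_dir: str = "./finetuned_t5", prefix: str = "") -> str:
    """Generate one answer from the fine-tuned checkpoint in model_dir."""
    # Imported here so build_prompt can be used without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumption: the tokenizer files live in the same folder as the model.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    inputs = tokenizer(build_prompt(question, prefix), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=32)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `print(answer("Who was the founder of the Chola Empire?"))` would print whatever the fine-tuned model generates for that question. If loading the tokenizer from `./finetuned_t5` fails, loading it from `t5-small` instead is a reasonable fallback.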



Project Files
-------------
- t5_project_all_in_one.py: Single script for dataset conversion, preprocessing, model downloading, and fine-tuning.
- requirements.txt: Lists required Python libraries.
- document.txt: This file with detailed instructions.
- README.md: Model configuration and repo overview.



Libraries and Versions
----------------------
- transformers==4.44.2
- datasets==3.0.1
- torch==2.4.1
- pandas==2.2.3
- matplotlib==3.9.2
- accelerate==1.0.1
- huggingface_hub==0.26.0
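
To confirm that the active virtual environment matches these pins, a short check along the following lines may help. It is a sketch using only the standard library; the helper name `find_mismatches` is hypothetical and not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the list above.
PINNED = {
    "transformers": "4.44.2",
    "datasets": "3.0.1",
    "torch": "2.4.1",
    "pandas": "2.2.3",
    "matplotlib": "3.9.2",
    "accelerate": "1.0.1",
    "huggingface_hub": "0.26.0",
}


def find_mismatches(pinned: dict[str, str]) -> dict[str, str]:
    """Return {package: problem} for each package that is missing or off-pin."""
    problems = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (want {wanted})"
            continue
        # Compare only the public version so a local build tag such as
        # "2.4.1+cu121" still matches the pin "2.4.1".
        if installed.split("+")[0] != wanted:
            problems[name] = f"found {installed} (want {wanted})"
    return problems


if __name__ == "__main__":
    for name, problem in find_mismatches(PINNED).items():
        print(f"{name}: {problem}")
```

Running the script inside the activated `venv` prints one line per problem and nothing when every package matches.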



Documentation
-------------
- Hugging Face Transformers: https://huggingface.co/docs/transformers
- Datasets Library: https://huggingface.co/docs/datasets
- T5 Model: https://huggingface.co/docs/transformers/model_doc/t5
- Pandas: https://pandas.pydata.org/docs
- Matplotlib: https://matplotlib.org/stable/contents.html
- Accelerate: https://huggingface.co/docs/accelerate



Troubleshooting
---------------
- Inaccurate Answers: Ensure your dataset has 500+ clean question-answer pairs. If answers are still poor, try increasing `num_train_epochs` or adjusting `learning_rate` in `t5_project_all_in_one.py`.
- Token Errors: Verify the Hugging Face token in `t5_project_all_in_one.py` is correct.
- Library Issues: Reinstall dependencies with `pip install -r requirements.txt`.
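
For the dataset check in the first item, a small helper can count usable pairs before training. This is an illustrative sketch, not part of the repository: the name `check_dataset` is hypothetical, and "clean" is assumed here to mean that both fields are non-empty and the question is not a case-insensitive duplicate of an earlier one.

```python
def check_dataset(records: list[dict], minimum: int = 500) -> dict:
    """Summarise common problems in a list of {"input", "response"} records."""
    seen = set()
    empty = 0
    duplicates = 0
    for record in records:
        question = str(record.get("input") or "").strip()
        response = str(record.get("response") or "").strip()
        if not question or not response:
            empty += 1
            continue
        key = question.lower()
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
    usable = len(seen)
    return {
        "usable": usable,
        "empty": empty,
        "duplicates": duplicates,
        "enough": usable >= minimum,
    }
```

Loading `dataset.json` with `json.load` and passing the resulting list to `check_dataset` returns the counts; `enough` is `False` while fewer than 500 usable pairs remain.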


Contributing
------------
Fork the repository, make changes, and submit a pull request at https://huggingface.co/remiai3/t5-small-project-guide.



About RemiAI3
-------------
RemiAI3 is committed to providing free AI educational resources to empower students. By using this project, you help promote our 
mission and build our brand for future AI innovations.