T5-Small Project Guide
======================
Welcome to the T5-Small Project Guide by RemiAI3, a free educational resource for students to learn AI model fine-tuning using
Hugging Face's T5-small model. This project enables students to build a question-answering system, such as answering questions
about the Chola Empire, using open-source tools.
Objective
---------
Our goal is to provide accessible AI resources for students to experiment with and learn from, promoting RemiAI3’s mission of
democratizing AI education. This project is designed to be lightweight, avoiding the high costs of deploying large AI models like
text-to-image generators.
Prerequisites
-------------
- Python Version: Python 3.10.9 - MUST USE THIS VERSION ONLY
- Virtual Environment: Use `venv` to isolate dependencies
- Hugging Face Account: Sign up at https://huggingface.co to get an access token
To get the access token:
1. Click on your profile on Hugging Face.
2. Scroll to the bottom, where you will see a section named Access Tokens.
3. Click on it and enter your Hugging Face password.
4. Click on the option to create a new token.
5. You will be redirected to a new page. There, select write access.
6. Find the create token button. If it is not displayed at the top of the page, scroll down until you see it.
7. Click the create button to get your Hugging Face token (HF-TOKEN).
- Dataset: A CSV or JSON file with question-answer pairs. Example JSON format:
```json
[
{"input": "Who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"},
{"input": "What was the main military force of the Cholas?", "response": "Well-organized army and navy"},
{"input": "What was a key administrative reform by the Cholas?", "response": "Efficient land revenue system"}
]
```
CSV format (if used):
```csv
input,response
"Who was the founder of the Chola Empire?","Vijayalaya Chola"
"What was the main military force of the Cholas?","Well-organized army and navy"
```
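The all-in-one script is described as converting CSV to JSON when needed. As a rough illustration of that step, here is a minimal sketch using only the standard library. The function name `csv_text_to_records` is hypothetical and is not taken from `t5_project_all_in_one.py`, which may handle the conversion differently.

```python
import csv
import io
import json


def csv_text_to_records(csv_text):
    """Parse CSV text with `input,response` columns into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Keep only the two expected columns and trim stray whitespace.
        records.append({
            "input": row["input"].strip(),
            "response": row["response"].strip(),
        })
    return records


sample_csv = (
    "input,response\n"
    '"Who was the founder of the Chola Empire?","Vijayalaya Chola"\n'
    '"What was the main military force of the Cholas?","Well-organized army and navy"\n'
)

records = csv_text_to_records(sample_csv)
# Serialize in the same shape as the JSON example format.
print(json.dumps(records, indent=2))
```

To convert a real file, read it with `open("dataset.csv", newline="", encoding="utf-8").read()` and pass the text to the function.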
Setup Instructions
------------------
1. Install Python: Download Python 3.10.9 from https://www.python.org/downloads/.
2. Clone the Repository:
```
git clone https://huggingface.co/remiai3/t5-small-project-guide
cd t5-small-project-guide
```
3. Create and Activate a Virtual Environment:
```
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
4. Install Dependencies:
```
pip install -r requirements.txt
```
5. Prepare Your Dataset: Place your `dataset.csv` or `dataset.json` in the project folder.
6. Set Hugging Face Token: Open `t5_project_all_in_one.py` and replace "YOUR_HUGGING_FACE_TOKEN" with your Hugging Face token.
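Step 6 places the token directly in the script. As an alternative sketch (not what the script does by default), the token could be read from an environment variable so it is never saved in a file you might share. The variable name `HF_TOKEN` and the helper `get_hf_token` are assumptions for illustration only.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise a clear error."""
    # `env` is injectable for testing; by default the real environment is used.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    # Treat the unedited placeholder from the script as "not set".
    if not token or token == "YOUR_HUGGING_FACE_TOKEN":
        raise RuntimeError("HF_TOKEN is not set. Export it before running the script.")
    return token
```

Set the variable in your terminal first (`export HF_TOKEN=...` on Linux/macOS, `set HF_TOKEN=...` on Windows), then call `get_hf_token()` wherever the script expects the token string.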
Running the Project
-------------------
1. Fine-Tune the Model:
Run the all-in-one script to convert the dataset (if CSV), preprocess, download the model, and fine-tune:
```
python t5_project_all_in_one.py
```
This will:
- Convert CSV to JSON (if needed)
- Preprocess the dataset
- Download T5-small weights
- Fine-tune the model
- Save the fine-tuned model to `./finetuned_t5`
- Generate a plot of training and validation loss (`training_metrics.png`)
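Once fine-tuning finishes, the saved model can be asked questions. The sketch below assumes that `./finetuned_t5` contains both the model and the tokenizer files, and that questions were passed to the model as plain text during training. Neither detail is confirmed by this guide, so check `t5_project_all_in_one.py` and adjust as needed. The helper names `clean_question` and `answer_question` are hypothetical.

```python
def clean_question(question):
    """Collapse repeated whitespace so the input is a single tidy line."""
    return " ".join(question.split())


def answer_question(question, model_dir="./finetuned_t5", max_new_tokens=32):
    """Generate an answer with the fine-tuned model (assumed layout of model_dir)."""
    # Imported lazily so clean_question can be used without loading transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

    # Tokenize the question and return PyTorch tensors.
    inputs = tokenizer(clean_question(question), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop padding and end-of-sequence tokens from the decoded text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `answer_question("Who was the founder of the Chola Empire?")` returns the model's answer as a string. The quality of that answer depends on your dataset and training settings.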
Project Files
-------------
- t5_project_all_in_one.py: Single script for dataset conversion, preprocessing, model downloading, and fine-tuning.
- requirements.txt: Lists required Python libraries.
- document.txt: This file with detailed instructions.
- README.md: Model configuration and repo overview.
Libraries and Versions
----------------------
- transformers==4.44.2
- datasets==3.0.1
- torch==2.4.1
- pandas==2.2.3
- matplotlib==3.9.2
- accelerate==1.0.1
- huggingface_hub==0.26.0
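To confirm that your virtual environment matches the versions listed above, a small check like the following may help. It is a sketch that uses only the standard library; the names `PINNED` and `find_mismatches` are made up for this example and are not part of the project.

```python
from importlib import metadata

# Versions copied from the list in this guide.
PINNED = {
    "transformers": "4.44.2",
    "datasets": "3.0.1",
    "torch": "2.4.1",
    "pandas": "2.2.3",
    "matplotlib": "3.9.2",
    "accelerate": "1.0.1",
    "huggingface_hub": "0.26.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Some torch builds carry a local suffix such as "+cpu"; ignore it.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in find_mismatches(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

If nothing is printed, every listed library is installed at the expected version.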
Documentation
-------------
- Hugging Face Transformers: https://huggingface.co/docs/transformers
- Datasets Library: https://huggingface.co/docs/datasets
- T5 Model: https://huggingface.co/docs/transformers/model_doc/t5
- Pandas: https://pandas.pydata.org/docs
- Matplotlib: https://matplotlib.org/stable/contents.html
- Accelerate: https://huggingface.co/docs/accelerate
Troubleshooting
---------------
- Inaccurate Answers: Ensure your dataset has 500+ clean question-answer pairs. Increase `num_train_epochs` or adjust `learning_rate` in `t5_project_all_in_one.py`.
- Token Errors: Verify the Hugging Face token in `t5_project_all_in_one.py` is correct.
- Library Issues: Reinstall dependencies with `pip install -r requirements.txt`.
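To check the "500+ clean question-answer pairs" advice before training, a quick audit along these lines can help. This is only a sketch: the function name `audit_pairs` is hypothetical, and "clean" is assumed here to mean non-empty and not a repeated question. Your own definition may be stricter.

```python
def audit_pairs(records, min_pairs=500):
    """Summarize basic quality problems in a list of {"input", "response"} dicts."""
    seen = set()
    empty = 0
    duplicates = 0
    for record in records:
        question = str(record.get("input") or "").strip()
        answer = str(record.get("response") or "").strip()
        if not question or not answer:
            empty += 1
            continue
        # Compare questions case-insensitively to catch near-identical repeats.
        key = question.lower()
        if key in seen:
            duplicates += 1
            continue
        seen.add(key)
    clean = len(seen)
    return {
        "clean": clean,
        "empty": empty,
        "duplicates": duplicates,
        "enough": clean >= min_pairs,
    }


sample = [
    {"input": "Who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"},
    {"input": "who was the founder of the Chola Empire?", "response": "Vijayalaya Chola"},
    {"input": "What was the main military force of the Cholas?", "response": ""},
]
print(audit_pairs(sample))
```

For a real dataset, load `dataset.json` with `json.load` and pass the resulting list to `audit_pairs`.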
Contributing
------------
Fork the repository, make changes, and submit a pull request at https://huggingface.co/remiai3/t5-small-project-guide.
About RemiAI3
-------------
RemiAI3 is committed to providing free AI educational resources to empower students. By using this project, you help promote our
mission and build our brand for future AI innovations.