---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- gpt2
- text-generation-inference
- math
- coding
- small
language:
- en
datasets:
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- OpenAssistant/oasst1
---
# aquif-neo-2-345m-c1

This is the first checkpoint of the `aquif-neo-2-345m` model, a next-generation language model developed by aquif AI. It is fine-tuned on a diverse dataset of conversational, code, and math data and serves as the initial step in a five-checkpoint training process designed to create a versatile and capable model.
## Model Details

**Base Model**: gpt2-medium\
**Method**: LoRA (Low-Rank Adaptation)\
**Parameter Count**: 355 million
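
As a quick way to verify the parameter count above, the checkpoint can be loaded with `transformers` (using the repository name from the loading section further down) and inspected with `num_parameters()`:

```python
from transformers import AutoModelForCausalLM

# Repository name as used in the "How to Load the Model" section below.
model = AutoModelForCausalLM.from_pretrained("aquiffoo/aquif-neo-2-345m-c1")

# Total vs. trainable parameters; a gpt2-medium-based model has roughly 355M parameters.
print(f"total parameters:     {model.num_parameters():,}")
print(f"trainable parameters: {model.num_parameters(only_trainable=True):,}")
```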
## Training Information

This checkpoint was trained as the first stage of a multi-checkpoint process. Training used a network-resilient script with fallback mechanisms for data loading and model initialization.

**Checkpoint Number**: 1/5\
**Hardware**: Google Colab T4 GPU\
**Training Duration**: approximately 2.5 hours for this checkpoint\
**Training Framework**: PyTorch, Hugging Face Transformers, PEFT, bitsandbytes, TRL\
**Quantization**: 8-bit
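
The original training script is not included in this card, but a minimal sketch of the 8-bit setup described above might look roughly like the following: the gpt2-medium base model is loaded with `bitsandbytes` quantization and prepared for adapter training with PEFT. The exact calls and arguments here are illustrative assumptions, not the original script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Load the gpt2-medium base model with 8-bit weights via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "gpt2-medium",
    quantization_config=quant_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Upcast norms/embeddings and make inputs require grads so LoRA adapters
# can be trained on top of the frozen 8-bit base model.
base_model = prepare_model_for_kbit_training(base_model)
```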
## LoRA Configuration

r=8\
lora_alpha=16\
target_modules=["q_attn", "c_attn", "c_proj", "c_fc", "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"]\
lora_dropout=0.05\
bias="none"\
task_type="CAUSAL_LM"

**Training Arguments**:\
per_device_train_batch_size=2\
gradient_accumulation_steps=16\
num_train_epochs=1 (for this checkpoint)\
learning_rate=1e-5\
max_steps=400

*Optimized for 8-bit training.*
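
For reference, here is a minimal sketch of how the values listed above might be expressed with `peft` and `transformers`; the surrounding wiring (output directory, attaching the adapter to a prepared base model) is an assumption rather than the published training code.

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

# LoRA settings as listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_attn", "c_attn", "c_proj", "c_fc",
        "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments as listed above; other fields are left at their defaults.
# Note that max_steps takes precedence over num_train_epochs when both are set.
training_args = TrainingArguments(
    output_dir="aquif-neo-2-345m-c1",  # illustrative output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    max_steps=400,
)

# `base_model` would be the 8-bit gpt2-medium prepared in the earlier sketch.
# peft_model = get_peft_model(base_model, lora_config)
```

Since the card lists TRL among the training frameworks, the fine-tuning itself was presumably driven by something like `trl`'s `SFTTrainer`, wrapping this adapter-equipped model, these arguments, and the datasets named in the metadata.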
## Training Loss Data

The following table shows the training loss recorded while training this checkpoint:

| Step | Training Loss |
|------|---------------|
| 20   | 3.4444        |
| 40   | 3.4754        |
| 60   | 3.4954        |
| 80   | 3.4213        |
| 100  | 3.3338        |
| 120  | 3.1749        |
| 140  | 3.2208        |
| 160  | 3.0503        |
| 180  | 2.9293        |
| 200  | 2.8377        |
| 220  | 2.8094        |
| 240  | 2.7225        |
| 260  | 2.6260        |
| 280  | 2.7452        |
| 300  | 2.6614        |
| 320  | 2.5056        |
| 340  | 2.5391        |
| 360  | 2.5115        |
| 380  | 2.4892        |
| 400  | 2.5117        |

*Note: training loss indicates how well the model is fitting the training data; a generally decreasing loss suggests the model is improving.*
## Intended Use

This checkpoint is an intermediate model in the development of the full `aquif-neo-2`. It is not intended for production use but serves as a foundation for subsequent fine-tuning checkpoints focusing on specific domains and tasks.
## How to Load the Model

You can load this model using the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-neo-2-345m-c1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
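
Once loaded, text can be generated with the standard `generate` API. The prompt and sampling parameters below are only illustrative:

```python
prompt = "Write a short Python function that adds two numbers."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 100 new tokens from the model.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 models define no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```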
## Future Checkpoints

This is the first of five planned checkpoints. Future checkpoints will continue to fine-tune the model on additional data to improve its capabilities across various domains.

**License**: Apache 2.0
|