---
pipeline_tag: text-generation
inference: false
license: apache-2.0
library_name: transformers
tags:
- language
- aquif
- gpt2
- text-generation-inference
- math
- coding
- small
language:
- en
datasets:
- tatsu-lab/alpaca
- databricks/databricks-dolly-15k
- OpenAssistant/oasst1
---
# aquif-neo-2-345m-c1

This is the first checkpoint of the `aquif-neo-2-345m` model, a next-generation language model developed by aquif AI. It is fine-tuned on a diverse dataset of conversational, code, and math data and serves as the initial step in a five-checkpoint training process designed to create a versatile and capable model.
## Model Details

**Base Model**: gpt2-medium\
**Method**: LoRA (Low-Rank Adaptation)\
**Parameter Count**: 355 million
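
As a quick way to verify the parameter count above, the checkpoint can be loaded with `transformers` (using the repository name from the loading section further down) and inspected with `num_parameters()`:

```python
from transformers import AutoModelForCausalLM

# Repository name as used in the "How to Load the Model" section below.
model = AutoModelForCausalLM.from_pretrained("aquiffoo/aquif-neo-2-345m-c1")

# Total vs. trainable parameters; a gpt2-medium-based model has roughly 355M parameters.
print(f"total parameters:     {model.num_parameters():,}")
print(f"trainable parameters: {model.num_parameters(only_trainable=True):,}")
```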
## Training Information

This checkpoint was trained as the first stage of a multi-checkpoint process. Training used a network-resilient script with fallback mechanisms for data loading and model initialization.

**Checkpoint Number**: 1/5\
**Hardware**: Google Colab T4 GPU\
**Training Duration**: approximately 2.5 hours for this checkpoint\
**Training Framework**: PyTorch, Hugging Face Transformers, PEFT, bitsandbytes, TRL\
**Quantization**: 8-bit
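
The original training script is not included in this card, but a minimal sketch of the 8-bit setup described above might look roughly like the following: the gpt2-medium base model is loaded with `bitsandbytes` quantization and prepared for adapter training with PEFT. The exact calls and arguments here are illustrative assumptions, not the original script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

# Load the gpt2-medium base model with 8-bit weights via bitsandbytes.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "gpt2-medium",
    quantization_config=quant_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# Upcast norms/embeddings and make inputs require grads so LoRA adapters
# can be trained on top of the frozen 8-bit base model.
base_model = prepare_model_for_kbit_training(base_model)
```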
## LoRA Configuration

r=8\
lora_alpha=16\
target_modules=["q_attn", "c_attn", "c_proj", "c_fc", "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj"]\
lora_dropout=0.05\
bias="none"\
task_type="CAUSAL_LM"

**Training Arguments**:\
per_device_train_batch_size=2\
gradient_accumulation_steps=16\
num_train_epochs=1 (for this checkpoint)\
learning_rate=1e-5\
max_steps=400

*Optimized for 8-bit training.*
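
For reference, here is a minimal sketch of how the values listed above might be expressed with `peft` and `transformers`; the surrounding wiring (output directory, attaching the adapter to a prepared base model) is an assumption rather than the published training code.

```python
from peft import LoraConfig, get_peft_model
from transformers import TrainingArguments

# LoRA settings as listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_attn", "c_attn", "c_proj", "c_fc",
        "attn.c_attn", "attn.c_proj", "mlp.c_fc", "mlp.c_proj",
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments as listed above; other fields are left at their defaults.
# Note that max_steps takes precedence over num_train_epochs when both are set.
training_args = TrainingArguments(
    output_dir="aquif-neo-2-345m-c1",  # illustrative output path
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,
    num_train_epochs=1,
    learning_rate=1e-5,
    max_steps=400,
)

# `base_model` would be the 8-bit gpt2-medium prepared in the earlier sketch.
# peft_model = get_peft_model(base_model, lora_config)
```

Since the card lists TRL among the training frameworks, the fine-tuning itself was presumably driven by something like `trl`'s `SFTTrainer`, wrapping this adapter-equipped model, these arguments, and the datasets named in the metadata.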
## Training Loss Data

The following table shows the training loss recorded while training this checkpoint:

| Step | Training Loss |
|------|---------------|
| 20   | 3.4444        |
| 40   | 3.4754        |
| 60   | 3.4954        |
| 80   | 3.4213        |
| 100  | 3.3338        |
| 120  | 3.1749        |
| 140  | 3.2208        |
| 160  | 3.0503        |
| 180  | 2.9293        |
| 200  | 2.8377        |
| 220  | 2.8094        |
| 240  | 2.7225        |
| 260  | 2.6260        |
| 280  | 2.7452        |
| 300  | 2.6614        |
| 320  | 2.5056        |
| 340  | 2.5391        |
| 360  | 2.5115        |
| 380  | 2.4892        |
| 400  | 2.5117        |

*Note: training loss indicates how well the model is fitting the training data; a generally decreasing loss suggests the model is improving.*
## Intended Use

This checkpoint is an intermediate model in the development of the full `aquif-neo-2`. It is not intended for production use but serves as a foundation for subsequent fine-tuning checkpoints focusing on specific domains and tasks.
## How to Load the Model

You can load this model using the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "aquiffoo/aquif-neo-2-345m-c1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
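
Once loaded, text can be generated with the standard `generate` API. The prompt and sampling parameters below are only illustrative:

```python
prompt = "Write a short Python function that adds two numbers."
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 100 new tokens from the model.
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 models define no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```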
## Future Checkpoints

This is the first of five planned checkpoints. Future checkpoints will continue to fine-tune the model on additional data to improve its capabilities across various domains.

**License**: Apache 2.0
|