---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---

# amusktweewt/tiny-model-500M-chat-v2-5-exp

This model is a general-purpose transformer-based language model designed for tasks such as text generation, story writing, and conversational interactions. It leverages multiple curated datasets to enhance its storytelling, coding, and question-answering capabilities. The project is intended for academic research and educational purposes only: experimentation, learning, and the development of language-based AI systems.

Compared with the previous version, it has undergone further SFT for better prompt adherence and coherence.

## Model Details

### Model Description

The model was developed with a focus on balancing performance and computational efficiency. It employs **flash attention** and other optimizations to improve memory efficiency and speed.

- **Developed by:** amusktweewt
- **Model type:** LlamaForCausalLM
- **Architectural Details** (see the configuration sketch below):
  - 12 layers
  - 16 attention heads
  - Hidden size: 1536
  - Flash Attention 2 enabled
  - Dynamic RoPE scaling
- **License:** MIT
- **Language(s) (NLP):** English
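
For reference, the figures above can be expressed as a `LlamaConfig`. This is a hedged reconstruction, not the shipped `config.json`: the vocabulary size comes from the tokenizer described under Preprocessing, while the intermediate size, context length, and RoPE scaling factor are illustrative assumptions.

```python
from transformers import LlamaConfig

# Approximate configuration assembled from the model card. Values not
# stated on the card (intermediate_size, max_position_embeddings, the
# RoPE scaling factor) are illustrative assumptions.
config = LlamaConfig(
    vocab_size=32768,                # custom BPE tokenizer (see Preprocessing)
    hidden_size=1536,
    num_hidden_layers=12,
    num_attention_heads=16,
    intermediate_size=6144,          # assumed 4x hidden size
    max_position_embeddings=2048,    # assumed base context length
    rope_scaling={"type": "dynamic", "factor": 2.0},  # factor assumed
)
print(config)
```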

## Uses

### Direct Use

This model is intended for text generation, code completion, chat-based applications, and story writing.

### Out-of-Scope Use

- Tasks requiring high factual accuracy
- Mathematics or multi-step reasoning tasks
- Applications involving sensitive content without human review

## Training Details

### Training Data

The model was trained on a diverse collection of datasets, including:

- Textbooks and academic content
- Creative and children's stories
- Coding instruction datasets
- Wiki-based texts and general stories
- Mathematics and step-by-step solutions

### Training Procedure

#### Preprocessing

- Custom BPE tokenizer with a vocabulary size of 32,768 (see the training sketch below)
- Dynamic RoPE scaling applied for better long-context handling
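
The card does not include the tokenizer training script; the following is a minimal sketch of how a 32,768-token BPE tokenizer can be trained with the `tokenizers` library. The corpus path and special tokens are placeholders, not the actual setup.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Minimal byte-level BPE sketch. "corpus.txt" and the special tokens are
# placeholders; the actual tokenizer configuration is not documented here.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32768,
    special_tokens=["<unk>", "<s>", "</s>"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```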

#### Hyperparameters

- **Batch size:** 12 (per device)
- **Gradient accumulation:** 2 steps
- **Learning rate:** 1e-5
- **Weight decay:** 0.002
- **Warmup ratio:** 10%
- **Precision:** FP16 (mixed precision)
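
These settings map onto `transformers.TrainingArguments` roughly as follows. The output directory, epoch count, and logging cadence are placeholders rather than the original training script.

```python
from transformers import TrainingArguments

# Hyperparameters taken from the card; output_dir, num_train_epochs and
# the logging/save cadence are illustrative placeholders.
training_args = TrainingArguments(
    output_dir="tiny-model-500M-chat-v2-5-exp",
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    weight_decay=0.002,
    warmup_ratio=0.1,
    fp16=True,
    num_train_epochs=1,
    logging_steps=100,
    save_steps=1000,
)
```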

#### Training Setup

- **Hardware:** NVIDIA 4090 GPU
- **Training Time:** 216 hours
- **Dataset Size:** 69 GB of text

## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated using subsets of the training data, focusing on language coherence, relevancy, and fluency.

#### Metrics

- **Loss:** token-level cross-entropy on the evaluation subsets
- **Perplexity:** 2.506
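
The reported perplexity is the exponential of the mean token-level cross-entropy loss. A minimal sketch of that computation is shown below; the sample text is a placeholder, since the card does not specify the evaluation corpus or stride.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Score a single placeholder text and exponentiate the mean token-level
# cross-entropy loss. The actual evaluation data is not documented here.
model_name = "amusktweewt/tiny-model-500M-chat-v2-5-exp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Once upon a time, a small robot learned to tell stories."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Perplexity: {math.exp(outputs.loss.item()):.3f}")
```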

### Results

The model generates coherent and, most of the time, contextually appropriate outputs across multiple domains.

## Risks and Limitations

### Known Issues

- The model may produce outputs reflecting biases present in the training data.

### Recommendations

Users should apply human review when using the model in critical or sensitive applications.

## How to Get Started with the Model

```python
import torch
from transformers import pipeline, set_seed

model_name = "amusktweewt/tiny-model-500M-chat-v2-5-exp"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,
)

set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Build a single-turn history and append an empty assistant turn so the
    # chat template ends with the assistant prefix.
    messages = [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": ""}
    ]

    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.1,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=0
    )

    # Drop the prompt so only the newly generated reply is printed.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")
```

## Technical Specifications

### Model Architecture and Objective

The model follows a **Transformer-based architecture** optimized for causal language modeling tasks.

- Attention heads: 16
- Hidden size: 1536
- Flash attention and memory-efficient attention enabled (see the loading example below)
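
In recent `transformers` releases, Flash Attention 2 is requested explicitly at load time. The example below is a hedged illustration: it assumes a CUDA GPU and the `flash-attn` package are available; otherwise the `attn_implementation` argument can simply be dropped.

```python
import torch
from transformers import AutoModelForCausalLM

# Load in FP16 and request Flash Attention 2. Assumes a compatible CUDA
# GPU and an installed flash-attn package.
model = AutoModelForCausalLM.from_pretrained(
    "amusktweewt/tiny-model-500M-chat-v2-5-exp",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```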

### Compute Infrastructure

#### Hardware

- Single GPU (NVIDIA 4090)

#### Software

- Python 3.8+
- Hugging Face Transformers 4.48.0
- PyTorch 2.4

## Environmental Impact

- **Training Hours:** 216 hours
- **Hardware:** NVIDIA 4090
- **Carbon Emitted:** 9.07 kg CO2 eq
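
The carbon figure follows the usual estimate of energy used times grid carbon intensity. The sketch below shows the shape of that calculation with hypothetical power-draw and grid-intensity values; only the 216 training hours come from the card.

```python
# Back-of-the-envelope emissions estimate. avg_power_kw and carbon_intensity
# are hypothetical placeholders, not values reported on the card.
training_hours = 216
avg_power_kw = 0.45        # assumed average GPU-system draw in kW
carbon_intensity = 0.093   # assumed grid intensity in kg CO2-eq per kWh

energy_kwh = training_hours * avg_power_kw
emissions_kg = energy_kwh * carbon_intensity
print(f"{energy_kwh:.1f} kWh -> {emissions_kg:.2f} kg CO2-eq")
```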

## Model Card Authors

amusktweewt

## Model Card Contact

For questions or feedback, contact amusktweewt.