---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---
# amusktweewt/tiny-model-500M-chat-v2-5-exp
This model is a general-purpose, transformer-based language model designed for tasks such as text generation, story writing, and conversational interaction. It draws on multiple curated datasets to strengthen its storytelling, coding, and question-answering capabilities. This project is intended for academic research and educational purposes only; it is designed for experimentation, learning, and the development of language-based AI systems.
Compared with the previous version, it has undergone further SFT (supervised fine-tuning) for better prompt adherence and coherence.
## Model Details
### Model Description
The model was developed with a focus on balancing performance and computational efficiency. It employs **flash attention** and other optimizations to improve memory efficiency and speed.
- **Developed by:** amusktweewt
- **Model type:** LlamaForCausalLM
- **Architectural Details** (see the configuration sketch below):
- 12 layers
- 16 attention heads
- Hidden size: 1536
- Flash attention 2 enabled
- Dynamic RoPE scaling
- **License:** MIT
- **Language(s) (NLP):** English
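The architectural details above map roughly onto a Hugging Face `LlamaConfig` as sketched below. This is a minimal sketch: values not stated in this card (`intermediate_size`, the number of key/value heads, and the RoPE scaling factor) are assumptions, and the model's `config.json` on the Hub is the authoritative source.
```python
from transformers import LlamaConfig, LlamaForCausalLM

# Minimal reconstruction of the architecture described above.
# intermediate_size, num_key_value_heads, and the RoPE scaling factor are
# assumptions; the model's config.json is authoritative.
config = LlamaConfig(
    vocab_size=32_768,       # custom BPE tokenizer (see Preprocessing)
    hidden_size=1536,
    num_hidden_layers=12,
    num_attention_heads=16,
    num_key_value_heads=16,  # assumed: standard multi-head attention
    intermediate_size=6144,  # assumed: 4x hidden size
    rope_scaling={"type": "dynamic", "factor": 2.0},  # dynamic RoPE scaling (factor assumed)
)

model = LlamaForCausalLM(config)
print(f"~{model.num_parameters() / 1e6:.0f}M parameters under these assumptions")
```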
## Uses
### Direct Use
This model is intended for text generation, code completion, chat-based applications, and story writing.
### Out-of-Scope Use
- Tasks requiring high factual accuracy
- Mathematical or reasoning-heavy tasks
- Applications involving sensitive content without human review
## Training Details
### Training Data
The model was trained on a diverse collection of datasets (see the loading sketch after this list), including:
- Textbooks and academic content
- Creative and children's stories
- Coding instruction datasets
- Wiki-based texts and general stories
- Mathematics and step-by-step solutions
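All of these corpora are available on the Hugging Face Hub. As an illustration only, two of them can be pulled and merged with the `datasets` library as sketched below; the normalization to a single `text` column and the choice of datasets are assumptions, not the actual training mix.
```python
from datasets import load_dataset, concatenate_datasets

def keep_text_only(ds, text_column="text"):
    """Drop every column except the raw text field so corpora can be merged."""
    return ds.remove_columns([c for c in ds.column_names if c != text_column])

# Illustrative subset of the corpora listed above; column names may differ
# per dataset, so check ds.column_names before merging.
tiny_stories = load_dataset("roneneldan/TinyStories", split="train")
tiny_textbooks = load_dataset("nampdn-ai/tiny-textbooks", split="train")

mixed = concatenate_datasets([keep_text_only(tiny_stories), keep_text_only(tiny_textbooks)])
print(mixed)
```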
### Training Procedure
#### Preprocessing
- Custom BPE tokenizer with a vocabulary size of 32,768
- Applied dynamic RoPE scaling for better long-context handling
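A tokenizer with the stated vocabulary size can be trained with the `tokenizers` library roughly as follows. This is a sketch only: the byte-level pre-tokenizer, the special token, and the corpus file are assumptions rather than the card author's exact setup.
```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Sketch: train a byte-level BPE tokenizer with a 32,768-token vocabulary.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32_768,
    special_tokens=["<|endoftext|>"],  # assumed special token
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # corpus.txt is a placeholder
tokenizer.save("tokenizer.json")
```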
#### Hyperparameters
- **Batch size:** 12 (per device)
- **Gradient accumulation:** 2 steps
- **Learning rate:** 1e-5
- **Weight decay:** 0.002
- **Warmup ratio:** 10%
- **Precision:** FP16 (mixed precision)
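Expressed as Hugging Face `TrainingArguments`, the settings above look roughly like this; `output_dir` is an assumption, and options not listed in the card are left at library defaults.
```python
from transformers import TrainingArguments

# Sketch of the hyperparameters listed above; output_dir is an assumption,
# and unlisted options keep their library defaults.
training_args = TrainingArguments(
    output_dir="tiny-model-500M-chat-v2-5-exp",
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,  # effective batch size of 24
    learning_rate=1e-5,
    weight_decay=0.002,
    warmup_ratio=0.10,
    fp16=True,                      # mixed-precision training
)
```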
#### Training Setup
- **Hardware:** NVIDIA 4090 GPU
- **Training Time:** 216 hours
- **Dataset Size:** 69 GB of text
## Evaluation
### Testing Data, Factors & Metrics
The model was evaluated using subsets of the training data, focusing on language coherence, relevance, and fluency.
#### Metrics
- **Loss:** Token-level cross-entropy on the evaluation set.
- **Perplexity:** 2.506
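Perplexity is the exponential of the mean token-level cross-entropy, so the reported value corresponds to an average evaluation loss of roughly ln(2.506) ≈ 0.92 nats per token, as the short check below shows.
```python
import math

# Perplexity = exp(mean cross-entropy loss), so a perplexity of 2.506
# implies a mean evaluation loss of about 0.92 nats per token.
reported_perplexity = 2.506
implied_loss = math.log(reported_perplexity)
print(f"implied eval loss: {implied_loss:.3f} nats/token")
```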
### Results
The model generates coherent and, in most cases, contextually appropriate outputs across multiple domains.
## Risks and Limitations
### Known Issues
- The model may produce outputs reflecting biases present in the training data.
### Recommendations
Users should apply human review when using the model in critical or sensitive applications.
## How to Get Started with the Model
```python
import torch
from transformers import pipeline, set_seed

model_name = "amusktweewt/tiny-model-500M-chat-v2-5-exp"

# Build a text-generation pipeline on the first GPU (device=0).
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0
)
set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Format the turn with the model's chat template; the empty assistant
    # message leaves room for the model's reply.
    messages = [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": ""}
    ]
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.1,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=0
    )

    # Strip the prompt so only the newly generated reply is printed.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")
```
## Technical Specifications
### Model Architecture and Objective
The model follows a **Transformer-based architecture** optimized for causal language modeling tasks.
- Attention heads: 16
- Hidden size: 1536
- Flash attention and memory-efficient attention enabled
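To use the flash-attention path noted above when loading the released weights, the checkpoint can be requested with the `flash_attention_2` implementation. This sketch assumes the `flash-attn` package and a compatible (Ampere-or-newer) GPU are installed; omitting `attn_implementation` falls back to the default attention.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the released checkpoint with FlashAttention-2 enabled (requires the
# flash-attn package and a supported GPU).
model = AutoModelForCausalLM.from_pretrained(
    "amusktweewt/tiny-model-500M-chat-v2-5-exp",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("amusktweewt/tiny-model-500M-chat-v2-5-exp")
```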
### Compute Infrastructure
#### Hardware
- Single GPU (NVIDIA 4090)
#### Software
- Python 3.8+
- HuggingFace Transformers 4.48.0
- PyTorch 2.4
## Environmental Impact
- **Training Hours:** 216 hours
- **Hardware:** NVIDIA 4090
- **Carbon Emitted:** 9.07 kg CO2 eq
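As a back-of-the-envelope check (assuming the 4090 draws close to its 450 W board power for the whole run), 216 hours corresponds to roughly 97 kWh, so the reported 9.07 kg CO2 eq implies a grid intensity of about 0.09 kg CO2 eq per kWh.
```python
# Back-of-the-envelope check; the 450 W average draw is an assumption.
power_kw = 0.450      # assumed average board power of an RTX 4090
hours = 216
energy_kwh = power_kw * hours            # ~97 kWh
implied_intensity = 9.07 / energy_kwh    # ~0.093 kg CO2 eq per kWh
print(f"{energy_kwh:.1f} kWh, ~{implied_intensity:.3f} kg CO2 eq/kWh implied")
```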
## Model Card Authors
amusktweewt
## Model Card Contact
For questions or feedback, contact amusktweewt.