---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---

# amusktweewt/tiny-model-500M-chat-v2-5-exp

This model is a general-purpose transformer-based language model designed for tasks such as text generation, story writing, and conversational interactions. It leverages multiple curated datasets to enhance its storytelling, coding, and question-answering capabilities. The project is intended for academic research and educational purposes only: experimentation, learning, and the development of language-based AI systems.

Compared with the previous version, it has undergone further SFT for better prompt adherence and coherence.

## Model Details

### Model Description

The model was developed with a focus on balancing performance and computational efficiency. It employs **flash attention** and other optimizations to improve memory efficiency and speed.

- **Developed by:** amusktweewt
- **Model type:** LlamaForCausalLM
- **Architectural Details** (see the configuration sketch below):
  - 12 layers
  - 16 attention heads
  - Hidden size: 1536
  - Flash Attention 2 enabled
  - Dynamic RoPE scaling
- **License:** MIT
- **Language(s) (NLP):** English
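
For reference, the figures above can be expressed as a `LlamaConfig`. This is a hedged reconstruction, not the shipped `config.json`: the vocabulary size comes from the tokenizer described under Preprocessing, while the intermediate size, context length, and RoPE scaling factor are illustrative assumptions.

```python
from transformers import LlamaConfig

# Approximate configuration assembled from the model card. Values not
# stated on the card (intermediate_size, max_position_embeddings, the
# RoPE scaling factor) are illustrative assumptions.
config = LlamaConfig(
    vocab_size=32768,                # custom BPE tokenizer (see Preprocessing)
    hidden_size=1536,
    num_hidden_layers=12,
    num_attention_heads=16,
    intermediate_size=6144,          # assumed 4x hidden size
    max_position_embeddings=2048,    # assumed base context length
    rope_scaling={"type": "dynamic", "factor": 2.0},  # factor assumed
)
print(config)
```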

## Uses

### Direct Use

This model is intended for text generation, code completion, chat-based applications, and story writing.

### Out-of-Scope Use

- Tasks requiring high factual accuracy
- Mathematics or multi-step reasoning tasks
- Applications involving sensitive content without human review

## Training Details

### Training Data

The model was trained on a diverse collection of datasets, including:

- Textbooks and academic content
- Creative and children's stories
- Coding instruction datasets
- Wiki-based texts and general stories
- Mathematics and step-by-step solutions

### Training Procedure

#### Preprocessing

- Custom BPE tokenizer with a vocabulary size of 32,768 (see the training sketch below)
- Dynamic RoPE scaling applied for better long-context handling
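
The card does not include the tokenizer training script; the following is a minimal sketch of how a 32,768-token BPE tokenizer can be trained with the `tokenizers` library. The corpus path and special tokens are placeholders, not the actual setup.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Minimal byte-level BPE sketch. "corpus.txt" and the special tokens are
# placeholders; the actual tokenizer configuration is not documented here.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32768,
    special_tokens=["<unk>", "<s>", "</s>"],
)
tokenizer.train(files=["corpus.txt"], trainer=trainer)
tokenizer.save("tokenizer.json")
```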

#### Hyperparameters

- **Batch size:** 12 (per device)
- **Gradient accumulation:** 2 steps
- **Learning rate:** 1e-5
- **Weight decay:** 0.002
- **Warmup ratio:** 10%
- **Precision:** FP16 (mixed precision)
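
These settings map onto `transformers.TrainingArguments` roughly as follows. The output directory, epoch count, and logging cadence are placeholders rather than the original training script.

```python
from transformers import TrainingArguments

# Hyperparameters taken from the card; output_dir, num_train_epochs and
# the logging/save cadence are illustrative placeholders.
training_args = TrainingArguments(
    output_dir="tiny-model-500M-chat-v2-5-exp",
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,
    learning_rate=1e-5,
    weight_decay=0.002,
    warmup_ratio=0.1,
    fp16=True,
    num_train_epochs=1,
    logging_steps=100,
    save_steps=1000,
)
```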

#### Training Setup

- **Hardware:** NVIDIA 4090 GPU
- **Training Time:** 216 hours
- **Dataset Size:** 69 GB of text

## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated using subsets of the training data, focusing on language coherence, relevancy, and fluency.

#### Metrics

- **Loss:** token-level cross-entropy on the evaluation subsets
- **Perplexity:** 2.506
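
The reported perplexity is the exponential of the mean token-level cross-entropy loss. A minimal sketch of that computation is shown below; the sample text is a placeholder, since the card does not specify the evaluation corpus or stride.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Score a single placeholder text and exponentiate the mean token-level
# cross-entropy loss. The actual evaluation data is not documented here.
model_name = "amusktweewt/tiny-model-500M-chat-v2-5-exp"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Once upon a time, a small robot learned to tell stories."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, labels=inputs["input_ids"])

print(f"Perplexity: {math.exp(outputs.loss.item()):.3f}")
```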

### Results

The model generates coherent and, most of the time, contextually appropriate outputs across multiple domains.

## Risks and Limitations

### Known Issues

- The model may produce outputs reflecting biases present in the training data.

### Recommendations

Users should apply human review when using the model in critical or sensitive applications.

## How to Get Started with the Model

```python
import torch
from transformers import pipeline, set_seed

model_name = "amusktweewt/tiny-model-500M-chat-v2-5-exp"
chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0 if torch.cuda.is_available() else -1,
)

set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Build a single-turn history and append an empty assistant turn so the
    # chat template ends with the assistant prefix.
    messages = [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": ""}
    ]

    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.1,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=0
    )

    # Drop the prompt so only the newly generated reply is printed.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")
```

## Technical Specifications

### Model Architecture and Objective

The model follows a **Transformer-based architecture** optimized for causal language modeling tasks.

- Attention heads: 16
- Hidden size: 1536
- Flash attention and memory-efficient attention enabled (see the loading example below)
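
In recent `transformers` releases, Flash Attention 2 is requested explicitly at load time. The example below is a hedged illustration: it assumes a CUDA GPU and the `flash-attn` package are available; otherwise the `attn_implementation` argument can simply be dropped.

```python
import torch
from transformers import AutoModelForCausalLM

# Load in FP16 and request Flash Attention 2. Assumes a compatible CUDA
# GPU and an installed flash-attn package.
model = AutoModelForCausalLM.from_pretrained(
    "amusktweewt/tiny-model-500M-chat-v2-5-exp",
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```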

### Compute Infrastructure

#### Hardware

- Single GPU (NVIDIA 4090)

#### Software

- Python 3.8+
- Hugging Face Transformers 4.48.0
- PyTorch 2.4

## Environmental Impact

- **Training Hours:** 216 hours
- **Hardware:** NVIDIA 4090
- **Carbon Emitted:** 9.07 kg CO2 eq
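
The carbon figure follows the usual estimate of energy used times grid carbon intensity. The sketch below shows the shape of that calculation with hypothetical power-draw and grid-intensity values; only the 216 training hours come from the card.

```python
# Back-of-the-envelope emissions estimate. avg_power_kw and carbon_intensity
# are hypothetical placeholders, not values reported on the card.
training_hours = 216
avg_power_kw = 0.45        # assumed average GPU-system draw in kW
carbon_intensity = 0.093   # assumed grid intensity in kg CO2-eq per kWh

energy_kwh = training_hours * avg_power_kw
emissions_kg = energy_kwh * carbon_intensity
print(f"{energy_kwh:.1f} kWh -> {emissions_kg:.2f} kg CO2-eq")
```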

## Model Card Authors

amusktweewt

## Model Card Contact

For questions or feedback, contact amusktweewt.