---
library_name: transformers
tags:
- mental-health
- counseling
- Mistral-7B
- LoRA
- quantization
- PEFT
---

# Model Card for `mistral-7b-therapist-v1`

## Model Details

### Model Description

This model is a specialized adaptation of Mistral-7B-Instruct, fine-tuned for mental health counseling with parameter-efficient techniques: 4-bit quantization (NF4) via the `bitsandbytes` library and Low-Rank Adaptation (LoRA) via PEFT, trained with the `trl` library. It was trained on the `mental_health_counseling_conversations` dataset, formatted into chat templates with a 256-token context window.

Key features:

- **Quantization**: 4-bit NF4 quantization reduces memory usage while maintaining performance.
- **LoRA**: Low-Rank Adaptation (r=16, alpha=32) targets the attention layers for efficient fine-tuning.
- **Training**: 3 epochs with an effective batch size of 4 via gradient accumulation.
- **Metrics**: Final train loss of 0.221 and eval loss of 1.297.

### Model Sources

- **Repository**: [GitHub Repository](https://github.com/ragi19/mistral-7b-therapist-v1/import)

---

## Uses

### Direct Use

The model is designed for direct use in mental health counseling applications. It can generate empathetic, context-aware responses to user input, making it suitable for conversational AI systems that provide psychological support.

### Downstream Use

This model can serve as a foundation for downstream tasks such as:

- Building personalized therapy chatbots.
- Adding AI-driven counseling capabilities to existing healthcare platforms.
- Supporting natural language processing research in mental health.

### Out-of-Scope Use

The model is not intended for:

- Medical diagnosis or treatment without human oversight.
- High-stakes decision-making without additional validation.
- Deployments that require toxicity-free output but lack safety measures.

---

## Bias, Risks, and Limitations

### Known Biases

- The model may perpetuate biases present in the training data, particularly around cultural or linguistic nuances.
- Responses may lack sensitivity in edge cases due to limited contextual understanding.

### Risks

- Misinterpretation of user input can lead to inappropriate or harmful responses.
- Overreliance on the model without human intervention may result in incorrect advice.

### Limitations

- The 256-token context window means longer conversations may lose coherence.
- Performance degrades on out-of-distribution inputs.
- Additional safety mechanisms (e.g., toxicity detection) are required for deployment.

### Recommendations

- Always validate outputs before deploying in production environments.
- Include disclaimers about AI limitations and encourage users to seek professional help when needed.
- Regularly update the model with new data to improve robustness and reduce bias.

---

## How to Get Started with the Model

Use the following snippet to load the model and generate a response:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and model; device_map="auto" places the weights
# on the available GPU(s).
tokenizer = AutoTokenizer.from_pretrained("ragishehab/mistral-7b-therapist-v1")
model = AutoModelForCausalLM.from_pretrained(
    "ragishehab/mistral-7b-therapist-v1",
    device_map="auto",
)

# Move the inputs to the model's device rather than hard-coding "cuda",
# so the snippet also works when the model is placed elsewhere.
input_text = "I'm feeling overwhelmed lately."
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Tip: for multi-turn conversations, format messages with
# tokenizer.apply_chat_template(...) to match the chat format used in training.
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
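Because the card describes a 4-bit NF4 setup, you may want to load the model the same way to keep memory usage low. Below is a minimal sketch using `transformers`' `BitsAndBytesConfig`; the compute dtype is an assumption (FP16, matching the mixed-precision training described below), not a confirmed detail of this checkpoint:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization, mirroring the setup described in Model Details.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: FP16, per the training precision
)

tokenizer = AutoTokenizer.from_pretrained("ragishehab/mistral-7b-therapist-v1")
model = AutoModelForCausalLM.from_pretrained(
    "ragishehab/mistral-7b-therapist-v1",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Generation then proceeds exactly as in the snippet above.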
---

## Training Details

### Training Data

The model was trained on the `mental_health_counseling_conversations` dataset, which contains realistic dialogues between counselors and clients. Each conversation is formatted as a sequence of user and assistant turns, matching the chat structure the model consumes.

### Training Procedure

#### Preprocessing

- Conversations were tokenized and formatted into chat-style JSON structures:

```json
[
  {"role": "user", "content": "context"},
  {"role": "assistant", "content": "response"}
]
```

- Context windows were truncated to 256 tokens with right-padding.

#### Training Hyperparameters

- **Learning Rate**: 2e-5
- **Batch Size**: 1 (effective batch size 4 via gradient accumulation)
- **Epochs**: 3
- **Optimizer**: AdamW
- **Scheduler**: Linear decay
- **Precision**: FP16 mixed precision

A hedged, end-to-end sketch of this setup appears in the appendix at the end of this card.

#### Speeds, Sizes, Times

- Throughput: 0.139 steps/sec
- Total steps: 1350
- GPU memory usage: ~3.5 GB
- Training time: ~12 hours on a single A100 GPU

---

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The evaluation set consists of held-out mental health counseling conversations covering diverse scenarios and edge cases.

#### Factors

- Response relevance
- Empathy and appropriateness of tone
- Coherence across multi-turn interactions

#### Metrics

- **Loss**: Cross-entropy loss on the evaluation set.
- **Human evaluation**: Planned for future iterations to assess qualitative aspects such as empathy and accuracy.

### Results

- Train loss: 0.221
- Eval loss: 1.297

#### Summary

The model converges well on the training data and generates coherent, context-aware responses suitable for mental health counseling. The gap between train loss (0.221) and eval loss (1.297), however, suggests some overfitting to the training conversations, so outputs should be validated on held-out scenarios before deployment.

---

## Model Examination

The model relies on LoRA and quantization to minimize resource consumption while maintaining performance. Future work includes interpretability studies to better understand its decision-making process.

---

## Technical Specifications

### Model Architecture and Objective

The model is based on the Mistral-7B-Instruct architecture, adapted for mental health counseling through LoRA and quantization. Its primary objective is to generate empathetic, contextually relevant responses.

### Compute Infrastructure

#### Hardware

- GPU: NVIDIA A100
- CPU: Intel Xeon Scalable processor

#### Software

- Python: 3.9+
- PyTorch: 2.0+
- Transformers: 4.30+
- bitsandbytes: latest stable release

---

## Model Card Authors

- Ragishehab

---

## Model Card Contact

For inquiries or feedback, please contact ragishehab1@gmail.com.
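---

## Appendix: Training Setup Sketch

As referenced in the Training Details section, below is a minimal, illustrative sketch of how the quantized LoRA fine-tuning described in this card could be reproduced with `peft`, `bitsandbytes`, and `trl`. It is not the original training script: the base checkpoint revision, LoRA target modules and dropout, dataset namespace and column names are all assumptions, and `SFTTrainer` keyword arguments vary between `trl` versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import SFTTrainer

# 4-bit NF4 quantization, as described under Model Details.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Hypothetical base checkpoint; the card does not pin an exact Mistral revision.
base_model = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # right-padding needs a pad token
tokenizer.padding_side = "right"

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA on the attention projections (r=16, alpha=32), per the card.
# The exact target modules and dropout are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

# Replace with the dataset's full hub path; the namespace is not given in this card.
dataset = load_dataset("mental_health_counseling_conversations", split="train")

def to_text(example):
    # Per the Preprocessing section, render each example into user/assistant
    # chat turns. Column names here are assumptions; adapt to the real schema.
    messages = [
        {"role": "user", "content": example["Context"]},
        {"role": "assistant", "content": example["Response"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

# Recent trl versions read a "text" column by default; older ones take
# dataset_text_field="text". The 256-token window is enforced via the
# trainer's max sequence length setting (kwarg name varies by version).
dataset = dataset.map(to_text)

# Hyperparameters from the Training Hyperparameters section.
training_args = TrainingArguments(
    output_dir="mistral-7b-therapist-v1",
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # effective batch size 4
    num_train_epochs=3,
    lr_scheduler_type="linear",
    optim="adamw_torch",
    fp16=True,
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=lora_config,
)
trainer.train()
```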