Upload folder using huggingface_hub

Files changed (6)

.gitattributes +4 -0
README.md +227 -0
imatrix.dat +3 -0
qwen3-8b-hippocratesv1.fp16.gguf +3 -0
qwen3-8b-hippocratesv1.q2_k-imat-00001-of-00002.gguf +3 -0
qwen3-8b-hippocratesv1.q2_k-imat-00002-of-00002.gguf +3 -0

.gitattributes CHANGED

@@ -33,3 +33,7 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+imatrix.dat filter=lfs diff=lfs merge=lfs -text
+qwen3-8b-hippocratesv1.fp16.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-8b-hippocratesv1.q2_k-imat-00001-of-00002.gguf filter=lfs diff=lfs merge=lfs -text
+qwen3-8b-hippocratesv1.q2_k-imat-00002-of-00002.gguf filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,227 @@

+---
+license: apache-2.0
+datasets:
+- FreedomIntelligence/medical-o1-reasoning-SFT
+- unsloth/OpenMathReasoning-mini
+- mlabonne/guanaco-llama2-1k
+- madrylab/gsm8k-platinum
+language:
+- en
+- es
+base_model:
+- Qwen/Qwen3-8B
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- medical
+- grpo
+- math
+- reasoning
+- fine-tuned
+- qlora
+- lora
+- multi-stage-finetuning
+- autoquant
+- gguf
+---
+# Qwen3-8B-MultiStage-Finetune-Hybrid
+## Model Description
+This is a **fine-tuned** version of the **Qwen/Qwen3-8B** large language model. It is specialized through a multi-stage training pipeline focusing on **medical reasoning**, **mathematical problem-solving**, and **general conversational abilities**. The model was trained using **QLoRA** (Quantized Low-Rank Adaptation) and **GRPO** (Group Relative Policy Optimization) for both efficiency and enhanced performance in its specialized domains.
+The training methodology uses a progressive approach, building capabilities in distinct areas before consolidating them:
+1.  **Medical Reasoning SFT:** Initial fine-tuning on a specialized medical dataset to adapt the model to medical explanations and reasoning.
+2.  **Mathematical SFT:** Further fine-tuning on a mathematical dataset to enhance its ability to solve math problems.
+3.  **Mathematical GRPO:** A reinforcement learning stage that leverages a reward function to optimize the model's accuracy and ability to provide structured mathematical solutions, particularly with answers in a `\boxed{}` format.
+4.  **General Chat SFT:** Final fine-tuning on a diverse chat dataset to improve conversational fluency, helpfulness, and alignment with common dialogue patterns.
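The reward function used in the GRPO stage is not published in this README. As a rough illustration of the idea in stage 3, the sketch below scores a completion by whether its last `\boxed{}` answer matches a reference answer. The function names, the normalization rules, and the `0.1` partial credit for a well-formed but wrong answer are all assumptions made for this example, not the author's actual reward.

```python
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # braces never balanced


def normalize(answer: str) -> str:
    """Very light normalization (assumption): drop spaces and thousands commas."""
    return answer.replace(" ", "").replace(",", "").lower()


def boxed_reward(completion: str, reference: str) -> float:
    """Hypothetical reward: 1.0 for a correct boxed answer, 0.1 for a boxed
    but wrong answer (format credit), 0.0 when no boxed answer is present."""
    answer = extract_boxed(completion)
    if answer is None:
        return 0.0
    if normalize(answer) == normalize(reference):
        return 1.0
    return 0.1


if __name__ == "__main__":
    sample = "80 * 2.5 = 200 and 60 * 1.5 = 90, so the total is \\boxed{290}"
    print(boxed_reward(sample, "290"))
```

A function of this shape could be adapted to the reward-function interface of an RL trainer, but the exact signature depends on the library version in use.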
+## Training Details
+### Training Data
+The model was trained on a carefully selected set of public datasets:
+* **Medical Reasoning:** `FreedomIntelligence/medical-o1-reasoning-SFT`
+* **Mathematical Reasoning:** `unsloth/OpenMathReasoning-mini`
+* **General Conversation:** `mlabonne/guanaco-llama2-1k`
+### Training Procedure
+The model was fine-tuned using a hybrid approach that combines the efficient training capabilities of the `unsloth` library with the advanced reinforcement learning features of `trl`:
+* **Base Model:** Qwen/Qwen3-8B
+* **Quantization:** 4-bit NormalFloat (NF4) with double quantization enabled (`bnb_4bit_use_double_quant=True`), allowing for efficient training on limited GPU memory.
+* **LoRA Configuration:** A rank of `r=24`, `lora_alpha=32`, and `lora_dropout=0.05` was applied. Key attention and feed-forward projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) were targeted for adaptation.
+* **Gradient Checkpointing:** Enabled for memory efficiency, with `recompute_grad=True` for Unsloth-specific optimizations.
+* **Dynamic Hyperparameters:** Batch sizes and gradient accumulation steps were adaptively adjusted per training stage to optimize GPU memory utilization and training throughput.
+* **Learning Rate Schedule:** A cosine decay schedule with a warmup ratio was used. Learning rates were customized for each training stage.
+* **Optimizer:** `adamw_8bit`.
+* **Regularization:** Gradient norm clipping (`max_grad_norm=1.0`) and weight decay (`0.01`) were applied to prevent exploding gradients and overfitting.
+* **Early Stopping:** Applied during SFT stages with a patience of 2 on validation loss, stopping training if no significant improvement was observed.
+* **Hardware:** Training was performed on a single GPU.
+* **Software Stack:** Python, Hugging Face `transformers`, `unsloth`, `trl`, `datasets`, `wandb` (for experiment tracking), and `vllm` (used during the GRPO stage for efficient text generation).
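To give a sense of scale for the LoRA configuration above, the following back-of-the-envelope sketch counts the trainable adapter parameters for `r=24` on the seven targeted projections and computes the LoRA scaling factor `lora_alpha / r`. The layer dimensions are assumptions taken to match the published Qwen3-8B architecture (they are not stated in this README); check the base model's `config.json` before relying on the numbers.

```python
# Assumed Qwen3-8B dimensions (not stated in this README; verify in config.json).
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 12288   # intermediate_size of the feed-forward block
NUM_LAYERS = 36        # num_hidden_layers
Q_OUT = 32 * 128       # num_attention_heads * head_dim
KV_OUT = 8 * 128       # num_key_value_heads * head_dim

# Values taken from the LoRA configuration described above.
R = 24
ALPHA = 32

# (in_features, out_features) for each targeted projection in one layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A LoRA adapter adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / R  # factor applied to the low-rank update B @ A

if __name__ == "__main__":
    print(f"trainable LoRA parameters per layer: {per_layer:,}")
    print(f"trainable LoRA parameters in total:  {total:,}")
    print(f"LoRA scaling (alpha / r):            {scaling:.3f}")
```

Under these assumed dimensions the adapters add roughly 65 million trainable parameters, a small fraction of the 8B base weights, which stay frozen in 4-bit form during QLoRA training.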
+## Usage
+This model is designed for **text generation**, particularly in response to chat-based prompts or specific medical and mathematical queries. To get the best results, ensure your prompts are formatted correctly following the model's training structure.
+### Load the Model
+```python
+import torch
+from transformers import AutoTokenizer, BitsAndBytesConfig
+from unsloth import FastLanguageModel
+# Configuration parameters (matching training)
+MAX_SEQ_LENGTH = 2048
+LOAD_IN_4BIT = True
+USE_DOUBLE_QUANT = True
+# Initialize BitsAndBytesConfig as used during training
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=LOAD_IN_4BIT,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
+    bnb_4bit_use_double_quant=USE_DOUBLE_QUANT,
+)
+# Replace with the actual path to your uploaded model on Hugging Face Hub
+model_id = "your-huggingface-username/Qwen3-8B-MultiStage-Finetune-Hybrid"
+# Load tokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token # Ensure pad token is set for generation
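+# Load the model using Unsloth's optimized loading.
+# FastLanguageModel.from_pretrained returns a (model, tokenizer) pair;
+# the tokenizer loaded above is kept, so the second element is discarded.
+model, _ = FastLanguageModel.from_pretrained(
+    model_id,
+    quantization_config=bnb_config,
+    max_seq_length=MAX_SEQ_LENGTH,
+    device_map="auto", # Automatically maps model to available GPUs
+)
+FastLanguageModel.for_inference(model) # Switch Unsloth to inference mode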
+# Example for a general chat interaction
+messages = [
+    {"role": "system", "content": "You are a friendly and helpful assistant."},
+    {"role": "user", "content": "Tell me a short, funny story about a clumsy robot."},
+]
+# Apply the chat template and tokenize inputs
+input_ids = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True, # Important: Add the prompt for the assistant's turn
+    return_tensors="pt"
+).to("cuda") # Move inputs to GPU
+# Generate outputs
+outputs = model.generate(
+    input_ids,
+    max_new_tokens=512, # Maximum tokens to generate
+    do_sample=True,     # Enable sampling for more diverse outputs
+    temperature=0.7,    # Control randomness
+    top_p=0.95          # Nucleus sampling
+)
+# Decode and print the generated text, skipping special tokens
+print("--- General Chat Example ---")
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# Example for a math problem (model is trained to provide a \boxed{} answer)
+math_messages = [
+    {"role": "system", "content": "You are a math solver. Show your reasoning step by step and give the final answer in \\boxed{} format."},
+    {"role": "user", "content": "If a car travels at 80 km/h for 2.5 hours, and then at 60 km/h for another 1.5 hours, what is the total distance traveled?"},
+]
+# Apply math chat template and tokenize inputs
+math_input_ids = tokenizer.apply_chat_template(
+    math_messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to("cuda")
+# Generate outputs for the math problem
+math_outputs = model.generate(
+    math_input_ids,
+    max_new_tokens=512,
+    do_sample=True,
+    temperature=0.6, # Slightly lower temperature for more deterministic math outputs
+    top_p=0.9
+)
+print("\n--- Math Example ---")
+print(tokenizer.decode(math_outputs[0], skip_special_tokens=True))
+```
+## Limitations and Bias
+As a large language model, this fine-tuned Qwen3-8B model inherits general limitations and potential biases from its pre-training and fine-tuning data:
+* **Hallucinations:** The model may generate information that is factually incorrect or nonsensical. Always cross-reference critical information.
+* **Factual Accuracy:** Although the model is specialized in medical and mathematical domains, it should not be used as a substitute for professional medical advice, complex mathematical proofs, or any domain requiring absolute precision without independent verification.
+* **Bias:** The model's outputs are influenced by the biases present in its training data (both the base model's pre-training and the fine-tuning datasets). This may manifest in stereotypical, harmful, or unfair content.
+* **Language Proficiency:** The model was trained primarily on English text. Some Spanish content was present in the general chat dataset, but proficiency in Spanish or other languages is not guaranteed and may vary.
+* **Context Window:** Limited by its `max_seq_length` (2048 tokens). Very long inputs or extensive multi-turn conversations may lead to degraded performance or truncation of context.
+## Ethical Considerations
+Users should be aware of the following ethical considerations when deploying or using this model:
+* **Not for Critical Applications:** This model is intended for research, experimentation, and exploratory applications. It is not designed or validated for use in critical systems where accuracy, reliability, and safety are paramount (e.g., medical diagnosis, financial advice, legal counsel, or decision-making systems impacting individuals).
+* **Responsible AI Use:** Deploy and use this model responsibly, adhering to ethical AI guidelines and principles. Implement safeguards to monitor its outputs and prevent potential misuse, discrimination, or the generation of harmful content.
+* **Data Privacy and Security:** Do not use this model with sensitive personally identifiable information (PII) or confidential data. Ensure compliance with all relevant data privacy regulations.
+* **Transparency:** Be transparent with end-users when they are interacting with an AI system.
+## Citation
+If you use this model or the training methodology, please consider citing the following key components:
+```bibtex
+@misc{qwen3,
+  author = {Qwen Team},
+  title = {Qwen3-8B},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/Qwen/Qwen3-8B}}
+}
+@misc{unsloth,
+  author = {Daniel Han},
+  title = {Unsloth: Fast LLM Fine-tuning},
+  year = {2023},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/unslothai/unsloth}}
+}
+@misc{trl,
+  author = {Hugging Face Team},
+  title = {TRL: Transformer Reinforcement Learning},
+  year = {2023},
+  publisher = {GitHub},
+  howpublished = {\url{https://github.com/huggingface/trl}}
+}
+@misc{medical_dataset,
+  author = {FreedomIntelligence},
+  title = {medical-o1-reasoning-SFT},
+  year = {2024},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT}}
+}
+@misc{openmathreasoning_mini,
+  author = {unsloth},
+  title = {OpenMathReasoning-mini},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini}}
+}
+@misc{guanaco_llama2_1k,
+  author = {mlabonne},
+  title = {guanaco-llama2-1k},
+  year = {2023},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/datasets/mlabonne/guanaco-llama2-1k}}
+}
+```

imatrix.dat ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:071db6d72fa37286b3598ff120179f4848fa87cb347adfc861ccafaa061495ac
+size 5316789

qwen3-8b-hippocratesv1.fp16.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3b55021a543dd6d4d37eaa76e5a7da5cc0ca945c9138f46ecca5320cdf5d9dd5
+size 16388045056

qwen3-8b-hippocratesv1.q2_k-imat-00001-of-00002.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5efd453b61a2e7ae17064405c3649b26ad9950f0332f6f69cf7c9fb6ee86c1e9
+size 2356901152

qwen3-8b-hippocratesv1.q2_k-imat-00002-of-00002.gguf ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:248147f1df17c58809abc859660f75566b2ef81245a092f7039db2fd7af723d3
+size 924833184
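
The four files above are stored as Git LFS pointers: each records a spec version, a `sha256` object ID, and a byte size. As a minimal sketch (not part of this commit), the pointer fields can be parsed and used to check a downloaded file. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the sample pointer is copied from the `imatrix.dat` entry.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the imatrix.dat entry in this commit.
IMATRIX_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:071db6d72fa37286b3598ff120179f4848fa87cb347adfc861ccafaa061495ac
size 5316789
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at path has the size and hash the pointer records."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Hash in chunks so multi-gigabyte GGUF files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(IMATRIX_POINTER)
    print(pointer["algo"], pointer["size"])
    # Example (assumes imatrix.dat was downloaded to the working directory):
    # print(matches_pointer("imatrix.dat", pointer))
```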