Update README.md

metrics:
- loss
---

# Better SQL Agent - Llama 3.1 8B

## Training Results
- **Training Samples**: 19,480 (SQL analytics + technical conversations)
- **Hardware**: 4x NVIDIA A10G GPUs (96 GB VRAM)

## Model Description
This is a high-performance fine-tuned version of **Meta-Llama-3.1-8B-Instruct**, specifically optimized for:
- **SQL query generation and optimization**
- **Data analysis and insights**
- **Technical assistance and debugging**
- **Tool-based workflows**

## Training Configuration
- **Base Model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Training Method**: LoRA (Low-Rank Adaptation)
  - Rank: 16, Alpha: 32, Dropout: 0.05
- **Context Length**: 128K tokens (extended from base)
- **Optimizer**: AdamW with cosine scheduling
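
For orientation, the adapter settings above translate roughly into the following `peft` configuration. This is a minimal sketch, not the exact training script; in particular, `target_modules` is an assumed (typical) choice for Llama-style models rather than something stated on this card.

```python
from peft import LoraConfig

# Rough reconstruction of the LoRA settings listed above.
# target_modules is an assumption (a common choice for Llama-style models),
# not a value taken from this model card.
lora_config = LoraConfig(
    r=16,             # LoRA rank
    lora_alpha=32,    # scaling factor
    lora_dropout=0.05,  # dropout applied to the adapter layers
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```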

## Quick Start
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ... (model loading, prompt construction, and generation)

print(response)
```
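
The original snippet is only partially visible above, so here is a self-contained sketch of the same load-and-generate flow. The repository ID is a placeholder, and the prompt, sampling settings, and chat-template usage are illustrative assumptions rather than the card's exact code.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "your-username/better-sql-agent-llama-3.1-8b"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)

# Phrase the request as a chat message and let the Llama 3.1 chat template format it.
messages = [
    {"role": "user",
     "content": "Write a SQL query that returns the top 5 customers by total order value."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)

# Decode only the newly generated tokens, as in the snippet above.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```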

## Performance Metrics
| Metric | Value |
|--------|-------|
| **Starting Loss** | 1.53 |
| **Loss Reduction** | **96.7%** |
| **Training Time** | 8.9 hours |
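
As a sanity check, assuming the reduction is measured against the starting loss, a 96.7% reduction from 1.53 corresponds to a final training loss of roughly 1.53 × (1 − 0.967) ≈ 0.05.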

## Use Cases
- **SQL Generation**: Create complex queries from natural language
- **Data Analysis**: Generate insights and analytical queries
- **Code Assistance**: Debug and optimize SQL code
- **Technical Support**: Answer database and analytics questions
- **Learning Aid**: Explain SQL concepts and best practices

## Training Data
The model was trained on a curated dataset of **19,480 high-quality examples**, including:
- SQL query generation tasks
- Data analysis conversations
- Technical problem-solving dialogues
- Tool usage patterns and workflows

## Optimization Features
- **4-bit Quantization**: Reduced memory footprint
- **Flash Attention**: Optimized attention mechanism
- **Mixed Precision**: BF16 training for efficiency
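
These features describe the training setup; for inference, a comparable memory-saving configuration can be approximated with `bitsandbytes` 4-bit loading, as in the sketch below. The repository ID is again a placeholder, the NF4 quantization type is an assumed (common) setting, and Flash Attention 2 requires the `flash-attn` package to be installed.

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit quantization with BF16 compute, mirroring the features listed above.
# The NF4 quant type is an assumed default, not stated on this card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-username/better-sql-agent-llama-3.1-8b",  # placeholder repo ID
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # requires the flash-attn package
    device_map="auto",
)
```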

## License
This model inherits the **Llama 3.1 license** from the base model. Please review the [official license](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) for usage terms.

## Acknowledgments
- Based on Meta's Llama 3.1 8B Instruct model

## Model Card Contact
For questions about this model, please open an issue in the repository or contact the model author.

---

**Achieved 96.7% loss reduction - A testament to high-quality training data and optimization!**