Update README.md

metrics:
- loss
---

# Better SQL Agent - Llama 3.1 8B

## Training Results

- **Training Samples**: 19,480 (SQL analytics + technical conversations)
- **Hardware**: 4x NVIDIA A10G GPUs (96 GB VRAM total)

## Model Description

This is a high-performance fine-tuned version of **Meta-Llama-3.1-8B-Instruct**, specifically optimized for:

- **SQL query generation and optimization**
- **Data analysis and insights**
- **Technical assistance and debugging**
- **Tool-based workflows**

## Training Configuration

- **Base Model**: `meta-llama/Llama-3.1-8B-Instruct`
- **Training Method**: LoRA (Low-Rank Adaptation); see the configuration sketch below
  - Rank: 16, Alpha: 32, Dropout: 0.05
- **Context Length**: 128K tokens (extended from base)
- **Optimizer**: AdamW with cosine scheduling
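
These LoRA hyperparameters translate directly into a `peft` configuration. A minimal sketch, assuming the `peft` library and a typical Llama-style target-module list (the target modules are an assumption; this card does not state them):

```python
from peft import LoraConfig

# Values from the list above; target_modules is an assumed, common
# choice for Llama-style models and is not specified in this card.
lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```

With rank 16 and alpha 32, the effective adapter scaling is alpha/rank = 2, a common setting that keeps the low-rank updates small relative to the frozen base weights.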

## Quick Start

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# ... model loading, prompt tokenization, generation, and decoding
# are abridged in this excerpt; see the full sketch below ...

print(response)
```
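
The snippet above is abridged, so here is one way the elided middle might look end to end. This is a minimal sketch, not the card's exact code: the repo id is a placeholder, and the prompt, chat-template usage, and generation parameters are illustrative assumptions.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "your-org/better-sql-agent-llama-3.1-8b"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision above
    device_map="auto",
)

# Format the request with the tokenizer's built-in Llama 3.1 chat template.
messages = [
    {"role": "user",
     "content": "Write a SQL query that returns the top 5 customers by total order value."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256, temperature=0.2, do_sample=True)

# Decode only the newly generated tokens, as the original snippet does.
response = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)

# Illustrative output shape (not a guaranteed generation):
#   SELECT c.name, SUM(o.total) AS total_value
#   FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
#   GROUP BY c.name ORDER BY total_value DESC LIMIT 5;
```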

## Performance Metrics

| Metric | Value |
|--------|-------|
| **Starting Loss** | 1.53 |
| **Loss Reduction** | **96.7%** |
| **Training Time** | 8.9 hours |
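
For context, a 96.7% reduction from a starting loss of 1.53 implies a final training loss of roughly 1.53 × (1 − 0.967) ≈ 0.05.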

## Use Cases

- **SQL Generation**: Create complex queries from natural language
- **Data Analysis**: Generate insights and analytical queries
- **Code Assistance**: Debug and optimize SQL code
- **Technical Support**: Answer database and analytics questions
- **Learning Aid**: Explain SQL concepts and best practices

## Training Data

The model was trained on a curated dataset of **19,480 high-quality examples** including:

- SQL query generation tasks
- Data analysis conversations
- Technical problem-solving dialogues
- Tool usage patterns and workflows

## Optimization Features

- **4-bit Quantization**: Reduced memory footprint (see the loading sketch below)
- **Flash Attention**: Optimized attention mechanism
- **Mixed Precision**: BF16 training for efficiency
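
The 4-bit path can be reproduced at load time with `bitsandbytes` through `transformers`. A minimal sketch, using common NF4 defaults rather than settings taken from this card (the repo id is again a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# NF4 4-bit weights with BF16 compute: common defaults, assumed here.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "your-org/better-sql-agent-llama-3.1-8b",  # placeholder repo id
    quantization_config=bnb_config,
    device_map="auto",
)
```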

## License

This model inherits the **Llama 3.1 license** from the base model. Please review the [official license](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE) for usage terms.

## Acknowledgments

- Based on Meta's Llama 3.1 8B Instruct model

## Model Card Contact

For questions about this model, please open an issue in the repository or contact the model author.

---

**Achieved a 96.7% loss reduction, a testament to high-quality training data and optimization!**