---
base_model: meta-llama/Llama-3.2-1B
library_name: peft
tags:
- code
- llm
- Evolution_Learning_Network
- qlora
- llama
---
# Evolution Learning Network (ELN) with QLoRA and Genetic Algorithms for LLMs
## Overview
This project implements an **Evolution Learning Network (ELN)** to fine-tune transformer-based models like LLaMA using a combination of **Quantized Low-Rank Adaptation (QLoRA)** and **Genetic Algorithms (GA)**. The primary objective is to evolve a population of models across multiple generations to optimize for performance (fitness) and specialization, while maintaining diversity.
### Key Features
- Efficient model fine-tuning using **QLoRA**.
- Evolutionary strategies, including **random mutations** and fitness-based selection.
- Hardware-efficient training with **4-bit quantization**.
- Comprehensive experiment tracking with **WandB**.
- Diversity maintenance through **LoRA weight fingerprinting**.
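
The evolutionary loop itself can be summarized as a short sketch. The helper callables (`fine_tune`, `evaluate_fitness`, `select_parents`, `mutate`) are illustrative placeholders supplied by the caller, not part of the project's actual API:

```python
def run_evolution(population, fine_tune, evaluate_fitness, select_parents, mutate,
                  generations=10):
    """High-level ELN loop sketch; the injected callables are hypothetical placeholders."""
    history = []
    for generation in range(generations):
        # 1. Fine-tune every candidate with QLoRA.
        for candidate in population:
            fine_tune(candidate)

        # 2. Score candidates on the evaluation splits (tracked with WandB in the project).
        scores = [evaluate_fitness(candidate) for candidate in population]
        history.append({"generation": generation + 1, "best_fitness": max(scores)})

        # 3. Keep the fittest candidates and refill the population with mutated copies
        #    (random Gaussian noise on the LoRA weights).
        parents = select_parents(population, scores)
        population = [mutate(parent) for parent in parents]

    return population, history
```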
---
## Model Details
### Base Model
- **Name**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) (can be replaced with any Hugging Face model).
- **Architecture**: Transformer-based causal language model.
### Quantization Configuration
- **Quantization Type**: 4-bit using `bitsandbytes` (`bnb_4bit`).
- **Parameters**:
- Compute Type: `torch.float16`
- Quantization Type: `"nf4"` (4-bit NormalFloat).
- Double Quantization (nested): Enabled.
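
These settings map onto a standard `bitsandbytes` setup; the sketch below assumes the usual `BitsAndBytesConfig` parameter names from `transformers`:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization with float16 compute and double (nested) quantization,
# matching the parameters listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```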
### LoRA (Low-Rank Adaptation)
- **Dimensions (r)**: 8
- **Alpha (Scaling)**: 16
- **Target Modules**: Query and Value projections (`q_proj`, `v_proj`).
- **Dropout**: 0.05
- **Task Type**: Causal Language Modeling (`CAUSAL_LM`).
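
Expressed as a PEFT configuration, the values above correspond roughly to the following sketch:

```python
from peft import LoraConfig

# LoRA adapter configuration matching the values listed above.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```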
### Training Strategy
- **Optimizer**: `paged_adamw_8bit` for memory-efficient updates.
- **Precision**: Mixed precision (`fp16`) for faster training.
---
## Hyperparameters
### General Parameters
- **Generations**: 10
- **Population Size**: 4
- **Dataset Size**: 2000 samples per split (adjustable for larger datasets).
### Training
- **Batch Size**: 8
- **Gradient Accumulation**: 16 steps.
- **Learning Rate**: `2e-4`
- **Epochs per Model**: 2
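
Together with the optimizer and precision settings from the Training Strategy section, these hyperparameters map onto a `TrainingArguments` sketch like the one below; `output_dir` is a placeholder, and the WandB reporting target follows the experiment tracking mentioned above:

```python
from transformers import TrainingArguments

# Per-candidate training setup; output_dir is a placeholder path.
training_args = TrainingArguments(
    output_dir="./eln-candidate",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    num_train_epochs=2,
    optim="paged_adamw_8bit",
    fp16=True,
    report_to="wandb",
)
```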
### Mutations
- **Mutation Rate**: 10% (probability per parameter).
- **Mutation Scale**: Noise added with a standard deviation of 0.02.
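
A minimal sketch of this mutation step, assuming the noise is applied element-wise to the LoRA adapter tensors; the function name and parameter traversal are illustrative, not the project's exact implementation:

```python
import torch

def mutate_lora_weights(model, mutation_rate=0.10, mutation_scale=0.02):
    """Perturb LoRA parameters with Gaussian noise (illustrative sketch)."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if "lora_" in name:
                # Each weight is mutated with 10% probability, using noise with std 0.02.
                mask = (torch.rand_like(param) < mutation_rate).to(param.dtype)
                noise = torch.randn_like(param) * mutation_scale
                param.add_(mask * noise)
    return model
```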
---
## Dataset Details
### Source
- **Name**: WikiText ([wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-2-raw-v1)); larger WikiText configurations can be swapped in for bigger experiments.
- **Splits**:
- `train` → Model training.
- `validation` → General task evaluation.
- `test` → Specific task evaluation.
### Tokenization
- **Tokenizer**: Hugging Face `AutoTokenizer`.
- **Max Token Length**: 128 tokens.
- **Padding**: Fixed to `"max_length"`.
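
A sketch of the data preparation implied by these settings; the 2,000-sample cap mirrors the dataset-size hyperparameter above, and the `text` column name follows the standard WikiText schema:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load WikiText-2 (raw) and cap each split at 2,000 samples.
dataset = load_dataset("Salesforce/wikitext", "wikitext-2-raw-v1")
dataset = {split: ds.select(range(min(2000, len(ds)))) for split, ds in dataset.items()}

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # Llama has no pad token by default

def tokenize(batch):
    # Fixed-length padding/truncation to 128 tokens, as configured above.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = {split: ds.map(tokenize, batched=True) for split, ds in dataset.items()}
```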
---
## Results
### Summary
- **Total Generations**: 10
- **Best Fitness Achieved**: 0.4772
- **Final Population Diversity**: 0.0011
### Evolution History (Highlights)
| Generation | Best Fitness | Avg Fitness | Diversity | Best Specialization |
|------------|--------------|-------------|-----------|---------------------|
| 1 | 0.4096 | 0.4023 | 0.00097 | 0.9967 |
| 5 | 0.4727 | 0.4722 | 0.00099 | 0.9968 |
| 10 | 0.4772 | 0.4768 | 0.00106 | 0.9972 |
---
## Hardware & Framework
### Hardware
- Multi-GPU support with `torch.nn.parallel.DistributedDataParallel` or `Accelerator`.
- Logs GPU/CPU usage with `psutil` and `torch.cuda`.
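
A small sketch of this kind of resource logging with `psutil` and `torch.cuda`; the metric names are illustrative:

```python
import psutil
import torch

def log_resource_usage():
    # CPU/RAM utilisation via psutil, GPU memory via torch.cuda.
    stats = {
        "cpu_percent": psutil.cpu_percent(),
        "ram_percent": psutil.virtual_memory().percent,
    }
    if torch.cuda.is_available():
        stats["gpu_mem_allocated_gb"] = torch.cuda.memory_allocated() / 1e9
        stats["gpu_mem_reserved_gb"] = torch.cuda.memory_reserved() / 1e9
    return stats
```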
### Frameworks & Libraries
- **Transformers**: Hugging Face model and tokenizer handling.
- **Datasets**: Data loading and processing.
- **WandB**: Experiment tracking and visualization.
- **BitsAndBytes**: 4-bit quantization.
- **PEFT**: LoRA-based fine-tuning.
---
## Future Work
- Explore larger population sizes and more generations for enhanced diversity.
- Experiment with other datasets to generalize findings.
- Integrate additional mutation strategies for broader exploration.
---
## Citation
Citation details to be added.
---
> Code to run locally
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and attach the evolved ELN LoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base_model, "diabolic6045/ELN-llama-1B-adapter")

# The base model's tokenizer is used unchanged.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
```
### Framework versions
- PEFT 0.14.0