---

base_model: meta-llama/Llama-3.2-1B
library_name: peft
tags:
- code
- llm
- Evolution_Learning_Network
- qlora
- llama
---


# Evolution Learning Network (ELN) with QLoRA and Genetic Algorithms for LLMs

## Overview

This project implements an **Evolution Learning Network (ELN)** to fine-tune transformer-based models like LLaMA using a combination of **Quantized Low-Rank Adaptation (QLoRA)** and **Genetic Algorithms (GA)**. The primary objective is to evolve a population of models across multiple generations to optimize for performance (fitness) and specialization, while maintaining diversity.

### Key Features
- Efficient model fine-tuning using **QLoRA**.
- Evolutionary strategies, including **random mutations** and fitness-based selection.
- Hardware-efficient training with **4-bit quantization**.
- Comprehensive experiment tracking with **WandB**.
- Diversity maintenance through **LoRA weight fingerprinting**.
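
The evolutionary strategy above reduces to a loop over generations: fine-tune each candidate with QLoRA, score its fitness, keep the fittest, and refill the population with mutated copies. Below is a minimal sketch of that loop, with training, fitness evaluation, and mutation passed in as callables; all names here are illustrative, not the project's actual API.

```python
import copy
import random

def evolve(population, train_fn, fitness_fn, mutate_fn,
           generations=10, elite_fraction=0.5):
    """Hedged sketch of an ELN-style generational loop.

    train_fn fine-tunes a model and returns it; fitness_fn returns a score
    (higher is better); mutate_fn perturbs a model and returns it.
    """
    for _ in range(generations):
        # Fine-tune and score every candidate.
        scored = sorted(
            ((fitness_fn(train_fn(m)), m) for m in population),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Keep the fittest fraction of the population.
        n_keep = max(1, int(len(population) * elite_fraction))
        survivors = [m for _, m in scored[:n_keep]]
        # Refill with mutated copies of randomly chosen survivors.
        children = [
            mutate_fn(copy.deepcopy(random.choice(survivors)))
            for _ in range(len(population) - n_keep)
        ]
        population = survivors + children
    return population
```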

---

## Model Details

### Base Model
- **Name**: [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) (can be swapped for other Hugging Face causal language models).
- **Architecture**: Transformer-based causal language model.

### Quantization Configuration
- **Quantization Type**: 4-bit using `bitsandbytes` (`bnb_4bit`).
- **Parameters** (see the configuration sketch below):
  - Compute Type: `torch.float16`
  - Quantization Type: `"nf4"` (4-bit NormalFloat).
  - Double (Nested) Quantization: Enabled.
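
A minimal sketch of how these settings map onto a `BitsAndBytesConfig` from `transformers` (the exact configuration used in the project may differ slightly):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights via bitsandbytes
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.float16,   # compute in fp16
    bnb_4bit_use_double_quant=True,         # double (nested) quantization
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    quantization_config=bnb_config,
    device_map="auto",
)
```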

### LoRA (Low-Rank Adaptation)
- **Dimensions (r)**: 8
- **Alpha (Scaling)**: 16
- **Target Modules**: Query and Value projections (`q_proj`, `v_proj`).
- **Dropout**: 0.05
- **Task Type**: Causal Language Modeling (`CAUSAL_LM`).
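
These values correspond to a `LoraConfig` roughly like the one below; the `bias` setting and the `prepare_model_for_kbit_training` call are assumptions, as the card does not state them.

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 'model' is the 4-bit quantized base model from the previous sketch.
model = prepare_model_for_kbit_training(model)  # common QLoRA prep step (assumed)

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",            # assumption; not stated in the card
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
```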

### Training Strategy
- **Optimizer**: `paged_adamw_8bit` for memory-efficient updates.
- **Precision**: Mixed precision (`fp16`) for faster training.

---

## Hyperparameters

### General Parameters
- **Generations**: 10
- **Population Size**: 4
- **Dataset Size**: 2000 samples per split (adjustable for larger datasets).

### Training
- **Batch Size**: 8
- **Gradient Accumulation**: 16 steps.
- **Learning Rate**: `2e-4`
- **Epochs per Model**: 2
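
Together with the optimizer and precision settings above, these hyperparameters map onto `TrainingArguments` roughly as follows; `output_dir`, the logging cadence, and WandB reporting are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="eln-checkpoints",       # hypothetical path
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    num_train_epochs=2,
    optim="paged_adamw_8bit",           # memory-efficient paged AdamW
    fp16=True,                          # mixed-precision training
    logging_steps=10,                   # assumed logging cadence
    report_to="wandb",                  # experiment tracking
)
```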

### Mutations
- **Mutation Rate**: 10% (probability per parameter).
- **Mutation Scale**: Noise added with a standard deviation of 0.02.
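
A hedged sketch of such a mutation operator, assuming the noise is Gaussian and is applied only to the LoRA matrices:

```python
import torch

def mutate_lora_weights(peft_model, rate=0.10, scale=0.02):
    """Add Gaussian noise to roughly `rate` of the LoRA parameters (sketch)."""
    with torch.no_grad():
        for name, param in peft_model.named_parameters():
            if "lora_" in name:
                mask = (torch.rand_like(param) < rate).float()  # ~10% of entries
                noise = torch.randn_like(param) * scale          # std-dev 0.02
                param.add_(mask * noise)
    return peft_model
```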

---

## Dataset Details

### Source
- **Name**: WikiText ([wikitext-2-raw-v1](https://huggingface.co/datasets/Salesforce/wikitext/viewer/wikitext-2-raw-v1)); larger WikiText variants can be substituted for bigger runs.
- **Splits**:
  - `train` → Model training.
  - `validation` → General task evaluation.
  - `test` → Specific task evaluation.

### Tokenization
- **Tokenizer**: Hugging Face `AutoTokenizer`.
- **Max Token Length**: 128 tokens.
- **Padding**: Fixed to `"max_length"`.
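
A minimal sketch of the data pipeline implied above, assuming the standard `datasets` and `AutoTokenizer` APIs plus the 2,000-sample subsetting mentioned earlier:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("Salesforce/wikitext", "wikitext-2-raw-v1")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
tokenizer.pad_token = tokenizer.eos_token  # LLaMA tokenizers define no pad token

def tokenize(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=128,
        padding="max_length",
    )

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
train_subset = tokenized["train"].select(range(2000))  # 2,000 samples per split
```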

---

## Results

### Summary
- **Total Generations**: 10
- **Best Fitness Achieved**: 0.4772
- **Final Population Diversity**: 0.0011

### Evolution History (Highlights)
| Generation | Best Fitness | Avg Fitness | Diversity | Best Specialization |
|------------|--------------|-------------|-----------|---------------------|
| 1          | 0.4096       | 0.4023      | 0.00097   | 0.9967              |
| 5          | 0.4727       | 0.4722      | 0.00099   | 0.9968              |
| 10         | 0.4772       | 0.4768      | 0.00106   | 0.9972              |
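
The diversity column is derived from the LoRA weight fingerprinting listed under Key Features. One plausible way to compute such a metric, assuming a fingerprint is simply the flattened LoRA weights and diversity is a length-normalised mean pairwise distance (the project's exact definition may differ):

```python
import itertools
import torch

def lora_fingerprint(peft_model):
    # Concatenate all LoRA parameters into one flat vector.
    chunks = [p.detach().flatten().cpu()
              for name, p in peft_model.named_parameters() if "lora_" in name]
    return torch.cat(chunks)

def population_diversity(population):
    # Mean pairwise L2 distance between fingerprints, normalised by length.
    prints = [lora_fingerprint(m) for m in population]
    dists = [torch.dist(a, b).item() / a.numel()
             for a, b in itertools.combinations(prints, 2)]
    return sum(dists) / len(dists)
```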

---

## Hardware & Framework

### Hardware
- Multi-GPU support via `torch.nn.parallel.DistributedDataParallel` or the `Accelerator` class from Hugging Face `accelerate`.
- Logs GPU/CPU usage with `psutil` and `torch.cuda`.
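
A small sketch of the kind of resource logging described above, using only standard `psutil` and `torch.cuda` calls (the logged fields are illustrative):

```python
import psutil
import torch

def log_system_usage():
    stats = {
        "cpu_percent": psutil.cpu_percent(),
        "ram_percent": psutil.virtual_memory().percent,
    }
    if torch.cuda.is_available():
        stats["gpu_mem_gb"] = torch.cuda.memory_allocated() / 1024**3
        stats["gpu_mem_peak_gb"] = torch.cuda.max_memory_allocated() / 1024**3
    return stats
```

The returned dictionary can be passed directly to `wandb.log` for tracking.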

### Frameworks & Libraries
- **Transformers**: Hugging Face model and tokenizer handling.
- **Datasets**: Data loading and processing.
- **WandB**: Experiment tracking and visualization.
- **BitsAndBytes**: 4-bit quantization.
- **PEFT**: LoRA-based fine-tuning.

---

## Future Work
- Explore larger population sizes and more generations for enhanced diversity.
- Experiment with other datasets to generalize findings.
- Integrate additional mutation strategies for broader exploration.

---

## Citation
Citation details to be added.

---
> Code to run the adapter locally

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the ELN LoRA adapter from the Hub.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base_model, "diabolic6045/ELN-llama-1B-adapter")
```
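
For a quick smoke test, the loaded adapter can be queried with the base tokenizer and `generate`; the prompt below is purely illustrative.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
inputs = tokenizer("The theory of evolution states that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```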
### Framework versions

- PEFT 0.14.0