# Mermaid-Llama-3-7B-Pruned / mergekit_config.yml
# Found 4 bunk layers; removing 3 of them without any training yields a perfectly sendable model with lower requirements and more speed.
# slices:
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [0, 28]
#   # [0, 28] is good because layer 28 does nothing when removed
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [29, 32]
# slices:
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [0, 27]
#   # [0, 27] is good because layer 27 does nothing when removed
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [28, 32]
# slices:
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [0, 26]
#   # [0, 26] is good because layer 26 does nothing when removed
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [27, 32]
# slices:
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [0, 25]
#   # [0, 25] is good because layer 25 does nothing when removed
#   - sources:
#       - model: TroyDoesAI/Mermaid-Llama-3-8B
#         layer_range: [26, 32]
slices:
  - sources:
      - model: TroyDoesAI/Mermaid-Llama-3-8B
        layer_range: [0, 25]
  - sources:
      - model: TroyDoesAI/Mermaid-Llama-3-8B
        layer_range: [26, 27]
  - sources:
      - model: TroyDoesAI/Mermaid-Llama-3-8B
        layer_range: [29, 32]
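# Reading aid (a sketch, not part of the original config): assuming mergekit
# treats layer_range as half-open, [start, end), the three active slices
# above resolve to:
#   [0, 25]  -> layers 0..24   (25 layers)
#   [26, 27] -> layer 26       (1 layer)
#   [29, 32] -> layers 29..31  (3 layers)
# Under that assumption, 29 of the base model's 32 layers are kept and
# layers 25, 27, and 28 are dropped.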
# Layers 31/32 are syntax layers: removing them produces an extra > on edges around the middle of the output
# Layer 29 is a syntax layer: removing it causes semicolon/colon mix-ups around the middle of the input
# TODO: Layer 28 Does NOTHING
# Layer 27 Does NOTHING
# Layer 26 Does NOTHING
# Layer 25 Does NOTHING
merge_method: passthrough
dtype: float16