What ablation library / tool did you use?

by Nafnlaus

I was trying to do this with FailSpy's, but got stuck on issues with TransformerLens.

Nani DAO org

We used TransformerLens as well! It supports the Qwen architecture.
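
For anyone following along, loading one of these checkpoints in TransformerLens looks roughly like the sketch below (not the exact script used here; it assumes your TransformerLens version lists the checkpoint among its supported models). The processing flags matter for what comes next: with the defaults, from_pretrained folds LayerNorm and centers weights, so the in-memory tensors no longer match the original HF checkpoint.

from transformer_lens import HookedTransformer

# Disable TransformerLens's weight-processing defaults if you plan to
# export the edited weights afterwards; otherwise the stored tensors are
# already rewritten relative to the HF checkpoint.
model = HookedTransformer.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # assumed checkpoint name
    fold_ln=False,
    center_writing_weights=False,
    center_unembed=False,
)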

Huh! How did you get them to work? I couldn't do anything with the models it wrote out, because all the tensor names were wrong. E.g., trying to convert to GGUF:

Using local model: DeepPooh-R1-Distill-Qwen-1.5B-TL/
Converting to base GGUF (f16)...
INFO:hf-to-gguf:Loading model: DeepPooh-R1-Distill-Qwen-1.5B-TL
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model part 'model.safetensors'
Traceback (most recent call last):
  File "/home/user/llama.cpp//convert_hf_to_gguf.py", line 5140, in <module>
    main()
  File "/home/user/llama.cpp//convert_hf_to_gguf.py", line 5134, in main
    model_instance.write()
  File "/home/user/llama.cpp//convert_hf_to_gguf.py", line 439, in write
    self.prepare_tensors()
  File "/home/user/llama.cpp//convert_hf_to_gguf.py", line 298, in prepare_tensors
    for new_name, data_torch in (self.modify_tensors(data_torch, name, bid)):
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/llama.cpp//convert_hf_to_gguf.py", line 266, in modify_tensors
    return [(self.map_tensor_name(name), data_torch)]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/llama.cpp//convert_hf_to_gguf.py", line 214, in map_tensor_name
    raise ValueError(f"Can not map tensor {name!r}")
ValueError: Can not map tensor 'blocks.0.attn.W_O'
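
The root cause is that TransformerLens saves its state dict under its own naming and layout scheme, while convert_hf_to_gguf.py only knows the Hugging Face names. As one concrete example (a sketch based on TransformerLens's documented conventions, not a full converter): TL stores the attention output projection as blocks.{l}.attn.W_O with shape [n_heads, d_head, d_model], whereas HF Qwen2 expects model.layers.{l}.self_attn.o_proj.weight with shape [d_model, n_heads * d_head].

import einops
import torch

# Sketch of a single entry of a TL -> HF remap (layout per TransformerLens's
# documented conventions). A full remap must cover every tensor, and is only
# meaningful if the model was loaded without fold_ln / centering.
def tl_wo_to_hf(w_o: torch.Tensor) -> torch.Tensor:
    # blocks.{l}.attn.W_O: [n_heads, d_head, d_model]
    # -> model.layers.{l}.self_attn.o_proj.weight: [d_model, n_heads * d_head]
    return einops.rearrange(w_o, "n h m -> m (n h)")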

I started manually remapping them one by one, and it became increasingly clear that none of it was laid out as expected.
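
In case it's useful to others who hit this: one way to sidestep the round-trip entirely is to compute the refusal direction however you like (e.g. with TransformerLens hooks) and then apply the orthogonalization directly to the original HF checkpoint, so the saved tensors keep HF names and convert cleanly to GGUF. A minimal sketch, assuming a precomputed unit-norm refusal direction saved as refusal_dir.pt (that filename and the output path are placeholders, and this is not necessarily the repo authors' method):

import torch
from transformers import AutoModelForCausalLM

# Apply the ablation to the original HF weights instead of exporting the
# TransformerLens state dict, so tensor names stay GGUF-convertible.
model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",  # assumed base checkpoint
    torch_dtype=torch.float32,
)
d = torch.load("refusal_dir.pt")  # assumed artifact: [hidden_size] vector
d = d / d.norm()

def project_out(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """W writes into the residual stream as y = W x (rows indexed by d_model).
    Remove the refusal component from every output: W <- (I - d d^T) W."""
    d = d.to(W.dtype)
    return W - torch.outer(d, d) @ W

with torch.no_grad():
    # Embedding rows are residual-stream vectors; project them row-wise.
    E = model.model.embed_tokens.weight  # [vocab, hidden_size]
    E.copy_(E - (E @ d).unsqueeze(-1) * d)
    # Edit every matrix that writes back into the residual stream.
    for layer in model.model.layers:
        layer.self_attn.o_proj.weight.copy_(
            project_out(layer.self_attn.o_proj.weight, d)
        )
        layer.mlp.down_proj.weight.copy_(
            project_out(layer.mlp.down_proj.weight, d)
        )

model.save_pretrained("abliterated-model")  # then run convert_hf_to_gguf.py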

Could I bother you to share any scripts / processes you used? :)
