"ffn_mult": null,
I saw some "ffn_mult": null entries in the config. Is that intended, or does it mean 1.0?
+1 on this, llama.cpp needs to know the ffn_mult for those items.
There isn't one, though. If I understand it correctly, the feed-forward layer is disabled (see "no_op": true), so these blocks are effectively skipped? (There's a small check sketched below the snippet.)
{
  "attention": {
    "n_heads_in_group": null,
    "no_op": true,
    "num_sink_tokens": null,
    "replace_with_linear": false,
    "sparsify": null,
    "unshifted_sink": false,
    "use_prefill_window_in_sink_attention": false,
    "window_length": null
  },
  "ffn": {
    "ffn_mult": null,
    "no_op": true,
    "replace_with_linear": false,
    "sparsify": null
  }
}
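To sanity-check that theory against the config itself, here is a minimal sketch in Python. It assumes the per-block settings live under a block_configs list shaped like the snippet above (the key name is an assumption; adjust it to whatever the actual config.json uses):

import json

# Minimal sketch: list the transformer blocks whose FFN is disabled.
# Assumes a "block_configs" list shaped like the snippet above; adjust
# the key name if the actual config.json differs.
with open("config.json") as f:
    cfg = json.load(f)

for i, block in enumerate(cfg.get("block_configs", [])):
    ffn = block.get("ffn", {})
    attn = block.get("attention", {})
    if ffn.get("no_op"):
        # No FFN weights (and no ffn_norm) should exist for these blocks,
        # so "ffn_mult": null is consistent rather than a missing value.
        print(f"block {i}: ffn skipped (attention no_op={attn.get('no_op')})")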
You may be right, maybe we need an update in llama.cpp then.
I tried defaulting it to "ffn_mult": 1.0 last night. llama.cpp did produce the files, but the model cannot be loaded:
"missing tensor: 'blk.9.ffn_norm.weight'", so the no_op theory makes sense!?
Damn, I'm doing the same (files are being produced)
Haven't had a chance to load it yet but no doubt I'll encounter the same thing :(
There are some attention blocks where "no_op": true but "ffn_mult": 1.95 is set (a quick tally of the combinations is sketched after the snippet):
{
  "attention": {
    "n_heads_in_group": null,
    "no_op": true,
    "num_sink_tokens": null,
    "replace_with_linear": false,
    "sparsify": null,
    "unshifted_sink": false,
    "use_prefill_window_in_sink_attention": false,
    "window_length": null
  },
  "ffn": {
    "ffn_mult": 1.95,
    "no_op": false,
    "replace_with_linear": false,
    "sparsify": null
  }
}
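If it helps, a quick tally of how the two flags combine across blocks, under the same (assumed) block_configs layout as the earlier sketch:

import json
from collections import Counter

# Count how attention.no_op / ffn.no_op combine across all blocks.
with open("config.json") as f:
    cfg = json.load(f)

combos = Counter(
    (block["attention"].get("no_op", False), block["ffn"].get("no_op", False))
    for block in cfg.get("block_configs", [])
)
for (attn_skip, ffn_skip), n in sorted(combos.items()):
    print(f"attention no_op={attn_skip}, ffn no_op={ffn_skip}: {n} blocks")

So attention-only no_op blocks still have a real FFN (hence the 1.95), while the FFN no_op blocks are the ones with ffn_mult set to null.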
@gghfez you should try it too if you already have the files.
The "ffn_mult": null
setting can be used with either no_op: true
or replace_with_linear: true
. In this model, it is always used alongside no_op: true
, which means the feed-forward (ffn/mlp) component of the transformer block is completely skipped, including the preceding layer normalization. This applies regardless of the attention mechanism used earlier in the block.
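To make that concrete, here is a rough sketch of what a converter would do with ffn_mult. In DeciLM-style configs the multiplier is typically turned into an FFN intermediate size; the exact scaling and rounding below are an assumption and may differ from the actual model code, so treat the numbers as illustrative:

def ffn_mult_to_intermediate_size(ffn_mult, hidden_size, multiple_of=256):
    # Assumed DeciLM-style derivation of the FFN width from ffn_mult;
    # the constants and rounding here are illustrative, not confirmed.
    size = int(2 * ffn_mult * hidden_size / 3)
    return ((size + multiple_of - 1) // multiple_of) * multiple_of

def block_ffn_size(block, hidden_size):
    ffn = block["ffn"]
    if ffn.get("no_op"):
        # ffn_mult is null because the block has no FFN at all, so a
        # converter should emit no ffn_* / ffn_norm tensors for it rather
        # than substituting a default like 1.0.
        return None
    return ffn_mult_to_intermediate_size(ffn["ffn_mult"], hidden_size)

That would also explain the "missing tensor: 'blk.9.ffn_norm.weight'" error above: once a default ffn_mult is forced in, the loader expects FFN tensors for a block that simply does not have them in the checkpoint.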
I want to understand this more, so I'll study it again.