"ffn_mult": null,

#1 · opened by csabakecskemeti

I saw some "ffn_mult": null entries in the config. Is that intended, or does it mean 1.0?

+1 on this, llama.cpp needs to know the ffn_mult for those items.

@Chris-Alexiuk @jiaqiz

> llama.cpp needs to know the ffn_mult for those items.

There isn't one though. If I understand it correctly, the feed-forward is disabled (see "no_op": true), so these blocks are effectively skipped? See the sketch after the snippet below.

    {
      "attention": {
        "n_heads_in_group": null,
        "no_op": true, πŸ‘ˆ
        "num_sink_tokens": null,
        "replace_with_linear": false,
        "sparsify": null,
        "unshifted_sink": false,
        "use_prefill_window_in_sink_attention": false,
        "window_length": null
      },
      "ffn": {
        "ffn_mult": null, πŸ‘ˆ
        "no_op": true,
        "replace_with_linear": false,
        "sparsify": null
      }
    }
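
For what it's worth, here is a minimal sketch of how I read that structure: iterate the per-block configs and skip the FFN entirely (no tensors, no multiplier) whenever "no_op": true. The key name block_configs and the function names are my assumptions, not taken from any existing converter.

```python
import json

def iter_ffn_mults(config_path: str):
    """Yield (layer_index, ffn_mult or None) for every transformer block.

    Assumes the per-block list lives under "block_configs", as in the
    snippet above; None means the block has no FFN at all.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    for i, block in enumerate(cfg["block_configs"]):
        ffn = block["ffn"]
        if ffn.get("no_op"):
            yield i, None          # FFN is a no-op: no FFN tensors exist for this layer
        else:
            yield i, ffn["ffn_mult"]

for idx, mult in iter_ffn_mults("config.json"):
    print(idx, mult)
```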

You may be right, maybe we need an update in llama.cpp then πŸ€”

I tried defaulting it to "ffn_mult": 1.0 last night. llama.cpp technically produced the files, but the model cannot be loaded:
"missing tensor: 'blk.9.ffn_norm.weight'", so the no_op theory makes sense!?

Damn, I'm doing the same (files are being produced).
Haven't had a chance to load it yet, but no doubt I'll run into the same thing :(

There are also some blocks where the attention has "no_op": true but "ffn_mult": 1.95 is set:

{
      "attention": {
        "n_heads_in_group": null,
        "no_op": true,
        "num_sink_tokens": null,
        "replace_with_linear": false,
        "sparsify": null,
        "unshifted_sink": false,
        "use_prefill_window_in_sink_attention": false,
        "window_length": null
      },
      "ffn": {
        "ffn_mult": 1.95,
        "no_op": false,
        "replace_with_linear": false,
        "sparsify": null
      }
    },

@gghfez you should try it too if you already have the files.

NVIDIA org

The "ffn_mult": null setting can be used with either no_op: true or replace_with_linear: true. In this model, it is always used alongside no_op: true, which means the feed-forward (ffn/mlp) component of the transformer block is completely skipped, including the preceding layer normalization. This applies regardless of the attention mechanism used earlier in the block.

I want to understand this better, so I'll study it some more πŸ˜…πŸ™
