"ffn_mult": null,

#1 · opened by csabakecskemeti

I saw some "ffn_mult": null entries in the config. Is that intended, or does it mean 1.0?

+1 on this, llama.cpp needs to know the ffn_mult for those items.

@Chris-Alexiuk @jiaqiz

> llama.cpp needs to know the ffn_mult for those items.

There isn't one though. If I understand it correctly, the feed-forward is disabled (see "no_op": true), so these blocks are effectively skipped? See the sketch after the snippet below.

    {
      "attention": {
        "n_heads_in_group": null,
        "no_op": true, πŸ‘ˆ
        "num_sink_tokens": null,
        "replace_with_linear": false,
        "sparsify": null,
        "unshifted_sink": false,
        "use_prefill_window_in_sink_attention": false,
        "window_length": null
      },
      "ffn": {
        "ffn_mult": null, πŸ‘ˆ
        "no_op": true,
        "replace_with_linear": false,
        "sparsify": null
      }
    }
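
For what it's worth, here is a minimal sketch of how I read that structure: iterate the per-block configs and skip the FFN entirely (no tensors, no multiplier) whenever "no_op": true. The key name block_configs and the function names are my assumptions, not taken from any existing converter.

```python
import json

def iter_ffn_mults(config_path: str):
    """Yield (layer_index, ffn_mult or None) for every transformer block.

    Assumes the per-block list lives under "block_configs", as in the
    snippet above; None means the block has no FFN at all.
    """
    with open(config_path) as f:
        cfg = json.load(f)
    for i, block in enumerate(cfg["block_configs"]):
        ffn = block["ffn"]
        if ffn.get("no_op"):
            yield i, None          # FFN is a no-op: no FFN tensors exist for this layer
        else:
            yield i, ffn["ffn_mult"]

for idx, mult in iter_ffn_mults("config.json"):
    print(idx, mult)
```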

You may be right, maybe we need an update in llama.cpp then πŸ€”

I tried defaulting it to "ffn_mult": 1.0 last night. llama.cpp technically produced the files, but the model cannot be loaded:
"missing tensor: 'blk.9.ffn_norm.weight'", so the no_op theory makes sense!?

Damn, I'm doing the same (files are being produced).
Haven't had a chance to load it yet, but no doubt I'll run into the same thing :(

There are also some blocks where the attention has "no_op": true but "ffn_mult": 1.95 is set:

{
      "attention": {
        "n_heads_in_group": null,
        "no_op": true,
        "num_sink_tokens": null,
        "replace_with_linear": false,
        "sparsify": null,
        "unshifted_sink": false,
        "use_prefill_window_in_sink_attention": false,
        "window_length": null
      },
      "ffn": {
        "ffn_mult": 1.95,
        "no_op": false,
        "replace_with_linear": false,
        "sparsify": null
      }
    },

@gghfez you should try it too if you already have the files.

NVIDIA org

The "ffn_mult": null setting can be used with either no_op: true or replace_with_linear: true. In this model, it is always used alongside no_op: true, which means the feed-forward (ffn/mlp) component of the transformer block is completely skipped, including the preceding layer normalization. This applies regardless of the attention mechanism used earlier in the block.

I want to understand this better, so I'll study it some more πŸ˜…πŸ™
