autoprogrammer/deepseekv2lite_densemixer


autoprogrammer committed 24 days ago

Commit 49ad088 (verified) · 1 Parent(s): f10caa3

Upload DeepSeekV2Lite DenseMixer model

DeepSeekV2Lite model with DenseMixer architecture

Files changed (1)

modeling_deepseek.py +1 -1


@@ -666,7 +666,7 @@ class DeepseekV2MoE(nn.Module):
                 matches = (topk_idx == expert_idx)
                 if matches.any():
                     token_indices, k_indices = torch.where(matches)
-                    weights_topk = topk_weight[token_indices, k_indices].unsqueeze(-1)  # (num_matches, 1)
+                    weights_topk = topk_weight[token_indices, k_indices].unsqueeze(-1).to(sparse_outputs.dtype)  # (num_matches, 1)
                     sparse_outputs[token_indices] = sparse_outputs[token_indices] + expert_output[token_indices] * weights_topk
         else:
             # Inference mode: use the original sparse computation logic
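
The effect of the added `.to(sparse_outputs.dtype)` cast can be reproduced in isolation. The sketch below assumes that `sparse_outputs` and `expert_output` are bfloat16 while `topk_weight` is float32. The commit does not state the dtypes, so this is an assumption. Tensor names mirror the diff; the shapes and values are made up for illustration.

```python
import torch

# Assumed dtypes (not stated in the commit): activations in bfloat16,
# router weights in float32. Shapes and values are illustrative only.
num_tokens, hidden, top_k = 4, 8, 2
sparse_outputs = torch.zeros(num_tokens, hidden, dtype=torch.bfloat16)
expert_output = torch.ones(num_tokens, hidden, dtype=torch.bfloat16)
topk_idx = torch.tensor([[0, 1], [1, 2], [0, 2], [2, 3]])
topk_weight = torch.full((num_tokens, top_k), 0.5, dtype=torch.float32)

expert_idx = 0
matches = (topk_idx == expert_idx)
token_indices, k_indices = torch.where(matches)

# Without the cast, type promotion makes the weighted product float32,
# which no longer matches the bfloat16 destination tensor.
uncast = topk_weight[token_indices, k_indices].unsqueeze(-1)  # (num_matches, 1)
promoted_dtype = (expert_output[token_indices] * uncast).dtype

# With the cast, the whole update stays in the destination dtype.
weights_topk = uncast.to(sparse_outputs.dtype)
sparse_outputs[token_indices] = (
    sparse_outputs[token_indices] + expert_output[token_indices] * weights_topk
)
```

Here `promoted_dtype` is `torch.float32`, while the cast version keeps everything in bfloat16. Depending on the PyTorch version, assigning a float32 result into a bfloat16 tensor through tensor indexing may raise a dtype-mismatch error, so casting the weights first avoids relying on that behavior.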