Update modeling_plamo.py
When the seq len equals the attention window size, the forward pass should work without building an attention mask, but the current require_attn_mask condition is too strict. This change makes the check match the negation (not) of the condition at
https://huggingface.co/pfnet/plamo-2-1b/blob/main/modeling_plamo.py#L1120
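For reference, a minimal sketch of the mask-requirement decision after this change; the needs_attention_mask helper and its signature are illustrative and not part of modeling_plamo.py, only the comparisons follow the diff below:

# Hypothetical helper; only the comparisons mirror the diff below.
def needs_attention_mask(
    training: bool,
    has_past_key_values: bool,
    seq_length_with_past: int,
    attention_window_size: int,
) -> bool:
    # Outside training, or when a KV cache is present, a mask is always required.
    if not training or has_past_key_values:
        return True
    # Relaxed condition: a mask is required only once seq_length_with_past
    # exceeds attention_window_size + 1, so a training batch whose seq len
    # equals the window size can now run forward without a mask.
    if seq_length_with_past > attention_window_size + 1:
        return True
    return False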
- modeling_plamo.py +2 -2
modeling_plamo.py CHANGED
@@ -1434,7 +1434,7 @@ class Plamo2Model(Plamo2PreTrainedModel):
         require_attn_mask = False
         if not self.training or past_key_values is not None:
             require_attn_mask = True
-        if seq_length_with_past
+        if seq_length_with_past > self.config.attention_window_size + 1:
             require_attn_mask = True
         if require_attn_mask and attention_mask is None:
             attention_mask = torch.ones(
@@ -1704,4 +1704,4 @@ class Bias(nn.Module):
         self,
         x: torch.Tensor,
     ) -> torch.Tensor:
-        return x + self._bias
+        return x + self._bias
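As a quick sanity check of the relaxed condition, using the needs_attention_mask sketch above (the window size value here is illustrative, not necessarily the model's configured attention_window_size):

window = 2048  # illustrative value
# seq len equal to the window size: no mask needed during training without a cache
assert not needs_attention_mask(True, False, window, window)
# sequence longer than window + 1: a mask is required again
assert needs_attention_mask(True, False, window + 2, window)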