aws-neuron/optimum-neuron-cache

dacorvo (HF Staff) committed on Sep 26, 2024

Commit 5694f75 (verified) · 1 parent: 89d090e

Update inference-cache-config/llama3-70b.json

Files changed (1)

inference-cache-config/llama3-70b.json

@@ -4,13 +4,13 @@
       "batch_size": 1,
       "sequence_length": 4096,
       "num_cores": 24,
-      "auto_cast_type": "fp16"
+      "auto_cast_type": "bf16"
     },
     {
       "batch_size": 4,
       "sequence_length": 4096,
       "num_cores": 24,
-      "auto_cast_type": "fp16"
+      "auto_cast_type": "bf16"
     }
   ]
 }
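
The change above switches `auto_cast_type` from `fp16` to `bf16` in both cached configurations. A minimal sketch of how such a file could be checked for consistency after an edit like this is shown below. It assumes the file maps a model identifier to a list of configuration entries, as the closing `]` and `}` in the hunk suggest. The top-level key `example-org/example-model` and the helper `find_cast_type_mismatches` are hypothetical, because the diff does not show the first lines of the file.

```python
import json

# Hypothetical stand-in for the file contents. The top-level key is a
# placeholder: the diff does not show the first three lines of the file.
CONFIG_TEXT = """
{
  "example-org/example-model": [
    {"batch_size": 1, "sequence_length": 4096, "num_cores": 24, "auto_cast_type": "bf16"},
    {"batch_size": 4, "sequence_length": 4096, "num_cores": 24, "auto_cast_type": "bf16"}
  ]
}
"""


def find_cast_type_mismatches(config, expected="bf16"):
    """Return (model_key, entry_index, actual_value) for each entry whose
    auto_cast_type differs from the expected value (hypothetical helper)."""
    mismatches = []
    for model_key, entries in config.items():
        for index, entry in enumerate(entries):
            actual = entry.get("auto_cast_type")
            if actual != expected:
                mismatches.append((model_key, index, actual))
    return mismatches


config = json.loads(CONFIG_TEXT)
mismatches = find_cast_type_mismatches(config)
print(mismatches)
```

An empty list means every entry uses the expected cast type. Any entry still set to `fp16`, or missing the field entirely, would be reported with its position in the list.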