|
2025-08-09T18:44:43Z INFO 67541 [root]: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile /home/ubuntu/qwen3/layout_opt/model/graph.hlo --framework XLA --target trn1 --output /home/ubuntu/qwen3/layout_opt/graph.neff --model-type=transformer -O1 --lnc=1 '--internal-hlo2tensorizer-options=--experimental-unsafe-fp8e4m3fn-as-fp8e4m3 --verify-hlo=false' --logfile=/home/ubuntu/qwen3/layout_opt/log-neuron-cc.txt --verbose=35 |
|
2025-08-09T18:44:43Z INFO 67541 [root]: NeuronX Compiler version 2.20.9961.0+0acef03a Python version 3.10.12 HWM version 2.20.0.9961+0acef03a NumPy version 1.26.4 Running on AMI ami-040348201d80b58ad Running in region usw2-az4 |
|
2025-08-09T18:44:43Z INFO 67605 [root]: XLA detected |
|
2025-08-09T18:44:43Z INFO 67605 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper |
|
2025-08-09T18:44:44Z INFO 67605 [root]: Intermediate files stored in /home/ubuntu/neuronxcc-mk9kpjyq, output in /home/ubuntu |
|
2025-08-09T18:44:44Z INFO 67605 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1 |
|
2025-08-09T18:44:44Z INFO 67605 [pipeline.Pipeline.0]: Processing input #0 |
|
2025-08-09T18:44:44Z INFO 67605 [pipeline.Pipeline.0]: Running pipeline Pipeline.0 |
|
2025-08-09T18:44:44Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0 |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1 |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: Processing input #0 |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: IR signature: 12b45b028e502b2dd8c42c1287fbdbea434454143a30d473806853bc18673d98 for graph.hlo |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/qwen3/layout_opt/model/graph.hlo --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --experimental-unsafe-fp8e4m3fn-as-fp8e4m3 --verify-hlo=false --native-to-custom-softmax --partitioner-opts='--transformer' |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: DEBUG: needsModular? No. macCnt 0 num non-trivial Ops 325 |
|
INFO: Switching to single-module compile. PrePartitionPipe skipped. |
|
INFO: Found memory bound graph |
|
INFO: Number of Native SoftmaxDx's detected and replaced: 0 |
|
INFO: Number of Native Softmax's detected and replaced: 0 |
|
Replaced 0 dropout sequences with OffloadedDropout |
|
INFO: HloMacCount has found 0 |
|
INFO: Traffic has found 8191043584 |
|
INFO: AIF 0 |
|
HLO Ops used in computation: parameter reshape transpose tuple |
|
Warning: Could not open file debug_info_hlo_partitions.json |
|
2025-08-09 18:44:44.153865: W hilo/hlo2penguin/utils/DumpDebugInfo.cc:52] Truncating long HLO operator name %last = tuple(%p76, %transpose.325, %transpose.326, %transpose.327, %p80, %transpose.328, %p82, %transpose.329, %transpose.330, %transpose.331, %transpose.332, %transpose.333, %transpose.334, %transpose.335, %transpose.336, %p91, %transpose.337, %p93, %transpose.338, %transpose.339, %transpose.340, %transpose.341, %transpose.342, %transpose.343, %transpose.344, %transpose.345, %p102, %transpose.346, %p104, %transpose.347, %transpose.348, %transpose.349, %transpose.350, %transpose.351, %transpose.352, %tr... to 512 characters in the compiler's debug metadata |
|
Invoking RemoveOptimizationBarriers pass |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: IR signature: 5bb2cda84f89e3e556843403ea05d6d67130299dc9a1fbfc964c0d386a78e543 for sg0000/HLOToTensorizer |
|
2025-08-09T18:44:44Z INFO 67605 [job.HLOToTensorizer.0]: Job #0 finished |
|
2025-08-09T18:44:44Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0 |
|
2025-08-09T18:44:44Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.Frontend.0 |
|
2025-08-09T18:44:44Z INFO 67605 [job.Frontend.0]: Job Frontend len(in_states) 1 |
|
2025-08-09T18:44:44Z INFO 67605 [job.Frontend.0]: Processing input #0 |
|
2025-08-09T18:44:44Z INFO 67605 [job.Frontend.0]: Start model loading |
|
2025-08-09T18:44:44Z INFO 67605 [job.Frontend.0]: Start tensorization |
|
2025-08-09T18:44:44Z INFO 67605 [job.Frontend.0]: Num jobs: 128 |
|
2025-08-09T18:44:44Z USER 67605 [root/Tensorizer/Tensorizer]: Running Tensorizer |
|
2025-08-09T18:44:44Z INFO 67605 [Tensorizer]: Frontend did not find netlist info. Switching to flat flow. |
|
2025-08-09T18:44:44Z INFO 67605 [Tensorizer]: Building model from Penguin script "penguin.py"... |
|
2025-08-09T18:44:44Z INFO 67605 [Tensorizer]: Tensorizer options: --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=matmult-bf16 --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/DoNothing]: Running DoNothing |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.004 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.006 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.037 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.014 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.005 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.003 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.037 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.049 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.019 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.004 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.005 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.004 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.004 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.004 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.004 seconds |
|
2025-08-09T18:44:44Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.070 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: Running TCTransform |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.007 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: Running TCTransform |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.077 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.006 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.007 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.014 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Rematerialization]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PadElimination]: Running PadElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: Running TCTransform |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Recompute]: Running Recompute |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Recompute]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.000 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [Tensorizer]: After optimization: 325 statements |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DoNothing]: Running DoNothing |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AutoCastTCInputs]: Running AutoCastTCInputs |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AutoCastTCInputs]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AutoCastTCInputs]: AutoCastTCInputs finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Running Simplifier |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.037 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.005 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.009 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Running Delinearization |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.002 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.022 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.006 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.036 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.022 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.044 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.005 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.059 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PGTiling]: Running PGTiling |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.029 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.008 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.004 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.025 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.090 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.003 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.001 seconds |
|
2025-08-09T18:44:45Z INFO 67605 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=True) |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 18.575 seconds |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 18.825 seconds |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: |
|
20 MACROS WITH LARGEST INSTRUCTION COUNTS: |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False) |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.208 seconds |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.121 seconds |
|
2025-08-09T18:45:04Z INFO 67605 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True) |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.824 seconds |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.123 seconds |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.036 seconds |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.029 seconds |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.088 seconds |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.094 seconds |
|
2025-08-09T18:45:05Z INFO 67605 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.189 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: |
|
20 MACROS WITH LARGEST INSTRUCTION COUNTS: |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: transpose_128x128 |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.034 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.130 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.064 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.130 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.027 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.096 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.023 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.007 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.080 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.098 seconds |
|
2025-08-09T18:45:06Z INFO 67605 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True) |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 0.433 seconds |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.130 seconds |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=False) |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.081 seconds |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/LICM]: Running LICM |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/LICM]: Finished (changed=False) |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/LICM]: LICM finished after 0.037 seconds |
|
2025-08-09T18:45:07Z INFO 67605 [sg0000/Tensorizer/SundaISel]: Running SundaISel |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/SundaISel]: Finished (changed=True) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.549 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=True) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.041 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.049 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.027 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.024 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.017 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.090 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.022 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.083 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=False) |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.113 seconds |
|
2025-08-09T18:45:08Z INFO 67605 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 1.604 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.045 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.020 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.030 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.011 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.010 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/DeConcat]: Running DeConcat |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/DeConcat]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.002 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.020 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.009 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.010 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.081 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.005 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.154 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.048 seconds |
|
2025-08-09T18:45:10Z INFO 67605 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.491 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.019 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.128 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.015 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/SpillPSum]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.150 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.018 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.015 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.104 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.074 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.176 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=False) |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.015 seconds |
|
2025-08-09T18:45:11Z INFO 67605 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.145 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.039 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.017 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.017 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.034 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.060 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.012 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/DataStreaming]: Finished (changed=False) |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.033 seconds |
|
2025-08-09T18:45:12Z INFO 67605 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 3.184 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=False) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.066 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=False) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.018 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.018 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'20894.27130'[T_i0,T_i2_29578,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input8'[T_i0,i0.128,T_i2_29578,i1.3072] # id=25058, src_id=None, , instances=64 # dl = tensor_op_name: t2534_pftranspose_20894 | hlo_id: 1787 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'20935.27144'[T_i0,T_i2_29586,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input19'[T_i0,i0.128,T_i2_29586,i1.3072] # id=25116, src_id=None, , instances=64 # dl = tensor_op_name: t2597_pftranspose_20935 | hlo_id: 1805 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'20976.27158'[T_i0,T_i2_29594,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input30'[T_i0,i0.128,T_i2_29594,i1.3072] # id=25174, src_id=None, , instances=64 # dl = tensor_op_name: t2660_pftranspose_20976 | hlo_id: 1823 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21017.27172'[T_i0,T_i2_29602,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input41'[T_i0,i0.128,T_i2_29602,i1.3072] # id=25232, src_id=None, , instances=64 # dl = tensor_op_name: t2723_pftranspose_21017 | hlo_id: 1841 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21058.27186'[T_i0,T_i2_29610,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input52'[T_i0,i0.128,T_i2_29610,i1.3072] # id=25290, src_id=None, , instances=64 # dl = tensor_op_name: t2786_pftranspose_21058 | hlo_id: 1859 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21099.27200'[T_i0,T_i2_29618,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input63'[T_i0,i0.128,T_i2_29618,i1.3072] # id=25348, src_id=None, , instances=64 # dl = tensor_op_name: t2849_pftranspose_21099 | hlo_id: 1877 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21140.27214'[T_i0,T_i2_29626,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input74'[T_i0,i0.128,T_i2_29626,i1.3072] # id=25406, src_id=None, , instances=64 # dl = tensor_op_name: t2912_pftranspose_21140 | hlo_id: 1895 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21181.27228'[T_i0,T_i2_29634,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input85'[T_i0,i0.128,T_i2_29634,i1.3072] # id=25464, src_id=None, , instances=64 # dl = tensor_op_name: t2975_pftranspose_21181 | hlo_id: 1913 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21222.27242'[T_i0,T_i2_29642,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input96'[T_i0,i0.128,T_i2_29642,i1.3072] # id=25522, src_id=None, , instances=64 # dl = tensor_op_name: t3038_pftranspose_21222 | hlo_id: 1931 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.465% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'21263.27256'[T_i0,T_i2_29650,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': ''}bfloat16 (32, 128, 2, 3072) %'input107'[T_i0,i0.128,T_i2_29650,i1.3072] # id=25580, src_id=None, , instances=64 # dl = tensor_op_name: t3101_pftranspose_21263 | hlo_id: 1949 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.033 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=False) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.017 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.357 seconds |
|
2025-08-09T18:45:15Z INFO 67605 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler |
|
2025-08-09T18:45:16Z WARNING 67605 [sg0000/Tensorizer/StaticProfiler]: matmul-based transposes inserted by penguin takes up 100.00 percent of all matmul computation |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.041 seconds |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.154 seconds |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.040 seconds |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.046 seconds |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.001 seconds |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) |
|
2025-08-09T18:45:16Z INFO 67605 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.665 seconds |
|
2025-08-09T18:45:17Z INFO 67605 [Tensorizer]: BirCodeGen estimate #instances=279978 in sg0000 |
|
2025-08-09T18:45:17Z INFO 67605 [Tensorizer]: IR signature: 4c500c33f6b410247d09546b05e57cdd552637593e5e9cae706f41ffd3eaadab for nc00/sg0000/TensorizerBIR |
|
2025-08-09T18:45:17Z INFO 67605 [Tensorizer]: Weights total number of bytes: 131072 |
|
2025-08-09T18:45:17Z INFO 67605 [Tensorizer]: Successfully built model. |
|
2025-08-09T18:45:17Z USER 67605 [root/Tensorizer/Tensorizer]: Tensorizer finished after 33.074 seconds |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: End tensorization |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input0 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input1 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input2 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input3 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input4 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input5 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input6 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input7 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input8 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input9 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input10 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input11 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input12 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input13 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input14 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input15 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input16 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input17 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input18 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input19 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input20 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input21 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input22 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input23 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input24 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input25 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input26 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input27 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input28 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input29 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input30 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input31 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input32 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input33 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input34 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input35 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input36 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input37 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input38 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input39 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input40 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input41 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input42 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input43 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input44 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input45 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input46 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input47 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input48 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input49 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input50 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input51 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input52 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input53 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input54 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input55 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input56 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input57 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input58 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input59 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input60 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input61 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input62 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input63 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input64 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input65 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input66 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input67 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input68 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input69 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input70 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input71 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input72 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input73 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input74 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input75 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input76 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input77 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input78 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input79 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input80 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input81 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input82 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input83 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input84 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input85 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input86 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input87 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input88 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input89 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input90 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input91 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input92 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input93 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input94 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input95 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input96 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input97 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input98 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input99 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input100 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input101 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input102 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input103 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input104 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input105 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input106 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input107 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input108 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input109 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input110 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input111 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input112 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input113 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input114 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input115 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input116 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input117 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input118 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input119 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input120 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input121 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input122 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input123 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input124 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input125 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input126 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input127 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input128 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input129 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input130 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input131 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input132 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input133 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input134 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input135 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input136 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input137 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input138 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input139 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input140 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input141 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input142 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input143 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input144 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input145 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input146 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input147 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input148 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input149 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input150 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input151 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input152 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input153 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input154 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input155 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input156 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input157 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input158 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input159 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input160 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input161 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input162 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input163 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input164 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input165 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input166 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input167 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input168 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input169 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input170 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input171 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input172 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input173 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input174 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input175 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input176 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input177 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input178 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input179 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input180 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input181 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input182 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input183 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input184 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input185 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input186 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input187 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input188 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input189 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input190 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input191 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input192 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input193 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input194 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input195 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input196 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input197 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input198 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input199 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input200 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input201 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input202 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input203 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input204 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input205 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input206 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input207 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input208 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input209 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input210 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input211 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input212 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input213 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input214 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input215 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input216 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input217 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input218 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input219 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input220 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input221 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input222 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input223 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input224 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input225 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input226 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input227 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input228 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input229 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input230 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input231 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input232 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input233 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input234 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input235 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input236 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input237 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input238 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input239 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input240 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input241 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input242 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input243 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input244 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input245 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input246 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input247 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input248 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input249 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input250 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input251 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input252 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input253 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input254 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input255 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input256 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input257 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input258 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input259 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input260 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input261 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input262 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input263 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input264 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input265 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input266 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input267 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input268 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input269 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input270 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input271 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input272 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input273 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input274 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input275 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input276 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input277 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input278 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input279 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input280 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input281 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input282 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input283 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input284 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input285 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input286 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input287 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input288 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input289 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input290 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input291 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input292 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input293 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input294 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input295 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input296 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input297 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input298 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input299 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input300 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input301 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input302 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input303 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input304 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input305 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input306 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input307 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input308 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input309 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input310 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input311 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input312 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input313 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input314 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input315 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input316 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input317 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input318 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input319 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input320 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input321 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input322 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input323 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input324 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input325 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input326 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input327 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input328 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input329 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input330 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input331 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input332 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input333 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input334 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input335 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input336 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input337 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input338 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input339 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input340 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input341 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input342 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input343 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input344 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input345 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input346 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input347 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input348 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input349 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input350 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input351 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input352 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input353 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input354 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input355 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input356 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input357 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input358 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input359 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input360 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input361 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input362 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input363 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input364 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input365 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input366 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input367 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input368 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input369 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input370 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input371 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input372 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input373 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input374 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input375 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input376 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input377 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input378 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input379 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input380 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input381 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input382 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input383 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input384 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input385 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input386 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input387 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input388 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input389 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input390 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input391 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input392 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input393 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input394 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input395 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input396 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input397 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Network input: input398 |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: wrote bir.json |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: wrote tensor_map.json |
|
2025-08-09T18:45:17Z INFO 67605 [job.Frontend.0]: Job #0 finished |
|
2025-08-09T18:45:17Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.Frontend.0 |
|
2025-08-09T18:45:17Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 |
|
2025-08-09T18:45:17Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 |
|
2025-08-09T18:45:17Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: BackendDriver has 1 states with 1 core LNC |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: BackendDriver: no partitions found. Switching to flat flow. |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: Job WalrusDriver len(in_states) 1 |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: Processing input #0 |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: BackendDriver in_state.num_states 1 with 1 core LNC |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: Executing /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /home/ubuntu/qwen3/layout_opt/log-neuron-cc.txt --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-color-rotation --max-load-lower-bound 0.14 --mm-reorder-opt --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json --unified-backend-and-legacy-codegen --tensor-map tensor_map.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --dge-levels io,vector_dynamic_offsets,scalar_dynamic_offset --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /home/ubuntu/qwen3/layout_opt/graph.neff |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: Working directory is /home/ubuntu/neuronxcc-mk9kpjyq/sg00 |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: propagate_exit=True |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: use_logger=False |
|
2025-08-09T18:45:17Z INFO 67605 [job.WalrusDriver.0]: expose_stderr=True |
|
2025-08-09T18:45:17Z INFO 67673 [Logging]: Logging to ../../qwen3/layout_opt/log-neuron-cc.txt at level 'INFO' |
|
2025-08-09T18:45:17Z INFO 67673 [BackendDriver]: max_allowed_parallelism=128 |
|
2025-08-09T18:45:18Z INFO 67673 [BackendDriver]: Backend driver mtBackend: false numModules: 1 Cwd: "/home/ubuntu/neuronxcc-mk9kpjyq/sg00" |
|
2025-08-09T18:45:18Z INFO 67673 [BackendDriver]: DynamicDMA is enabled |
|
2025-08-09T18:45:18Z INFO 67673 [BackendDriver]: DynamicDMA levels being enabled: io, scalar_dynamic_offset, vector_dynamic_offsets, |
|
2025-08-09T18:45:18Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: Running do_nothing |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: do_nothing finished after 0.003 seconds |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: curr_vmrss: 177mb, ru_maxrss: 429mb (delta=0mb) |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1776 memory location(s), 1 block(s), and 869 instruction(s). Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: Running birverifier |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: birverifier finished after 0.290 seconds |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: curr_vmrss: 945mb, ru_maxrss: 945mb (delta=516mb) |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1776 memory location(s), 1 block(s), and 869 instruction(s). Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 0.301 seconds |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: curr_vmrss: 945mb, ru_maxrss: 945mb (delta=516mb) |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 1776 memory location(s), 1 block(s), and 869 instruction(s). Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [SubgraphForkPass]: Running lnc_verifier |
|
2025-08-09T18:45:18Z INFO 67673 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds |
|
2025-08-09T18:45:18Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 945mb, ru_maxrss: 945mb (delta=0mb) |
|
2025-08-09T18:45:18Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 1776 memory location(s), 1 block(s), and 869 instruction(s). Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [BackendPassManager]: subgraph_parallel_pass finished after 0.004 seconds |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: curr_vmrss: 945mb, ru_maxrss: 945mb (delta=0mb) |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 1776 memory location(s), 1 block(s), and 869 instruction(s). Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:45:18Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: Running expand_replication |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z INFO 67673 [ExpandReplication]: Found 0 replicated matmults |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: expand_replication finished after 0.001 seconds |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: curr_vmrss: 945mb, ru_maxrss: 945mb (delta=0mb) |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 1776 memory location(s), 1 block(s), and 869 instruction(s). Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z USER 67673 [ModuleForkPass]: Running unroll |
|
2025-08-09T18:45:18Z INFO 67673 [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=1776 blocks=1 instructions=869 Max writers: 1 Max Readers: 325 |
|
2025-08-09T18:45:18Z INFO 67673 [Unroll]: INFO (Unroll) Start unrolling at Sat Aug 9 18:45:18 2025 |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: INFO (Unroll) DONE unrolling Sat Aug 9 18:45:18 2025 |
|
|
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: sg0000 Instruction count after Unroll: |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: Total count: 279653 |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: Matmult: 212041 |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: GenericCopy: 53065 |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: Load: 7274 |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: Save: 7273 |
|
2025-08-09T18:45:21Z INFO 67673 [Unroll]: Unrolled DGE count with Dynamic AP: 0 |
|
2025-08-09T18:45:21Z USER 67673 [ModuleForkPass]: unroll finished after 2.731 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2494mb, ru_maxrss: 2494mb (delta=1549mb) |
|
2025-08-09T18:45:21Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 69168 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 2.780 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: curr_vmrss: 1647mb, ru_maxrss: 2494mb (delta=1549mb) |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 69168 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=69168 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [SubgraphForkPass]: Running dead_code_elim |
|
2025-08-09T18:45:21Z INFO 67673 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=69168 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z INFO 67673 [DeadCodeElim]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:21Z INFO 67673 [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:21Z INFO 67673 [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:21Z INFO 67673 [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:21Z USER 67673 [SubgraphForkPass]: dead_code_elim finished after 0.371 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 1669mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:21Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: subgraph_parallel_pass finished after 0.386 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: curr_vmrss: 1669mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [ModuleForkPass]: Running birverifier |
|
2025-08-09T18:45:21Z INFO 67673 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [ModuleForkPass]: birverifier finished after 0.311 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:21Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 0.330 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [SubgraphForkPass]: Running lnc_verifier |
|
2025-08-09T18:45:21Z INFO 67673 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [SubgraphForkPass]: lnc_verifier finished after 0.009 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:21Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: subgraph_parallel_pass finished after 0.027 seconds |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:45:21Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:21Z USER 67673 [ModuleForkPass]: Running instruction_reorder |
|
2025-08-09T18:45:21Z INFO 67673 [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: instruction_reorder finished after 0.077 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: Running psum_legalization |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: psum_legalization finished after 0.049 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: Running legalize_cce_dma |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: legalize_cce_dma finished after 0.049 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: Running error_injector |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z WARNING 67673 [ErrorInjector]: Unrecognized injected error value "0" |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: error_injector finished after 0.009 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1671mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: Running vn_splitter |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z INFO 67673 [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0 |
|
2025-08-09T18:45:22Z INFO 67673 [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 |
|
2025-08-09T18:45:22Z INFO 67673 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 |
|
2025-08-09T18:45:22Z INFO 67673 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 |
|
2025-08-09T18:45:22Z INFO 67673 [VNSplitterPass]: INFO (VNSplitter) Time: 0.009 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [VNSplitterPass]: INFO (VerticalFusion) Time: 0.099 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [VNSplitterPass]: INFO (ShrinkDN) Time: 0.115 seconds |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: vn_splitter finished after 0.314 seconds |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1681mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z USER 67673 [ModuleForkPass]: Running constant_propagate |
|
2025-08-09T18:45:22Z INFO 67673 [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:22Z INFO 67673 [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 |
|
2025-08-09T18:45:22Z INFO 67673 [ConstantPropagate]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:22Z INFO 67673 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:23Z INFO 67673 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:23Z INFO 67673 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:23Z INFO 67673 [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 |
|
2025-08-09T18:45:24Z INFO 67673 [ConstantPropagate]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:24Z INFO 67673 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:24Z INFO 67673 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:24Z INFO 67673 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: constant_propagate finished after 2.035 seconds |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1684mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: Running lower_ac |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z INFO 67673 [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: lower_ac finished after 0.049 seconds |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1684mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: Running input_dma_coalescing |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z INFO 67673 [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: input_dma_coalescing finished after 0.121 seconds |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1684mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: Running remat_optimization |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z INFO 67673 [RematOpt]: Removed 0 remat instructions |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: remat_optimization finished after 0.200 seconds |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1686mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z USER 67673 [ModuleForkPass]: Running early_peephole_opts |
|
2025-08-09T18:45:24Z INFO 67673 [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:24Z INFO 67673 [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true |
|
2025-08-09T18:45:24Z INFO 67673 [EarlyPeepholeOpts]: Activation Accumulate: 0 |
|
2025-08-09T18:45:25Z USER 67673 [ModuleForkPass]: early_peephole_opts finished after 0.096 seconds |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1686mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:25Z USER 67673 [ModuleForkPass]: Running coalesce_multichannel_cc_ops |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:25Z USER 67673 [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.027 seconds |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1686mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:25Z USER 67673 [ModuleForkPass]: Running infer_stream_ids |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:25Z USER 67673 [ModuleForkPass]: infer_stream_ids finished after 0.027 seconds |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1686mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:25Z USER 67673 [ModuleForkPass]: Running pre_sched |
|
2025-08-09T18:45:25Z INFO 67673 [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: Start PRE scheduling 2 cores: 1 at: Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [LayerSpiller]: LayerSpill: Start... |
|
2025-08-09T18:45:25Z INFO 67673 [LayerSpiller]: LayerSpill: Found 0 Splits CCs |
|
2025-08-09T18:45:25Z INFO 67673 [LayerSpiller]: Grouped CCs to 0 clusters. |
|
2025-08-09T18:45:25Z INFO 67673 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors |
|
2025-08-09T18:45:25Z INFO 67673 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts |
|
2025-08-09T18:45:25Z INFO 67673 [LayerSpiller]: LayerSpill: Done. |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: Start split live ranges Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: Num_Splits: 0 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: End split live ranges Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: Strt remove redundncies Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_redundant_memsets |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_redundant_memsets: 0 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_redundant_loads |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_redundant_loads: 0 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: End remove redundncies Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: Start DCE Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: End DCE Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [PreSched]: Start build flow dependencies Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [build_flow_deps]: Start build fdeps. Invocation: 1Sat Aug 9 18:45:25 2025 |
|
2025-08-09T18:45:25Z INFO 67673 [build_flow_deps]: Allocs: 68412 instructions: 279653 |
|
2025-08-09T18:45:27Z INFO 67673 [build_flow_deps]: Build fdeps inserted 698765 edges |
|
2025-08-09T18:45:27Z INFO 67673 [build_flow_deps]: Done build fdeps 698765 Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: End build flow dependencies Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: Start remove useless insts Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: remove_useless_insts |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: remove Useless Instructions: 0 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: End remove useless insts Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: Start scratchpad optimization Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: End scratchpad optimization Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z INFO 67673 [PreSched]: DONE PRE scheduling Sat Aug 9 18:45:27 2025 |
|
2025-08-09T18:45:27Z USER 67673 [ModuleForkPass]: pre_sched finished after 2.387 seconds |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1810mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:27Z USER 67673 [ModuleForkPass]: Running tensor_copy_elim |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:27Z INFO 67673 [TensorCopyElim]: Tensor CP elimination: 0 |
|
2025-08-09T18:45:27Z INFO 67673 [TensorCopyElim]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:27Z INFO 67673 [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:27Z INFO 67673 [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:27Z INFO 67673 [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys |
|
2025-08-09T18:45:27Z USER 67673 [ModuleForkPass]: tensor_copy_elim finished after 0.474 seconds |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1812mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:27Z USER 67673 [ModuleForkPass]: Running dynamic_dma_setup |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:27Z USER 67673 [ModuleForkPass]: dynamic_dma_setup finished after 0.007 seconds |
|
2025-08-09T18:45:27Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1812mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: Running runtime_memory_reservation |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: runtime_memory_reservation finished after 0.006 seconds |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1812mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: Running coloring_allocator_psum |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z INFO 67673 [ColoringAllocator::Rep]: Allocating functions |
|
2025-08-09T18:45:28Z INFO 67673 [ColoringAllocator::Rep]: linearize and check |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: allocating PSUM |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: main loop |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: renumber locations |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: size = 53065 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: build_no_bitmap start |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: 100% PSUM demand before spilling |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: PSUM high-water mark = 8 tensors |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: found 171648 edges |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: mean: 6.46935 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: median: 6.99995 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: adjacency vectors require 1373184 bytes |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: build_no_bitmap done |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: find costs |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: simplify interference graph |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: initialize low and high |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: lo = 53065 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: hi = 0 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: inf = 0 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: total = 53065 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: simplify |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: new candidates = 0 |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: select ranges |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: no more spills |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: PSUM score = 0 (lower is better) |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: spilling from PSUM cost about 0 cycles |
|
2025-08-09T18:45:28Z INFO 67673 [PSUM_Allocator]: 100% PSUM utilization after allocation |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: coloring_allocator_psum finished after 0.663 seconds |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1828mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: Running dma_optimization_psum |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z INFO 67673 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions |
|
2025-08-09T18:45:28Z INFO 67673 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: dma_optimization_psum finished after 0.259 seconds |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1828mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:28Z USER 67673 [ModuleForkPass]: Running address_rotation_psum |
|
2025-08-09T18:45:28Z INFO 67673 [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:29Z INFO 67673 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-08-09T18:45:30Z INFO 67673 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-08-09T18:45:31Z INFO 67673 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-08-09T18:45:31Z USER 67673 [ModuleForkPass]: address_rotation_psum finished after 2.215 seconds |
|
2025-08-09T18:45:31Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1830mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:31Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:31Z USER 67673 [ModuleForkPass]: Running coloring_allocator_sb |
|
2025-08-09T18:45:31Z INFO 67673 [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 6946398208 |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 7517 bytes |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 6946365440 |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 7461 bytes |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 0 |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 0 bytes |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: Allocating functions |
|
2025-08-09T18:45:31Z INFO 67673 [ColoringAllocator::Rep]: linearize and check |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: allocating SB |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: main loop |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: renumber locations |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: size = 14548 |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: find partners |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: found 53065 accumulation groups |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: largest = 22342.27111_i383 |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: tensors = 2 |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: requires 8448 bytes/partition |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: expanding partners |
|
2025-08-09T18:45:31Z INFO 67673 []: find first defs for local |
|
2025-08-09T18:45:31Z INFO 67673 []: find first defs for global |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: find loads |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: 1 pin count |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: 6121 remat count |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: build interference graph |
|
2025-08-09T18:45:31Z INFO 67673 [SB_Allocator]: pass 1 int-tree |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Num intervals 14548 Num locations 14548 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: IntervalTree Build Done |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: info.neighbors init Done |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: info.neighbors partners Done |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: IntervalTree readback Done |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: edge: 32260 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: mean: 4.43497 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: median: 2.00048 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: find costs |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: best-of-n loop, heuristic = 0 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: simplify interference graph |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: initialize safe & unsafe |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: safe = 14546 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: unsafe = 1 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: inf = 0 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: total = 14547 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: simplify |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: simplify_step3_sorted2 #Unsafe 0 #Pinned 0 #Safe 0 minCost 1.79769e+308 maxCost 2.22507e-308 locations 14548 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: new candidates = 0 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: select ranges |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Total: 14547 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Spilled: 0.000 (0) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Allocated: 1.000 (14547) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Rover zone: 0.988 (14367) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Pre-rover zone: 0.010 (144) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Post-rover zone: 0.002 (36) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Slice zone: 0.000 (0) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Blocks nothing: 0.000 (0) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Blocks medium: 0.000 (0) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Blocks tall: 1.000 (14547) |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Visited until tall blocking (mean): 0.996 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Visited until tall blocking (median): 1.000 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Visited until tall blocking (p95): 1.000 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: Success |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: SB spills = 0 tensors |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: size = 0 bytes/partition |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: remats = 0 tensors |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: unpinned = 0 tensors |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: size = 0 bytes/partition |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: SB score = 0 |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: spilling from SB cost about 0 cycles |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: pinning saved approximately 9010 cycles |
|
2025-08-09T18:45:32Z INFO 67673 [SB_Allocator]: 0% SB utilization after allocation |
|
2025-08-09T18:45:32Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 6946398208 |
|
2025-08-09T18:45:32Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 7517 bytes |
|
2025-08-09T18:45:32Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 6946365440 |
|
2025-08-09T18:45:32Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 7461 bytes |
|
2025-08-09T18:45:32Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 0 |
|
2025-08-09T18:45:32Z INFO 67673 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 0 bytes |
|
2025-08-09T18:45:32Z USER 67673 [ModuleForkPass]: coloring_allocator_sb finished after 1.186 seconds |
|
2025-08-09T18:45:32Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1835mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:32Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:32Z USER 67673 [ModuleForkPass]: Running address_rotation_sb |
|
2025-08-09T18:45:32Z INFO 67673 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:32Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:32Z USER 67673 [ModuleForkPass]: address_rotation_sb finished after 0.356 seconds |
|
2025-08-09T18:45:32Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1838mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:32Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68413 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:32Z USER 67673 [ModuleForkPass]: Running dma_optimization_sb |
|
2025-08-09T18:45:32Z INFO 67673 [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=68413 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:32Z INFO 67673 [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 13892763648, 50.0001% input load, 49.9999% output write, 0% spill/reload [sg0000] |
|
2025-08-09T18:45:32Z INFO 67673 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: removed 0 identical load |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: adjusted 0 DMACopy remat |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: adjusted 0 DMACopy remat |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: sub-graph will get execute 1 times |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(6.9464e+09) |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations |
|
2025-08-09T18:45:33Z INFO 67673 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 0 SpillSaves and Reloads |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: average loaded DMA size 7517 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: average saved DMA size 7461 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 6946398208 |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 7517 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 6946365440 |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 7461 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, -nan% out of total spill/reload dma traffic |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 0, 0% out of total dma traffic |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 13892763648, 50.0001% input load, 49.9999% output write, 0% spill/reload [sg0000] |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 6946398208 |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 7517 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 6946365440 |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 7461 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 0 |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 0 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 7488 bytes |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); |
|
2025-08-09T18:45:34Z INFO 67673 [DMAOptimizationBase]: DMA optimization re-enable optimization |
|
2025-08-09T18:45:34Z USER 67673 [ModuleForkPass]: dma_optimization_sb finished after 2.175 seconds |
|
2025-08-09T18:45:34Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1857mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:34Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:34Z USER 67673 [ModuleForkPass]: Running address_rotation_sb |
|
2025-08-09T18:45:34Z INFO 67673 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:35Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 5962 Sb address |
|
2025-08-09T18:45:35Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 4811 Sb address |
|
2025-08-09T18:45:35Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:36Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:36Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 2052 Sb address |
|
2025-08-09T18:45:36Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:36Z USER 67673 [ModuleForkPass]: address_rotation_sb finished after 2.022 seconds |
|
2025-08-09T18:45:36Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1857mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:36Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:36Z USER 67673 [ModuleForkPass]: Running coloring_allocator_dram |
|
2025-08-09T18:45:36Z INFO 67673 [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:36Z INFO 67673 [ColoringAllocator::Rep]: Allocating functions |
|
2025-08-09T18:45:36Z INFO 67673 [ColoringAllocator::Rep]: linearize and check |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: reserved space = 16382119936 bytes |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: spill space = 0 bytes |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: aligned spill space = 0 bytes |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: dram space = 107374182400 bytes |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: renumber locations |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: size = 0 |
|
2025-08-09T18:45:37Z INFO 67673 []: find first defs for local |
|
2025-08-09T18:45:37Z INFO 67673 []: find first defs for global |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: Num intervals 0 Num locations 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: IntervalTree Build Done |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: info.neighbors init Done |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: IntervalTree readback Done |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: simplify interference graph |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: initialize low and high |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: lo = 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: hi = 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: total = 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: simplify |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: new candidates = 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: select ranges |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: CC buffer size limit 524288000 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: allreduce_dram_hwm 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: Real CC buffer size 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: DRAM hwm after allocation: 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DRAM_Allocator]: DRAM allocation successful |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: coloring_allocator_dram finished after 0.466 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1862mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running address_rotation_dram |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: Runtime page size at 512MB |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: DRAM hwm before rotation 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: allreduce buffer size 524288000 |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: allreduce hwm 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: Real CC buffer size 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: DRAM hwm after rotation 0 |
|
2025-08-09T18:45:37Z INFO 67673 [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: address_rotation_dram finished after 0.254 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1862mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running tensorcopy_accel |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z INFO 67673 [TensorCopyAccel::Impl]: Running peephole optimization pass |
|
2025-08-09T18:45:37Z INFO 67673 [TensorCopyAccel::Impl]: Accelerated 0 out of 53065 tensorcopy in Function: sg0000 average acceleration factor: -nan |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: tensorcopy_accel finished after 0.037 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1862mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running peephole_opts |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z INFO 67673 [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: peephole_opts finished after 0.109 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1862mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running lower_kernel |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z INFO 67673 [LowerKernel]: Started running LowerKernel |
|
2025-08-09T18:45:37Z INFO 67673 [LowerKernel]: Start of kernel lowering pass, number of insts: 279653, number of allocs: 68412 |
|
2025-08-09T18:45:37Z INFO 67673 [LowerKernel]: Scan BKs time (s): 0.022361 |
|
2025-08-09T18:45:37Z INFO 67673 [LowerKernel]: Lower BKs time (s): 1.3e-05 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: lower_kernel finished after 0.031 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1862mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running lower_nki_kernel |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: lower_nki_kernel finished after 0.028 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1862mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running dynamic_dma_cleanup |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: dynamic_dma_cleanup finished after 0.044 seconds |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1864mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:37Z USER 67673 [ModuleForkPass]: Running birverifier |
|
2025-08-09T18:45:37Z INFO 67673 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:38Z USER 67673 [ModuleForkPass]: birverifier finished after 0.322 seconds |
|
2025-08-09T18:45:38Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1864mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:38Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:38Z USER 67673 [ModuleForkPass]: Running dynamic_dma_scan |
|
2025-08-09T18:45:38Z INFO 67673 [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:38Z USER 67673 [ModuleForkPass]: dynamic_dma_scan finished after 0.043 seconds |
|
2025-08-09T18:45:38Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1864mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:38Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:38Z USER 67673 [ModuleForkPass]: Running build_fdeps |
|
2025-08-09T18:45:38Z INFO 67673 [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:38Z INFO 67673 [build_flow_deps]: Start build fdeps. Invocation: 2Sat Aug 9 18:45:38 2025 |
|
2025-08-09T18:45:38Z INFO 67673 [build_flow_deps]: Allocs: 68412 instructions: 279653 |
|
2025-08-09T18:45:39Z INFO 67673 [build_flow_deps]: Build fdeps inserted 698765 edges |
|
2025-08-09T18:45:39Z INFO 67673 [build_flow_deps]: Done build fdeps 698765 Sat Aug 9 18:45:39 2025 |
|
2025-08-09T18:45:39Z USER 67673 [ModuleForkPass]: build_fdeps finished after 1.197 seconds |
|
2025-08-09T18:45:39Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1896mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:39Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:39Z USER 67673 [ModuleForkPass]: Running remove_redundancies |
|
2025-08-09T18:45:39Z INFO 67673 [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:39Z INFO 67673 [RemoveRedundancies]: remove_clobbered_writes |
|
2025-08-09T18:45:39Z INFO 67673 [RemoveRedundancies]: remove_clobbered_writes: 0 |
|
2025-08-09T18:45:39Z INFO 67673 [RemoveRedundancies]: remove_useless_insts |
|
2025-08-09T18:45:39Z INFO 67673 [RemoveRedundancies]: remove Useless Instructions: 0 |
|
2025-08-09T18:45:39Z USER 67673 [ModuleForkPass]: remove_redundancies finished after 0.164 seconds |
|
2025-08-09T18:45:39Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1896mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:39Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:39Z USER 67673 [ModuleForkPass]: Running anti_dependency_analyzer |
|
2025-08-09T18:45:39Z INFO 67673 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:39Z INFO 67673 [AntiDependencyAnalyzer]: Batch size: 1000 |
|
2025-08-09T18:45:39Z INFO 67673 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} |
|
2025-08-09T18:45:39Z INFO 67673 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 |
|
2025-08-09T18:45:40Z USER 67673 [ModuleForkPass]: anti_dependency_analyzer finished after 1.041 seconds |
|
2025-08-09T18:45:40Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1985mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:40Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:40Z USER 67673 [ModuleForkPass]: Running tensor_copy_elim |
|
2025-08-09T18:45:40Z INFO 67673 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:40Z INFO 67673 [TensorCopyElim]: Tensor CP elimination: 0 |
|
2025-08-09T18:45:41Z INFO 67673 [TensorCopyElim]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:41Z USER 67673 [ModuleForkPass]: tensor_copy_elim finished after 0.377 seconds |
|
2025-08-09T18:45:41Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1994mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:41Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:41Z USER 67673 [ModuleForkPass]: Running prefetch_scheduling_before_sched |
|
2025-08-09T18:45:41Z INFO 67673 [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:41Z USER 67673 [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.007 seconds |
|
2025-08-09T18:45:41Z INFO 67673 [ModuleForkPass]: curr_vmrss: 1994mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:41Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:41Z USER 67673 [ModuleForkPass]: Running post_sched |
|
2025-08-09T18:45:41Z INFO 67673 [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:41Z INFO 67673 [post_scheduler]: Start PosT ScheD 3 sunda Sat Aug 9 18:45:41 2025 |
|
2025-08-09T18:45:44Z INFO 67673 [post_scheduler]: Time-aware hwm post-sched |
|
2025-08-09T18:45:46Z INFO 67673 [post_scheduler]: Time-aware simulation time: 58352865 |
|
2025-08-09T18:45:46Z INFO 67673 [post_scheduler]: Done PosT ScheD Sat Aug 9 18:45:46 2025 |
|
2025-08-09T18:45:46Z USER 67673 [ModuleForkPass]: post_sched finished after 5.460 seconds |
|
2025-08-09T18:45:46Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2386mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:46Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:46Z USER 67673 [ModuleForkPass]: Running expand_scheduling_units |
|
2025-08-09T18:45:46Z INFO 67673 [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:46Z USER 67673 [ModuleForkPass]: expand_scheduling_units finished after 0.038 seconds |
|
2025-08-09T18:45:46Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2142mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:46Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:46Z USER 67673 [ModuleForkPass]: Running address_rotation_sb |
|
2025-08-09T18:45:46Z INFO 67673 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:48Z INFO 67673 [DMAOptimizationBase]: PSUM Rotation rotated 10969 PSUM Banks |
|
2025-08-09T18:45:49Z INFO 67673 [DMAOptimizationBase]: PSUM Rotation rotated 8848 PSUM Banks |
|
2025-08-09T18:45:50Z INFO 67673 [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks |
|
2025-08-09T18:45:50Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 2531 Sb address |
|
2025-08-09T18:45:51Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 2569 Sb address |
|
2025-08-09T18:45:51Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:51Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:52Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 71 Sb address |
|
2025-08-09T18:45:52Z INFO 67673 [DMAOptimizationBase]: moved 0 MM forward |
|
2025-08-09T18:45:52Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:53Z INFO 67673 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address |
|
2025-08-09T18:45:53Z USER 67673 [ModuleForkPass]: address_rotation_sb finished after 6.509 seconds |
|
2025-08-09T18:45:53Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2178mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:53Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:53Z USER 67673 [ModuleForkPass]: Running anti_dependency_analyzer |
|
2025-08-09T18:45:53Z INFO 67673 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:53Z INFO 67673 [AntiDependencyAnalyzer]: Batch size: 1000 |
|
2025-08-09T18:45:53Z INFO 67673 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} |
|
2025-08-09T18:45:53Z INFO 67673 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 |
|
2025-08-09T18:45:54Z USER 67673 [ModuleForkPass]: anti_dependency_analyzer finished after 0.807 seconds |
|
2025-08-09T18:45:54Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2209mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:54Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:54Z USER 67673 [ModuleForkPass]: Running anti_dependency_analyzer |
|
2025-08-09T18:45:54Z INFO 67673 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:54Z INFO 67673 [AntiDependencyAnalyzer]: Batch size: 1000 |
|
2025-08-09T18:45:54Z INFO 67673 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} |
|
2025-08-09T18:45:54Z INFO 67673 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 |
|
2025-08-09T18:45:54Z USER 67673 [ModuleForkPass]: anti_dependency_analyzer finished after 0.213 seconds |
|
2025-08-09T18:45:54Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:54Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:54Z USER 67673 [ModuleForkPass]: Running dep_opt |
|
2025-08-09T18:45:54Z INFO 67673 [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:54Z INFO 67673 [build_flow_deps]: Start build fdeps. Invocation: 3Sat Aug 9 18:45:54 2025 |
|
2025-08-09T18:45:54Z INFO 67673 [build_flow_deps]: Allocs: 68412 instructions: 279653 |
|
2025-08-09T18:45:55Z INFO 67673 [build_flow_deps]: Build fdeps inserted 685617 edges |
|
2025-08-09T18:45:55Z INFO 67673 [build_flow_deps]: Done build fdeps 685617 Sat Aug 9 18:45:55 2025 |
|
2025-08-09T18:45:55Z USER 67673 [ModuleForkPass]: dep_opt finished after 1.580 seconds |
|
2025-08-09T18:45:55Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:55Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:55Z USER 67673 [ModuleForkPass]: Running report_stats |
|
2025-08-09T18:45:55Z INFO 67673 [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:55Z INFO 67673 [ReportStats]: Data Movement Statistics: sg0000 |
|
┌─────────────┬────────────────────────────┬───────┬────────────┐ |
|
│ Instruction │ Kind                       │ Count │ Bytes      │ |
|
├─────────────┼────────────────────────────┼───────┼────────────┤ |
|
│ Load        │ Const -> Internal          │ 1     │ 32768      │ |
|
│ Load        │ ExternalInput -> Internal  │ 7273  │ 6946365440 │ |
|
│ Save        │ Internal -> ExternalOutput │ 7273  │ 6946365440 │ |
|
└─────────────┴────────────────────────────┴───────┴────────────┘ |
|
|
|
2025-08-09T18:45:55Z INFO 67673 [ReportStats]: |
|
┌─────────────────────┬───────┐ |
|
│ Bytes per partition │ Count │ |
|
├─────────────────────┼───────┤ |
|
│ 64                  │ 73    │ |
|
│ 256                 │ 74    │ |
|
│ 6144                │ 4608  │ |
|
│ 8192                │ 9792  │ |
|
└─────────────────────┴───────┘ |
|
|
|
2025-08-09T18:45:55Z INFO 67673 [ReportStats]: MM Stats: #MatMults 212041 #MatMult-Transposes 212041 |
|
2025-08-09T18:45:55Z INFO 67673 [ReportStats]: IO Tensor size combined: 16382087168 |
|
2025-08-09T18:45:55Z INFO 67673 [ReportStats]: IO Tensor Statistics: |
|
┌────────────────────┬────────────────┬──────────┬──────────────┐ |
|
│ Largest IO Tensors │ Kind           │ Src Type │ Size (Bytes) │ |
|
├────────────────────┼────────────────┼──────────┼──────────────┤ |
|
│ output0            │ ExternalOutput │ bfloat16 │ 622329856    │ |
|
│ input0             │ ExternalInput  │ bfloat16 │ 622329856    │ |
|
│ output397          │ ExternalOutput │ bfloat16 │ 622329856    │ |
|
│ input397           │ ExternalInput  │ bfloat16 │ 622329856    │ |
|
│ input8             │ ExternalInput  │ bfloat16 │ 50331648     │ |
|
│ input22            │ ExternalInput  │ bfloat16 │ 50331648     │ |
|
│ input30            │ ExternalInput  │ bfloat16 │ 50331648     │ |
|
│ input20            │ ExternalInput  │ bfloat16 │ 50331648     │ |
|
│ input11            │ ExternalInput  │ bfloat16 │ 50331648     │ |
|
│ input33            │ ExternalInput  │ bfloat16 │ 50331648     │ |
|
└────────────────────┴────────────────┴──────────┴──────────────┘ |
|
|
|
2025-08-09T18:45:55Z INFO 67673 [ReportStats]: Large (Internal) Tensor Statistics: |
|
┌────────────────────────────┬──────────┬──────────┬──────────────┐ |
|
│ Largest Tensors            │ Kind     │ Src Type │ Size (Bytes) │ |
|
├────────────────────────────┼──────────┼──────────┼──────────────┤ |
|
│ DynamicDMAScratchLoc       │ Internal │ uint8    │ 2097152      │ |
|
│ t2499_pftranspose_20873_i5 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i2 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i1 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i3 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i4 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i6 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i9 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i8 │ Internal │ bfloat16 │ 1048576      │ |
|
│ t2499_pftranspose_20873_i7 │ Internal │ bfloat16 │ 1048576      │ |
|
└────────────────────────────┴──────────┴──────────┴──────────────┘ |
|
|
|
2025-08-09T18:45:55Z USER 67673 [ModuleForkPass]: report_stats finished after 0.081 seconds |
|
2025-08-09T18:45:55Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:55Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:55Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 33.982 seconds |
|
2025-08-09T18:45:55Z INFO 67673 [BackendPassManager]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:55Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:55Z USER 67673 [BackendPassManager]: Running assign_trigger_engine |
|
2025-08-09T18:45:55Z INFO 67673 [BackendPassManager]: Inputs to assign_trigger_engine: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z INFO 67673 [AssignTriggerEngine]: Assigned trigger engine for 0 DMA instructions. Moved 0 DMA instructions to CC's engines. |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: assign_trigger_engine finished after 0.121 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [SubgraphForkPass]: Running lower_local_collectives |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [SubgraphForkPass]: lower_local_collectives finished after 0.006 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [SubgraphForkPass]: Running extend_shared_lifetimes |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [SubgraphForkPass]: extend_shared_lifetimes finished after 0.006 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [SubgraphForkPass]: Running dead_code_elim |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z INFO 67673 [DeadCodeElim]: eliminateDeadStore removed 0 instructions |
|
2025-08-09T18:45:56Z USER 67673 [SubgraphForkPass]: dead_code_elim finished after 0.262 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: subgraph_parallel_pass finished after 0.301 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: Running assign_hwdge_engine |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: assign_hwdge_engine finished after 0.040 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: Running alloc_queues |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z INFO 67673 [AllocQueues]: DMACopy transpose will be triggered from multiple engines |
|
2025-08-09T18:45:56Z INFO 67673 [AllocQueues]: Alloc Queue info: |
|
┌─────────────────┬────────────────┬────────┬────────────┬──────────────────┐ |
|
│ Name            │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ |
|
├─────────────────┼────────────────┼────────┼────────────┼──────────────────┤ |
|
│ qSPSpillReload0 │ data           │ SP     │ 16         │ 1                │ |
|
│ qPoolDynamic    │ dynamic        │ Pool   │ 16         │ 14546            │ |
|
└─────────────────┴────────────────┴────────┴────────────┴──────────────────┘ |
|
|
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: alloc_queues finished after 0.041 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: Running chain_dma_transposes |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: chain_dma_transposes finished after 0.006 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: Running prefetch_scheduling_after_sched |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.006 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: Running lower_control |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z INFO 67673 [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps |
|
2025-08-09T18:45:56Z USER 67673 [ModuleForkPass]: lower_control finished after 0.214 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 0.300 seconds |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: curr_vmrss: 2213mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [BackendPassManager]: Running nc_parallel_pass |
|
2025-08-09T18:45:56Z INFO 67673 [BackendPassManager]: Inputs to nc_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z USER 67673 [CoreForkPass]: Running dep_reduction |
|
2025-08-09T18:45:56Z INFO 67673 [CoreForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:56Z INFO 67673 [DepReduction]: Start Dependency Reduction |
|
2025-08-09T18:45:56Z INFO 67673 [DepReduction]: Processing async instrs... |
|
2025-08-09T18:45:56Z INFO 67673 [DepReduction]: Processing secondary edges per engine... |
|
2025-08-09T18:45:57Z INFO 67673 [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 473602 |
|
2025-08-09T18:45:57Z INFO 67673 [DepReduction]: Processing redundant descendants, Done. Num edges removed 486433 |
|
2025-08-09T18:45:57Z INFO 67673 [DepReduction]: Processing async instrs, Done. Num edges removed 486433 |
|
2025-08-09T18:45:58Z INFO 67673 [DepReduction]: Num Async removed: 0 |
|
2025-08-09T18:45:58Z INFO 67673 [DepReduction]: Finished dependency reduction: 1150790 removed, new total 112455 |
|
2025-08-09T18:45:58Z INFO 67673 [DepReduction]: Finished Dependency Reduction |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: dep_reduction finished after 1.704 seconds |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: curr_vmrss: 2225mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: Running lower_dynamic_dma |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: lower_dynamic_dma finished after 0.072 seconds |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: curr_vmrss: 2225mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: Running legalize_dynamic_dma |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z INFO 67673 [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 0 DGE instructions |
|
2025-08-09T18:45:58Z INFO 67673 [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 0 DGE instructions were scanned |
|
2025-08-09T18:45:58Z INFO 67673 [LegalizeDynamicDMA]: |
|
┌───────────┬───────────────────────────────┬────────────────────────────┐ |
|
│ Sub-Pass  │ Illegal Instructions Detected │ New Instructions Generated │ |
|
├───────────┼───────────────────────────────┼────────────────────────────┤ |
|
│ Peeling   │ 0                             │ 0                          │ |
|
│ Unrolling │ 0                             │ 0                          │ |
|
│ Splitting │ 0                             │ 0                          │ |
|
└───────────┴───────────────────────────────┴────────────────────────────┘ |
|
|
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: legalize_dynamic_dma finished after 0.133 seconds |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: curr_vmrss: 2225mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279653 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: Running lower_dma |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Inputs to lower_dma: modules=1 functions=1 allocs=68412 blocks=1 instructions=279653 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z INFO 67673 [LowerDMA]: lower_dma metrics start |
|
IO |
|
Copy (DGE/DMA) |
|
128 partition : 14473/14473 (100% DGE) |
|
power-of-2 partition : 14546/14546 (100% DGE) |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 14546/14546 (100% DGE) |
|
Cast (DGE/DMA) |
|
128 partition : 0/0 |
|
power-of-2 partition : 0/0 |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 0/0 |
|
Spill/Reload |
|
Copy (DGE/DMA) |
|
128 partition : 0/1 (0% DGE) |
|
power-of-2 partition : 0/1 (0% DGE) |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 0/1 (0% DGE) |
|
Cast (DGE/DMA) |
|
128 partition : 0/0 |
|
power-of-2 partition : 0/0 |
|
> 3 dimensional : 0/0 |
|
non-integer desc size : 0/0 |
|
total : 0/0 |
|
CopyMode |
|
CCE : 0 |
|
Transpose : 0 |
|
Replicate : 0 |
|
Dynamic (DGE/DMA) |
|
scalar : 0/0 |
|
vector : 0/0 |
|
Opcode |
|
ReadVarAddr : 0 |
|
IndirectLoad : 0 |
|
IndirectSave : 0 |
|
IndirectSaveAccumulate : 0 |
|
DstReduceDGE : 0 |
|
lower_dma metrics end |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: lower_dma finished after 0.165 seconds |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: curr_vmrss: 2225mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279661 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: Running coalesce_dma_blocks |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Inputs to coalesce_dma_blocks: modules=1 functions=1 allocs=68412 blocks=1 instructions=279661 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z INFO 67673 [CoalesceDmaBlocks]: Coaleseced 0 DMA triggers |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: coalesce_dma_blocks finished after 0.138 seconds |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:58Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279661 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:58Z USER 67673 [CoreForkPass]: Running expand_all_engine |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=1 allocs=68412 blocks=1 instructions=279661 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: expand_all_engine finished after 0.055 seconds |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279661 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: Running alloc_semaphores |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=1 allocs=68412 blocks=1 instructions=279661 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: alloc_semaphores finished after 0.291 seconds |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279661 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: Running expand_inst_late |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=1 allocs=68412 blocks=1 instructions=279661 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: expand_inst_late finished after 0.278 seconds |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279661 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: Running seq_inst_opt |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=1 allocs=68412 blocks=1 instructions=279661 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z INFO 67673 [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: seq_inst_opt finished after 0.041 seconds |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 279661 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: Running lower_sync |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to lower_sync: modules=1 functions=1 allocs=68412 blocks=1 instructions=279661 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: lower_sync finished after 0.138 seconds |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295353 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: Running lower_act |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to lower_act: modules=1 functions=1 allocs=68412 blocks=1 instructions=295353 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: lower_act finished after 0.050 seconds |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: curr_vmrss: 2226mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z USER 67673 [CoreForkPass]: Running lower_dve |
|
2025-08-09T18:45:59Z INFO 67673 [CoreForkPass]: Inputs to lower_dve: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:45:59Z INFO 67673 [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json |
|
2025-08-09T18:46:00Z USER 67673 [CoreForkPass]: lower_dve finished after 0.309 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: curr_vmrss: 2254mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [CoreForkPass]: Running lower_ap |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: Inputs to lower_ap: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [CoreForkPass]: lower_ap finished after 0.069 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: curr_vmrss: 2108mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [CoreForkPass]: Running coloring_allocator_reg |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z INFO 67673 [ColoringAllocator::Rep]: Allocating functions |
|
2025-08-09T18:46:00Z INFO 67673 [ColoringAllocator::Rep]: linearize and check |
|
2025-08-09T18:46:00Z INFO 67673 [REG_Allocator]: allocating REG |
|
2025-08-09T18:46:00Z INFO 67673 [REG_Allocator]: main loop iteration 1 |
|
2025-08-09T18:46:00Z USER 67673 [CoreForkPass]: coloring_allocator_reg finished after 0.055 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: curr_vmrss: 2119mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [CoreForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [BackendPassManager]: nc_parallel_pass finished after 3.677 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: curr_vmrss: 2119mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [ModuleForkPass]: Running birverifier |
|
2025-08-09T18:46:00Z INFO 67673 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [ModuleForkPass]: birverifier finished after 0.306 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2119mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 0.321 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: curr_vmrss: 2119mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [BackendPassManager]: Running subgraph_parallel_pass |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [SubgraphForkPass]: Running lnc_verifier |
|
2025-08-09T18:46:00Z INFO 67673 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [SubgraphForkPass]: lnc_verifier finished after 0.006 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [SubgraphForkPass]: curr_vmrss: 2119mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [BackendPassManager]: subgraph_parallel_pass finished after 0.019 seconds |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: curr_vmrss: 2119mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [BackendPassManager]: Running mod_parallel_pass |
|
2025-08-09T18:46:00Z INFO 67673 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z USER 67673 [ModuleForkPass]: Running codegen |
|
2025-08-09T18:46:00Z INFO 67673 [ModuleForkPass]: Inputs to codegen: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:00Z INFO 67673 [Codegen]: Total compiler allocated DRAM tensors: 0 GB |
|
2025-08-09T18:46:00Z INFO 67673 [Codegen]: Total un-allocated DRAM tensors by kind: |
|
2025-08-09T18:46:00Z INFO 67673 [Codegen]: |
|
┌───────────────┬─────────────┐ |
|
│ TensorKind    │ Size (GB)   │ |
|
├───────────────┼─────────────┤ |
|
│ ExternalInput │ 7.6285      │ |
|
│ Const         │ 3.05176e-05 │ |
|
└───────────────┴─────────────┘ |
|
|
|
2025-08-09T18:46:00Z INFO 67673 [Codegen]: Total runtime managed DRAM tensors: 7.62853 GB |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Instruction Stats: |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: |
|
┌─────────────────────┬────────┐ |
|
│ Opcode              │ Count  │ |
|
├─────────────────────┼────────┤ |
|
│ LDWEIGHTS           │ 212041 │ |
|
│ MATMUL              │ 212041 │ |
|
│ ACTIVATE            │ 53065  │ |
|
│ EVENT_SEMAPHORE     │ 15692  │ |
|
│ UNKNOWN(0xd4)       │ 14546  │ |
|
│ NOP                 │ 7      │ |
|
│ PSEUDO_BRANCH_LABEL │ 5      │ |
|
│ ACT_TABLE_LOAD      │ 1      │ |
|
│ PSEUDO_DMA_TRIGGER  │ 1      │ |
|
└─────────────────────┴────────┘ |
|
|
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: |
|
┌────────────┬────────┐ |
|
│ Engine     │ Count  │ |
|
├────────────┼────────┤ |
|
│ Unassigned │ 0      │ |
|
│ GPSIMD     │ 21233  │ |
|
│ Scalar     │ 55905  │ |
|
│ Tensor     │ 430261 │ |
|
│ SyncDMA    │ 0      │ |
|
│ Vector     │ 2      │ |
|
│ Sync       │ 3      │ |
|
│ All        │ 0      │ |
|
└────────────┴────────┘ |
|
|
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Total instructions: 507404 (0.0302436 GB) |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Total DynamicDMA instruction count: 14546 |
|
2025-08-09T18:46:01Z USER 67673 [Codegen]: isa_gen finished after 1.123 seconds |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Number of DMA descriptors on each queue instance: |
|
┌─────────────────┬────────────────┐ |
|
│ Queue Instance  │ RT Descriptors │ |
|
├─────────────────┼────────────────┤ |
|
│ qSPSpillReload0 │ 256            │ |
|
└─────────────────┴────────────────┘ |
|
|
|
Total descriptors: 256 (3.8147e-06 GB) |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Number of DMA engines used by each queue: |
|
┌─────────────────┬─────────────────────┐ |
|
│ Queue           │ DMA Engines         │ |
|
├─────────────────┼─────────────────────┤ |
|
│ qSPSpillReload0 │ 16                  │ |
|
│ qPoolDynamic    │ 16                  │ |
|
├─────────────────┼─────────────────────┤ |
|
│ TOTAL           │ 32 (must be <= 176) │ |
|
└─────────────────┴─────────────────────┘ |
|
|
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Tensors with largest descriptor count: |
|
┌──────────────────────┬──────────┬──────────┬──────────────────┐ |
|
│ Tensor Name          │ Kind     │ Src Type │ Descriptor Count │ |
|
├──────────────────────┼──────────┼──────────┼──────────────────┤ |
|
│ identity_local_25028 │ Internal │ bfloat16 │ 1                │ |
|
│ identity_25026       │ Const    │ bfloat16 │ 1                │ |
|
└──────────────────────┴──────────┴──────────┴──────────────────┘ |
|
|
|
2025-08-09T18:46:01Z USER 67673 [Codegen]: dma_desc_gen finished after 0.000 seconds |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Estimated peak DRAM usage: 7.65878 GB |
|
2025-08-09T18:46:01Z INFO 67673 [Codegen]: Generating debug info |
|
2025-08-09T18:46:02Z USER 67673 [Codegen]: debug_info_gen finished after 0.613 seconds |
|
2025-08-09T18:46:02Z USER 67673 [ModuleForkPass]: codegen finished after 1.797 seconds |
|
2025-08-09T18:46:02Z INFO 67673 [ModuleForkPass]: curr_vmrss: 2311mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:02Z INFO 67673 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:02Z USER 67673 [BackendPassManager]: mod_parallel_pass finished after 1.826 seconds |
|
2025-08-09T18:46:02Z INFO 67673 [BackendPassManager]: curr_vmrss: 2130mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:02Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:02Z USER 67673 [BackendPassManager]: Running neff_packager |
|
2025-08-09T18:46:02Z INFO 67673 [BackendPassManager]: Inputs to neff_packager: modules=1 functions=1 allocs=68412 blocks=1 instructions=295354 Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:02Z WARNING 67673 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.169490.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json |
|
2025-08-09T18:46:02Z INFO 67673 [NeffFileWriter]: Neff will be written to: /home/ubuntu/qwen3/layout_opt/graph.neff |
|
2025-08-09T18:46:02Z INFO 67673 [NeffFileWriter]: IR signature: c6cb604c4535169891036e23b5114d01 for neff artifacts |
|
2025-08-09T18:46:02Z USER 67673 [BackendPassManager]: neff_packager finished after 0.313 seconds |
|
2025-08-09T18:46:02Z INFO 67673 [BackendPassManager]: curr_vmrss: 2131mb, ru_maxrss: 2494mb (delta=0mb) |
|
2025-08-09T18:46:02Z INFO 67673 [BackendPassManager]: Output has 1 module(s), 1 function(s), 68412 memory location(s), 1 block(s), and 295354 instruction(s). Max writers: 64 Max Readers: 212041 |
|
2025-08-09T18:46:02Z INFO 67673 [BackendDriver]: HBM scratchpad usage summary (post-allocation): |
|
┌──────┬───────────┬────────────────────────────────────────────────────────────┬─────────────┐ |
|
│ Core │ Subgraph  │ Description                                                │ Value       │ |
|
├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ |
|
│ nc00 │ module    │ Peak scratchpad usage: local                               │ 0.000000 GB │ |
|
│ nc00 │ module    │ Total size of allocated tensors: local                     │ 0.000000 GB │ |
|
│ nc00 │ Max       │ Peak scratchpad usage: local                               │ 0.000000 GB │ |
|
│ nc00 │ Post-link │ Peak scratchpad usage after intermediate tensor allocation │ 0.000000 GB │ |
|
│ nc00 │ Post-link │ Total size of allocated intermediate tensors               │ 0.000000 GB │ |
|
├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ |
|
│ Max  │ Max       │ Peak scratchpad usage                                      │ 0.000000 GB │ |
|
│ Max  │ Max       │ Peak scratchpad usage (page-aligned)                       │ 0.000000 GB │ |
|
└──────┴───────────┴────────────────────────────────────────────────────────────┴─────────────┘ |
|
|
|
2025-08-09T18:46:02Z INFO 67673 [BackendDriver]: Backend completed successfully, tearing down. |
|
2025-08-09T18:46:03Z INFO 67605 [job.WalrusDriver.0]: Job #0 finished |
|
2025-08-09T18:46:03Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 |
|
2025-08-09T18:46:03Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 |
|
2025-08-09T18:46:03Z INFO 67605 [job.BIRLinker.0]: Replay this job by calling: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework XLA --state '{"model": ["/home/ubuntu/qwen3/layout_opt/model/graph.hlo"], "tensormap": "tensor_map.json", "bir": "bir.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "state_dir": "/home/ubuntu/neuronxcc-mk9kpjyq/sg00", "state_id": "sg00"}' --pipeline BIRLinker |
|
2025-08-09T18:46:03Z INFO 67605 [job.BIRLinker.0]: BIRLinker cwd: /home/ubuntu/neuronxcc-mk9kpjyq |
|
2025-08-09T18:46:03Z INFO 67605 [job.BIRLinker.0]: Linking not needed. Netlist doesnt exist |
|
2025-08-09T18:46:03Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 |
|
2025-08-09T18:46:03Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.Kelper.0 |
|
2025-08-09T18:46:03Z INFO 67605 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager |
|
2025-08-09T18:46:03Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.Kelper.0 |
|
2025-08-09T18:46:03Z INFO 67605 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 |
|
2025-08-09T18:46:03Z INFO 67605 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 |
|
2025-08-09T18:46:03Z INFO 67605 [job.NeffWrapper.0]: Processing input #0 |
|
2025-08-09T18:46:03Z INFO 67605 [job.NeffWrapper.0]: Start NeffWrapper |
|
2025-08-09T18:46:03Z INFO 67605 [job.NeffWrapper.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /home/ubuntu/qwen3/layout_opt/model/graph.hlo --neff /home/ubuntu/qwen3/layout_opt/graph.neff --io_transposes /home/ubuntu/neuronxcc-mk9kpjyq/io_transposes.json --output /home/ubuntu/qwen3/layout_opt/wrapped_neff.hlo --netlist /home/ubuntu/neuronxcc-mk9kpjyq/hlo_netlist.json |
|
2025-08-09T18:46:04Z INFO 67605 [job.NeffWrapper.0]: Could not open file: /home/ubuntu/neuronxcc-mk9kpjyq/hlo_netlist.json |
|
There are no io transposes nor zero-sized parameters. Output will not be produced. |
|
Hlo neff wrapper finished successfully. Have a wonderful day :D |
|
|
|
2025-08-09T18:46:04Z INFO 67605 [job.NeffWrapper.0]: Job #0 finished |
|
2025-08-09T18:46:04Z INFO 67605 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 |
|
2025-08-09T18:46:04Z INFO 67605 [pipeline.Pipeline.0]: Finished pipeline Pipeline |
|
2025-08-09T18:46:04Z INFO 67605 [pipeline.Pipeline.0]: Job #0 finished |
|
2025-08-09T18:46:04Z INFO 67541 [root]: Subcommand returned with exitcode=0 |
|
|