diff --git "a/token_generation_model/_tp0_bk0/log-neuron-cc.txt" "b/token_generation_model/_tp0_bk0/log-neuron-cc.txt"
new file mode 100644
--- /dev/null
+++ "b/token_generation_model/_tp0_bk0/log-neuron-cc.txt"
@@ -0,0 +1,2573 @@
+2025-08-07T13:50:22Z INFO 46994 [root]: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework=XLA /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.hlo_module.pb --output /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.neff --target=trn1 --auto-cast=none --model-type=transformer '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=1 --vectorize-strided-dma' --lnc=1 -O2 --internal-hlo2tensorizer-options=--verify-hlo=true --logfile=/home/ubuntu/qwen3/token_generation_model/_tp0_bk0/log-neuron-cc.txt --enable-internal-neff-wrapper --verbose=35
+2025-08-07T13:50:22Z INFO 46994 [root]: NeuronX Compiler version 2.20.9961.0+0acef03a Python version 3.10.12 HWM version 2.20.0.9961+0acef03a NumPy version 1.26.4 Running on AMI ami-040348201d80b58ad Running in region usw2-az4
+2025-08-07T13:50:22Z INFO 47058 [root]: XLA detected
+2025-08-07T13:50:22Z INFO 47058 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper
+2025-08-07T13:50:23Z INFO 47058 [root]: Intermediate files stored in /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z, output in /home/ubuntu/qwen3/token_generation_model/_tp0_bk0
+2025-08-07T13:50:23Z INFO 47058 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
+2025-08-07T13:50:23Z INFO 47058 [pipeline.Pipeline.0]: Processing input #0
+2025-08-07T13:50:23Z INFO 47058 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
+2025-08-07T13:50:23Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0
+2025-08-07T13:50:23Z INFO 47058 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1
+2025-08-07T13:50:23Z
INFO 47058 [job.HLOToTensorizer.0]: Processing input #0 +2025-08-07T13:50:23Z INFO 47058 [job.HLOToTensorizer.0]: IR signature: 1b84de1c7109b93d3bf677f50a6adfce9d88aab86c7f512a7234c08cd856732f for model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.hlo_module.pb +2025-08-07T13:50:23Z INFO 47058 [job.HLOToTensorizer.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.hlo_module.pb --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --layers-per-module=1 --emit-tensor-level-dropout-ops --verify-hlo=true --native-to-custom-softmax --partitioner-opts='--transformer' +2025-08-07T13:50:23Z INFO 47058 [job.HLOToTensorizer.0]: DEBUG: needsModular? No. macCnt 3803070528 num non-trivial Ops 3790 +INFO: Switching to single-module compile. PrePartitionPipe skipped. 
+INFO: Found memory bound graph +INFO: Number of Native SoftmaxDx's detected and replaced: 0 +INFO: Number of Native Softmax's detected and replaced: 2 +Replaced 0 dropout sequences with OffloadedDropout +INFO: HloMacCount has found 3802996736 +INFO: Traffic has found 8267154541 +INFO: AIF 0.920026 +HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert cosine custom-call divide dot exponential gather get-tuple-element iota maximum multiply negate pad parameter reduce reshape rng scatter select sine slice subtract transpose tuple +Warning: Could not open file debug_info_hlo_partitions.json +2025-08-07 13:50:23.216940: W hilo/hlo2penguin/utils/DumpDebugInfo.cc:52] Truncating long HLO operator name %tuple.13243 = tuple(%reshape.4825, %scatter.12247, %scatter.12262, %scatter.12275, %scatter.12290, %scatter.12303, %scatter.12318, %scatter.12331, %scatter.12346, %scatter.12359, %scatter.12374, %scatter.12387, %scatter.12402, %scatter.12415, %scatter.12430, %scatter.12443, %scatter.12458, %scatter.12471, %scatter.12486, %scatter.12499, %scatter.12514, %scatter.12527, %scatter.12542, %scatter.12555, %scatter.12570, %scatter.12583, %scatter.12598, %scatter.12611, %scatter.12626, %scatter.12639, %scatter.... 
to 512 characters in the compiler's debug metadata +Transposable weight idxs: 76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,208,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,256,257,258,259,260,261,262,263,264,265,266,267,268,269,270,271,272,273,274,275,276,277,278,279,280,281,282,283,284,285,286,287,288,289,290,291,292,293,294,295,296,297,298,299,300,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,327,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,346,347,348,349,350,351,352,353,354,355,356,357,358,359,360,361,362,363,364,365,366,367,368,369,370,371,372,373,374,375,376,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,436,437,438,439,440,441,442,443,444,445,446,447,448,449,450,451,452,453,454,455,456,457,458,459,460,461,462,463,464,465,466,467,468,469,470,471,472,473,474 +Invoking RemoveOptimizationBarriers pass + +2025-08-07T13:50:23Z INFO 47058 [job.HLOToTensorizer.0]: IR signature: 7238a055a0d23cd25fb386dbd049f9c2ed4801d02b78ba29302b97418a4b62c9 for sg0000/HLOToTensorizer +2025-08-07T13:50:25Z INFO 47058 [job.HLOToTensorizer.0]: Job #0 finished +2025-08-07T13:50:26Z INFO 47058 [pipeline.Pipeline.0]: Finished 
job job.HLOToTensorizer.0 +2025-08-07T13:50:26Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.Frontend.0 +2025-08-07T13:50:26Z INFO 47058 [job.Frontend.0]: Job Frontend len(in_states) 1 +2025-08-07T13:50:26Z INFO 47058 [job.Frontend.0]: Processing input #0 +2025-08-07T13:50:26Z INFO 47058 [job.Frontend.0]: Start model loading +2025-08-07T13:50:26Z INFO 47058 [job.Frontend.0]: Start tensorization +2025-08-07T13:50:27Z INFO 47058 [job.Frontend.0]: Num jobs: 1 +2025-08-07T13:50:27Z USER 47058 [root/Tensorizer/Tensorizer]: Running Tensorizer +2025-08-07T13:50:27Z INFO 47058 [Tensorizer]: Frontend did not find netlist info. Switching to flat flow. +2025-08-07T13:50:27Z INFO 47058 [Tensorizer]: Building model from Penguin script "penguin.py"... +2025-08-07T13:50:27Z INFO 47058 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=1 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --keep-remat-dma-transpose --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-08-07T13:50:28Z INFO 47058 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:50:28Z INFO 47058 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:50:28Z INFO 47058 
[sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:50:28Z INFO 47058 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-08-07T13:50:28Z INFO 47058 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-08-07T13:50:29Z INFO 47058 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.033 seconds +2025-08-07T13:50:29Z INFO 47058 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-08-07T13:50:29Z INFO 47058 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-08-07T13:50:30Z INFO 47058 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.015 seconds +2025-08-07T13:50:30Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:50:30Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-08-07T13:50:31Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.106 seconds +2025-08-07T13:50:31Z INFO 47058 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-08-07T13:50:31Z INFO 47058 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-08-07T13:50:32Z INFO 47058 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.063 seconds +2025-08-07T13:50:32Z INFO 47058 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-08-07T13:50:32Z INFO 47058 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-08-07T13:50:33Z INFO 47058 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.490 seconds +2025-08-07T13:50:33Z INFO 47058 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:50:33Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:50:33Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: 
Finished (changed=True) +2025-08-07T13:50:34Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.004 seconds +2025-08-07T13:50:34Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:50:34Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-08-07T13:50:35Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.340 seconds +2025-08-07T13:50:36Z INFO 47058 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 2.049 seconds +2025-08-07T13:50:36Z INFO 47058 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-08-07T13:50:36Z INFO 47058 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-08-07T13:50:37Z INFO 47058 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.277 seconds +2025-08-07T13:50:37Z INFO 47058 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-08-07T13:50:37Z INFO 47058 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-08-07T13:50:38Z INFO 47058 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.066 seconds +2025-08-07T13:50:38Z INFO 47058 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-08-07T13:50:38Z INFO 47058 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=True) +2025-08-07T13:50:39Z INFO 47058 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.073 seconds +2025-08-07T13:50:39Z INFO 47058 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-08-07T13:50:39Z INFO 47058 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-08-07T13:50:40Z INFO 47058 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.050 seconds +2025-08-07T13:50:40Z INFO 47058 
[sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-08-07T13:50:40Z INFO 47058 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-08-07T13:50:41Z INFO 47058 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.052 seconds +2025-08-07T13:50:41Z INFO 47058 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:50:41Z INFO 47058 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:50:42Z INFO 47058 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.190 seconds +2025-08-07T13:50:42Z INFO 47058 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:50:42Z INFO 47058 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:50:43Z INFO 47058 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.050 seconds +2025-08-07T13:50:43Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:50:43Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:50:44Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.730 seconds +2025-08-07T13:50:44Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:50:44Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:50:45Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.049 seconds +2025-08-07T13:50:45Z INFO 47058 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:50:45Z INFO 47058 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:50:46Z INFO 47058 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.051 seconds +2025-08-07T13:50:46Z INFO 47058 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:50:46Z INFO 47058 [sg0000/Tensorizer/CommuteConcat]: Finished 
(changed=False) +2025-08-07T13:50:47Z INFO 47058 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.052 seconds +2025-08-07T13:50:47Z INFO 47058 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-08-07T13:50:47Z INFO 47058 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-08-07T13:50:48Z INFO 47058 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.060 seconds +2025-08-07T13:50:48Z INFO 47058 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:50:48Z INFO 47058 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:50:49Z INFO 47058 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.052 seconds +2025-08-07T13:50:49Z INFO 47058 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:50:49Z INFO 47058 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:50:50Z INFO 47058 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.173 seconds +2025-08-07T13:50:50Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:50:50Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:50:51Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.050 seconds +2025-08-07T13:50:51Z INFO 47058 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-08-07T13:50:52Z INFO 47058 [sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-08-07T13:50:52Z INFO 47058 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.865 seconds +2025-08-07T13:50:52Z INFO 47058 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-08-07T13:50:53Z INFO 47058 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-08-07T13:50:54Z INFO 47058 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.336 seconds 
+2025-08-07T13:50:54Z INFO 47058 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:50:54Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:50:54Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-08-07T13:50:55Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.005 seconds +2025-08-07T13:50:55Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:50:55Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-08-07T13:50:55Z INFO 47058 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.449 seconds +2025-08-07T13:50:55Z INFO 47058 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 1.268 seconds +2025-08-07T13:50:55Z INFO 47058 [sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-08-07T13:50:59Z INFO 47058 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-08-07T13:50:59Z INFO 47058 [sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 3.987 seconds +2025-08-07T13:50:59Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: Finished (changed=True) +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 1.929 seconds +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Rematerialization]: Finished (changed=True) +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.168 seconds +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Finished 
(changed=True) +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.364 seconds +2025-08-07T13:51:01Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:51:02Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Finished (changed=True) +2025-08-07T13:51:02Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.563 seconds +2025-08-07T13:51:02Z INFO 47058 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:51:03Z INFO 47058 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=True) +2025-08-07T13:51:03Z INFO 47058 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 1.291 seconds +2025-08-07T13:51:03Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:51:03Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:51:03Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.118 seconds +2025-08-07T13:51:03Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.081 seconds +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Finished (changed=True) +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.130 seconds +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: Finished (changed=True) +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.578 seconds +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-08-07T13:51:04Z INFO 47058 
[sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.034 seconds +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.062 seconds +2025-08-07T13:51:04Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.231 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=True) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.122 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.059 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/PadElimination]: Running PadElimination +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.009 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.129 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: 
Finished (changed=False) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.312 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.031 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.112 seconds +2025-08-07T13:51:05Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.060 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.093 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.034 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.033 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-08-07T13:51:06Z 
INFO 47058 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.205 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.124 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.372 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/Recompute]: Running Recompute +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/Recompute]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.006 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.035 seconds +2025-08-07T13:51:06Z INFO 47058 [Tensorizer]: After optimization: 1185 statements +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:51:06Z INFO 47058 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) +2025-08-07T13:51:07Z INFO 47058 
[sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.043 seconds +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.032 seconds +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.114 seconds +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: Running TileCCOps +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=8192 is not above min_allgather_tile_size_in_bytes=8388608` +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: in bfloat16 (4096,) %'all_gather.1' = AllGatherOp-502 AllGather_add(bfloat16 (2048,) %'gather.1', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((4096,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.47 | hlo_id: 47 | , id = 502 +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `All gather output tensor check failed` +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: in float32 (512,) %'all_gather.2' = AllGatherOp-9601 AllGather_add(float32 (256,) %'add.217', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.12078 | hlo_id: 12078 | , id = 9601 +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=2048 is not above min_allgather_tile_size_in_bytes=8388608` +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: in uint32 (512,) %'all_gather.3' 
= AllGatherOp-9617 AllGather_add(uint32 (256,) %'add.218', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.12213 | hlo_id: 12213 | , id = 9617 +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: Finished (changed=False) +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/TileCCOps]: TileCCOps finished after 0.264 seconds +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.347 seconds +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.123 seconds +2025-08-07T13:51:07Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.274 seconds +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.036 seconds +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.041 seconds +2025-08-07T13:51:08Z INFO 47058 
[sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.343 seconds +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=True) +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.240 seconds +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:08Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.068 seconds +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=False) +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.362 seconds +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.289 seconds +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.241 seconds +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running 
LayoutPreprocessingAndAnalysis +2025-08-07T13:51:09Z INFO 47058 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-08-07T13:51:10Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:51:10Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:51:10Z INFO 47058 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.130 seconds +2025-08-07T13:51:10Z INFO 47058 [sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-08-07T13:51:10Z INFO 47058 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.944 seconds +2025-08-07T13:51:10Z INFO 47058 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-08-07T13:51:11Z INFO 47058 [sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.309 seconds +2025-08-07T13:51:11Z INFO 47058 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 1.268 seconds +2025-08-07T13:51:11Z INFO 47058 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-08-07T13:51:11Z INFO 47058 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:51:12Z INFO 47058 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:51:14Z INFO 47058 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-08-07T13:51:14Z INFO 47058 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 3.631 seconds +2025-08-07T13:51:14Z INFO 47058 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-08-07T13:51:14Z INFO 47058 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-08-07T13:51:15Z INFO 47058 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-08-07T13:51:40Z INFO 47058 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) 
+2025-08-07T13:51:40Z INFO 47058 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 25.272 seconds +2025-08-07T13:51:40Z INFO 47058 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 1.035 seconds +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 26.323 seconds +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.144 seconds +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.212 seconds +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.231 seconds +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/PGTiling]: Running PGTiling +2025-08-07T13:51:41Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 9858 of IO tensor {'CrossPassTensor': ''}bfloat16 %input4|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(1, 'AG2151'), (188, 'AG2146'), 
(80, 'AG2149')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 10135 of IO tensor {'CrossPassTensor': ''}bfloat16 %input6|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(3, 'AG2159'), (188, 'AG2146'), (82, 'AG2157')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 10386 of IO tensor {'CrossPassTensor': ''}bfloat16 %input8|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(5, 'AG2166'), (188, 'AG2146'), (84, 'AG2164')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 10637 of IO tensor {'CrossPassTensor': ''}bfloat16 %input10|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(7, 'AG2173'), (188, 'AG2146'), (86, 'AG2171')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 10888 of IO tensor {'CrossPassTensor': ''}bfloat16 %input12|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(9, 'AG2180'), (188, 'AG2146'), (88, 'AG2178')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 11139 of IO tensor {'CrossPassTensor': ''}bfloat16 %input14|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(11, 'AG2187'), (188, 'AG2146'), (90, 'AG2185')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 11390 of IO tensor {'CrossPassTensor': ''}bfloat16 %input16|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(13, 'AG2194'), (188, 'AG2146'), (92, 'AG2192')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 11641 of IO tensor {'CrossPassTensor': ''}bfloat16 %input18|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(15, 'AG2201'), (188, 'AG2146'), 
(94, 'AG2199')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 11892 of IO tensor {'CrossPassTensor': ''}bfloat16 %input20|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(17, 'AG2208'), (188, 'AG2146'), (96, 'AG2206')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 12143 of IO tensor {'CrossPassTensor': ''}bfloat16 %input22|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(19, 'AG2215'), (188, 'AG2146'), (98, 'AG2213')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 12394 of IO tensor {'CrossPassTensor': ''}bfloat16 %input24|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(21, 'AG2222'), (188, 'AG2146'), (100, 'AG2220')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 12645 of IO tensor {'CrossPassTensor': ''}bfloat16 %input26|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(23, 'AG2229'), (188, 'AG2146'), (102, 'AG2227')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 12896 of IO tensor {'CrossPassTensor': ''}bfloat16 %input28|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(25, 'AG2236'), (188, 'AG2146'), (104, 'AG2234')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 13147 of IO tensor {'CrossPassTensor': ''}bfloat16 %input30|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(27, 'AG2243'), (188, 'AG2146'), (106, 'AG2241')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 13398 of IO tensor {'CrossPassTensor': ''}bfloat16 %input32|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(29, 'AG2250'), (188, 
'AG2146'), (108, 'AG2248')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 13649 of IO tensor {'CrossPassTensor': ''}bfloat16 %input34|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(31, 'AG2257'), (188, 'AG2146'), (110, 'AG2255')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 13900 of IO tensor {'CrossPassTensor': ''}bfloat16 %input36|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(33, 'AG2264'), (188, 'AG2146'), (112, 'AG2262')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 14151 of IO tensor {'CrossPassTensor': ''}bfloat16 %input38|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(35, 'AG2271'), (188, 'AG2146'), (114, 'AG2269')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 14402 of IO tensor {'CrossPassTensor': ''}bfloat16 %input40|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(37, 'AG2278'), (188, 'AG2146'), (116, 'AG2276')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 14653 of IO tensor {'CrossPassTensor': ''}bfloat16 %input42|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(39, 'AG2285'), (188, 'AG2146'), (118, 'AG2283')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 14904 of IO tensor {'CrossPassTensor': ''}bfloat16 %input44|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(41, 'AG2292'), (188, 'AG2146'), (120, 'AG2290')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 15155 of IO tensor {'CrossPassTensor': ''}bfloat16 %input46|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(43, 
'AG2299'), (188, 'AG2146'), (122, 'AG2297')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 15406 of IO tensor {'CrossPassTensor': ''}bfloat16 %input48|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(45, 'AG2306'), (188, 'AG2146'), (124, 'AG2304')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 15657 of IO tensor {'CrossPassTensor': ''}bfloat16 %input50|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(47, 'AG2313'), (188, 'AG2146'), (126, 'AG2311')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 15908 of IO tensor {'CrossPassTensor': ''}bfloat16 %input52|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(49, 'AG2320'), (188, 'AG2146'), (128, 'AG2318')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 16159 of IO tensor {'CrossPassTensor': ''}bfloat16 %input54|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(51, 'AG2327'), (188, 'AG2146'), (130, 'AG2325')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 16410 of IO tensor {'CrossPassTensor': ''}bfloat16 %input56|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(53, 'AG2334'), (188, 'AG2146'), (132, 'AG2332')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 16661 of IO tensor {'CrossPassTensor': ''}bfloat16 %input58|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(55, 'AG2341'), (188, 'AG2146'), (134, 'AG2339')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 16912 of IO tensor {'CrossPassTensor': ''}bfloat16 %input60|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG 
ids): [(57, 'AG2348'), (188, 'AG2146'), (136, 'AG2346')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 17163 of IO tensor {'CrossPassTensor': ''}bfloat16 %input62|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(59, 'AG2355'), (188, 'AG2146'), (138, 'AG2353')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 17414 of IO tensor {'CrossPassTensor': ''}bfloat16 %input64|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(61, 'AG2362'), (188, 'AG2146'), (140, 'AG2360')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 17665 of IO tensor {'CrossPassTensor': ''}bfloat16 %input66|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(63, 'AG2369'), (188, 'AG2146'), (142, 'AG2367')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 17916 of IO tensor {'CrossPassTensor': ''}bfloat16 %input68|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(65, 'AG2376'), (188, 'AG2146'), (144, 'AG2374')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 18167 of IO tensor {'CrossPassTensor': ''}bfloat16 %input70|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(67, 'AG2383'), (188, 'AG2146'), (146, 'AG2381')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 18418 of IO tensor {'CrossPassTensor': ''}bfloat16 %input72|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index list (w/ AG ids): [(69, 'AG2390'), (188, 'AG2146'), (148, 'AG2388')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 18669 of IO tensor {'CrossPassTensor': ''}bfloat16 %input74|NHWC|(1, 4, 8, 128, 2, 64) is not sorted, index 
list (w/ AG ids): [(71, 'AG2397'), (188, 'AG2146'), (150, 'AG2395')] +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 1.446 seconds +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.217 seconds +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.302 seconds +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.064 seconds +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.013 seconds +2025-08-07T13:51:43Z INFO 47058 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-08-07T13:51:46Z INFO 47058 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-08-07T13:51:46Z INFO 47058 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 2.335 seconds +2025-08-07T13:51:46Z INFO 47058 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 4.424 seconds +2025-08-07T13:51:46Z INFO 47058 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-08-07T13:51:47Z INFO 47058 
[sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-08-07T13:51:47Z INFO 47058 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 1.162 seconds +2025-08-07T13:51:47Z INFO 47058 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-08-07T13:51:47Z INFO 47058 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-08-07T13:51:47Z INFO 47058 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.094 seconds +2025-08-07T13:51:47Z INFO 47058 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 1.068 seconds +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 38.887 seconds +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 19008: transpose_128x128 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 19008: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 
[sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.393 seconds +2025-08-07T13:51:48Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:51:49Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:51:49Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.431 seconds +2025-08-07T13:51:49Z INFO 47058 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 1.794 seconds +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: 
Running NeuronSimplifier +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.238 seconds +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.096 seconds +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.046 seconds +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.120 seconds +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.128 seconds +2025-08-07T13:51:51Z INFO 47058 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 1.911 seconds +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST 
INSTRUCTION COUNTS: +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 19008: transpose_128x128 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 19008: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 
[sg0000/Tensorizer/PostDLOTilingBottleneck]: 1536: matmul_128x128x1 +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.071 seconds +2025-08-07T13:51:53Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.289 seconds +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12262 | hlo_id: 12262 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12247 | hlo_id: 12247 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12290 | hlo_id: 12290 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12275 | hlo_id: 12275 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12318 | hlo_id: 12318 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12303 | hlo_id: 12303 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12346 | hlo_id: 12346 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in 
tensor_op_name: _scatter.12331 | hlo_id: 12331 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12374 | hlo_id: 12374 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12359 | hlo_id: 12359 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12402 | hlo_id: 12402 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12387 | hlo_id: 12387 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12430 | hlo_id: 12430 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12415 | hlo_id: 12415 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12458 | hlo_id: 12458 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12443 | hlo_id: 12443 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12486 | hlo_id: 12486 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12471 | hlo_id: 12471 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12514 | hlo_id: 12514 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape 
for offset dge in tensor_op_name: _scatter.12499 | hlo_id: 12499 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12542 | hlo_id: 12542 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12527 | hlo_id: 12527 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12570 | hlo_id: 12570 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12555 | hlo_id: 12555 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12598 | hlo_id: 12598 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12583 | hlo_id: 12583 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12626 | hlo_id: 12626 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12611 | hlo_id: 12611 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12654 | hlo_id: 12654 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12639 | hlo_id: 12639 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12682 | hlo_id: 12682 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported 
partition shape for offset dge in tensor_op_name: _scatter.12667 | hlo_id: 12667 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12710 | hlo_id: 12710 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12695 | hlo_id: 12695 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12738 | hlo_id: 12738 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12723 | hlo_id: 12723 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12766 | hlo_id: 12766 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12751 | hlo_id: 12751 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12794 | hlo_id: 12794 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12779 | hlo_id: 12779 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12822 | hlo_id: 12822 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12807 | hlo_id: 12807 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12850 | hlo_id: 12850 | +2025-08-07T13:51:54Z INFO 47058 
[sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12835 | hlo_id: 12835 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12878 | hlo_id: 12878 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12863 | hlo_id: 12863 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12906 | hlo_id: 12906 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12891 | hlo_id: 12891 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12934 | hlo_id: 12934 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12919 | hlo_id: 12919 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12962 | hlo_id: 12962 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12947 | hlo_id: 12947 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12990 | hlo_id: 12990 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.12975 | hlo_id: 12975 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13018 | hlo_id: 13018 | 
+2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13003 | hlo_id: 13003 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13046 | hlo_id: 13046 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13031 | hlo_id: 13031 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13074 | hlo_id: 13074 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13059 | hlo_id: 13059 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13102 | hlo_id: 13102 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13087 | hlo_id: 13087 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13130 | hlo_id: 13130 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13115 | hlo_id: 13115 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13158 | hlo_id: 13158 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13143 | hlo_id: 13143 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13186 | 
hlo_id: 13186 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13171 | hlo_id: 13171 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13214 | hlo_id: 13214 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13199 | hlo_id: 13199 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13242 | hlo_id: 13242 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: unsupported partition shape for offset dge in tensor_op_name: _scatter.13227 | hlo_id: 13227 | +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.378 seconds +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.302 seconds +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.063 seconds +2025-08-07T13:51:54Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.179 seconds 
+2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.058 seconds +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.022 seconds +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.268 seconds +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.188 seconds +2025-08-07T13:51:55Z INFO 47058 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue +2025-08-07T13:51:56Z INFO 47058 [sg0000/Tensorizer/InferInitValue]: Finished (changed=True) +2025-08-07T13:51:56Z INFO 47058 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 1.029 seconds +2025-08-07T13:51:56Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.301 seconds +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-08-07T13:51:57Z INFO 47058 
[sg0000/Tensorizer/SimplifyTensor]: Finished (changed=True) +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.214 seconds +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/LICM]: LICM finished after 0.106 seconds +2025-08-07T13:51:57Z INFO 47058 [sg0000/Tensorizer/SundaISel]: Running SundaISel +2025-08-07T13:51:58Z INFO 47058 [sg0000/Tensorizer/SundaISel]: Finished (changed=True) +2025-08-07T13:51:58Z INFO 47058 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 1.630 seconds +2025-08-07T13:51:58Z INFO 47058 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-08-07T13:51:58Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:51:58Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.003 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=True) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.026 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.043 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=True) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished 
after 0.156 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.046 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.054 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.409 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.045 seconds +2025-08-07T13:51:59Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:52:00Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:52:00Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.232 seconds +2025-08-07T13:52:00Z INFO 47058 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:52:00Z INFO 47058 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-08-07T13:52:00Z INFO 47058 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.249 seconds +2025-08-07T13:52:00Z INFO 47058 
[sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:52:01Z INFO 47058 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-08-07T13:52:01Z INFO 47058 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 1.416 seconds +2025-08-07T13:52:01Z INFO 47058 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:52:01Z INFO 47058 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=True) +2025-08-07T13:52:01Z INFO 47058 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.106 seconds +2025-08-07T13:52:01Z INFO 47058 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.193 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.034 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.042 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.035 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/DeConcat]: Running DeConcat +2025-08-07T13:52:02Z INFO 47058 
[sg0000/Tensorizer/DeConcat]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.012 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.036 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=False) +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.207 seconds +2025-08-07T13:52:02Z INFO 47058 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 1.090 seconds +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.263 seconds +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.021 seconds +2025-08-07T13:52:03Z INFO 47058 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/PartialLoopFusion]: 
PartialLoopFusion finished after 0.237 seconds +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.203 seconds +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.399 seconds +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.047 seconds +2025-08-07T13:52:04Z INFO 47058 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.451 seconds +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.038 seconds +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/SpillPSum]: Finished (changed=True) +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.543 seconds +2025-08-07T13:52:05Z INFO 47058 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=True) 
+2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 1.229 seconds +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.054 seconds +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.208 seconds +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.289 seconds +2025-08-07T13:52:07Z INFO 47058 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:52:08Z INFO 47058 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-08-07T13:52:08Z INFO 47058 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.978 seconds +2025-08-07T13:52:08Z INFO 47058 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:52:08Z INFO 47058 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-08-07T13:52:08Z INFO 47058 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.055 seconds +2025-08-07T13:52:08Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=False) +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 1.454 seconds +2025-08-07T13:52:10Z INFO 47058 
[sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.155 seconds +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=False) +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.134 seconds +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.179 seconds +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.090 seconds +2025-08-07T13:52:10Z INFO 47058 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True) +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 1.330 seconds +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.035 seconds +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:52:12Z INFO 47058 
[sg0000/Tensorizer/DataStreaming]: Finished (changed=False) +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.154 seconds +2025-08-07T13:52:12Z INFO 47058 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 3.120 seconds +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.227 seconds +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.191 seconds +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.066 seconds +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 2.705ms (594.000MiB, est bw: 230.258GB/s, 7.724% of tot. 
time) for bfloat16<128 x 4096> TongaSB partitions[1] bfloat16 (594, 128, 4096) %'30788.46007'[i4422_0,i0.128,i1.4096] = load bfloat16<128 x 4096> {'CrossPassTensor': '', 'transposable': True}bfloat16 (75968, 4096) %'input473'[128i4422_0+i0.128,i1.4096] # id=46006, src_id=None, , instances=594 # dl = tensor_op_name: input473_pftranspose_30788 | hlo_id: 18069 | if -128i4422_0-i0.128+75967 >= 0 [[i0.128];[i1.4096]] -> [[i0.128];[i1.4096]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input84_local_33013'[i148_0,i147_0_0_33017,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input84'[i148_0,i147_0_0_33017,i0.128,i1.3072] # id=37070, src_id=None, , instances=64 # dl = tensor_op_name: _dot.407 | hlo_id: 14041 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input95_local_33088'[i270_0,i269_0_0_33092,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input95'[i270_0,i269_0_0_33092,i0.128,i1.3072] # id=37244, src_id=None, , instances=64 # dl = tensor_op_name: _dot.739 | hlo_id: 14156 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. 
time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input106_local_33163'[i392_0,i391_0_0_33167,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input106'[i392_0,i391_0_0_33167,i0.128,i1.3072] # id=37418, src_id=None, , instances=64 # dl = tensor_op_name: _dot.1071 | hlo_id: 14271 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input117_local_33238'[i514_0,i513_0_0_33242,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input117'[i514_0,i513_0_0_33242,i0.128,i1.3072] # id=37592, src_id=None, , instances=64 # dl = tensor_op_name: _dot.1403 | hlo_id: 14386 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input128_local_33313'[i636_0,i635_0_0_33317,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input128'[i636_0,i635_0_0_33317,i0.128,i1.3072] # id=37766, src_id=None, , instances=64 # dl = tensor_op_name: _dot.1735 | hlo_id: 14501 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. 
time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input139_local_33388'[i758_0,i757_0_0_33392,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input139'[i758_0,i757_0_0_33392,i0.128,i1.3072] # id=37940, src_id=None, , instances=64 # dl = tensor_op_name: _dot.2067 | hlo_id: 14616 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input150_local_33463'[i880_0,i879_0_0_33467,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input150'[i880_0,i879_0_0_33467,i0.128,i1.3072] # id=38114, src_id=None, , instances=64 # dl = tensor_op_name: _dot.2399 | hlo_id: 14731 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input161_local_33538'[i1002_0,i1001_0_0_33542,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input161'[i1002_0,i1001_0_0_33542,i0.128,i1.3072] # id=38288, src_id=None, , instances=64 # dl = tensor_op_name: _dot.2731 | hlo_id: 14846 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 231.410us (48.000MiB, est bw: 217.500GB/s, 0.661% of tot. 
time) for bfloat16<128 x 3072> TongaSB partitions[2] bfloat16 (32, 2, 128, 3072) %'input172_local_33613'[i1124_0,i1123_0_0_33617,i0.128,i1.3072] = load bfloat16<128 x 3072> {'CrossPassTensor': '', 'transposable': True, 'static_io_transpose': {'reshape': (32, 128, 2, 24, 128), 'transpose': [0, 2, 4, 3, 1]}}bfloat16 (32, 2, 128, 3072) %'input172'[i1124_0,i1123_0_0_33617,i0.128,i1.3072] # id=38462, src_id=None, , instances=64 # dl = tensor_op_name: _dot.3063 | hlo_id: 14961 | [[i0.128];[i1.3072]] -> [[i0.128];[i1.3072]] +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.088 seconds +2025-08-07T13:52:15Z INFO 47058 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronValueNumbering]: 
NeuronValueNumbering finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.003 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeType]: 
Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds 
+2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.005 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimpleAllReduceTiling]: 
Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds 
+2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SpillPSum]: Finished 
(changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds +2025-08-07T13:52:16Z INFO 47058 
[cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.003 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateLegalizeInst]: 
Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. 
time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.461 seconds +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.411 seconds +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-08-07T13:52:16Z WARNING 47058 [sg0000/Tensorizer/StaticProfiler]: matmul-based transposes inserted by penguin takes up 91.34 percent of all matmul computation +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.133 seconds +2025-08-07T13:52:16Z INFO 47058 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.331 seconds +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-08-07T13:52:17Z 
INFO 47058 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.092 seconds +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.241 seconds +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-08-07T13:52:17Z INFO 47058 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-08-07T13:52:20Z INFO 47058 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-08-07T13:52:20Z INFO 47058 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 2.421 seconds +2025-08-07T13:52:21Z INFO 47058 [Tensorizer]: BirCodeGen estimate #instances=322914 in sg0000 +2025-08-07T13:52:21Z INFO 47058 [Tensorizer]: IR signature: 0008d2291c40cbc1943864bc884fa81d07373338c5a8542a06f116c7446c8305 for nc00/sg0000/TensorizerBIR +2025-08-07T13:52:21Z INFO 47058 [Tensorizer]: Weights total number of bytes: 4854280 +2025-08-07T13:52:21Z INFO 47058 [Tensorizer]: Successfully built model. 
+2025-08-07T13:52:21Z USER 47058 [root/Tensorizer/Tensorizer]: Tensorizer finished after 114.786 seconds +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: End tensorization +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input0 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input1 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input2 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input3 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input4 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input5 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input6 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input7 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input8 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input9 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input10 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input11 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input12 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input13 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input14 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input15 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input16 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input17 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input18 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input19 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input20 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input21 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input22 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input23 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network 
input: input24 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input25 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input26 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input27 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input28 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input29 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input30 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input31 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input32 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input33 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input34 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input35 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input36 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input37 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input38 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input39 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input40 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input41 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input42 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input43 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input44 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input45 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input46 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input47 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input48 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input49 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input50 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network 
input: input51 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input52 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input53 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input54 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input55 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input56 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input57 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input58 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input59 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input60 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input61 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input62 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input63 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input64 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input65 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input66 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input67 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input68 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input69 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input70 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input71 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input72 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input73 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input74 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input75 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input76 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input77 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network 
input: input78 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input79 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input80 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input81 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input82 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input83 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input84 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input85 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input86 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input87 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input88 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input89 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input90 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input91 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input92 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input93 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input94 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input95 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input96 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input97 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input98 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input99 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input100 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input101 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input102 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input103 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input104 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: 
Network input: input105 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input106 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input107 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input108 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input109 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input110 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input111 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input112 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input113 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input114 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input115 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input116 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input117 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input118 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input119 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input120 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input121 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input122 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input123 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input124 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input125 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input126 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input127 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input128 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input129 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input130 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input131 +2025-08-07T13:52:21Z 
INFO 47058 [job.Frontend.0]: Network input: input132 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input133 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input134 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input135 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input136 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input137 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input138 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input139 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input140 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input141 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input142 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input143 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input144 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input145 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input146 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input147 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input148 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input149 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input150 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input151 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input152 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input153 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input154 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input155 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input156 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input157 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: 
input158 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input159 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input160 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input161 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input162 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input163 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input164 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input165 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input166 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input167 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input168 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input169 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input170 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input171 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input172 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input173 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input174 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input175 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input176 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input177 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input178 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input179 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input180 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input181 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input182 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input183 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input184 +2025-08-07T13:52:21Z INFO 47058 
[job.Frontend.0]: Network input: input185 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input186 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input187 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input188 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input189 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input190 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input191 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input192 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input193 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input194 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input195 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input196 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input197 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input198 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input199 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input200 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input201 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input202 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input203 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input204 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input205 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input206 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input207 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input208 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input209 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input210 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input211 
+2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input212 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input213 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input214 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input215 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input216 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input217 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input218 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input219 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input220 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input221 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input222 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input223 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input224 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input225 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input226 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input227 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input228 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input229 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input230 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input231 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input232 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input233 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input234 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input235 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input236 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input237 +2025-08-07T13:52:21Z INFO 47058 
[job.Frontend.0]: Network input: input238 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input239 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input240 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input241 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input242 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input243 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input244 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input245 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input246 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input247 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input248 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input249 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input250 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input251 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input252 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input253 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input254 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input255 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input256 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input257 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input258 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input259 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input260 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input261 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input262 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input263 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input264 
+2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input265 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input266 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input267 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input268 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input269 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input270 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input271 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input272 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input273 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input274 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input275 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input276 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input277 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input278 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input279 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input280 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input281 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input282 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input283 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input284 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input285 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input286 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input287 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input288 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input289 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input290 +2025-08-07T13:52:21Z INFO 47058 
[job.Frontend.0]: Network input: input291 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input292 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input293 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input294 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input295 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input296 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input297 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input298 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input299 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input300 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input301 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input302 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input303 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input304 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input305 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input306 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input307 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input308 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input309 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input310 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input311 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input312 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input313 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input314 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input315 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input316 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input317 
+2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input318 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input319 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input320 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input321 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input322 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input323 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input324 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input325 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input326 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input327 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input328 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input329 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input330 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input331 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input332 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input333 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input334 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input335 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input336 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input337 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input338 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input339 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input340 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input341 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input342 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input343 +2025-08-07T13:52:21Z INFO 47058 
[job.Frontend.0]: Network input: input344 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input345 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input346 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input347 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input348 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input349 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input350 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input351 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input352 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input353 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input354 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input355 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input356 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input357 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input358 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input359 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input360 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input361 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input362 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input363 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input364 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input365 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input366 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input367 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input368 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input369 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input370 
+2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input371 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input372 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input373 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input374 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input375 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input376 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input377 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input378 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input379 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input380 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input381 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input382 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input383 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input384 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input385 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input386 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input387 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input388 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input389 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input390 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input391 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input392 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input393 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input394 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input395 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input396 +2025-08-07T13:52:21Z INFO 47058 
[job.Frontend.0]: Network input: input397 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input398 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input399 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input400 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input401 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input402 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input403 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input404 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input405 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input406 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input407 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input408 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input409 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input410 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input411 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input412 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input413 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input414 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input415 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input416 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input417 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input418 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input419 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input420 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input421 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input422 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input423 
+2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input424 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input425 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input426 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input427 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input428 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input429 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input430 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input431 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input432 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input433 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input434 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input435 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input436 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input437 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input438 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input439 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input440 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input441 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input442 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input443 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input444 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input445 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input446 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input447 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input448 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input449 +2025-08-07T13:52:21Z INFO 47058 
[job.Frontend.0]: Network input: input450 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input451 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input452 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input453 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input454 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input455 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input456 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input457 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input458 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input459 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input460 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input461 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input462 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input463 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input464 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input465 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input466 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input467 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input468 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input469 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input470 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input471 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input472 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input473 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Network input: input474 +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: wrote bir.json +2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: wrote tensor_map.json 
+2025-08-07T13:52:21Z INFO 47058 [job.Frontend.0]: Job #0 finished +2025-08-07T13:52:21Z INFO 47058 [pipeline.Pipeline.0]: Finished job job.Frontend.0 +2025-08-07T13:52:21Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 +2025-08-07T13:52:21Z INFO 47058 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 +2025-08-07T13:52:21Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: BackendDriver has 1 states with 1 core LNC +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: BackendDriver: no partitions found. Switching to flat flow. +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: Job WalrusDriver len(in_states) 1 +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: Processing input #0 +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: BackendDriver in_state.num_states 1 with 1 core LNC +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: Executing /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/log-neuron-cc.txt --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-color-rotation --max-load-lower-bound 0.14 --mm-reorder-opt --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 --repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --relaxed-order=true --enable-anti-dependence-reduction=false 
--num-semaphores-per-queue 16 --numcores 1 --act-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json --unified-backend-and-legacy-codegen --tensor-map tensor_map.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --dge-levels vector_dynamic_offsets,scalar_dynamic_offset,io --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.neff +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: Working directory is /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z/sg00 +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: propagate_exit=True +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: use_logger=False +2025-08-07T13:52:21Z INFO 47058 [job.WalrusDriver.0]: expose_stderr=True +2025-08-07T13:52:22Z INFO 47306 [Logging]: Logging to ../../log-neuron-cc.txt at level 'INFO' +2025-08-07T13:52:22Z INFO 47306 [BackendDriver]: max_allowed_parallelism=128 +2025-08-07T13:52:22Z INFO 47306 [BackendDriver]: Backend driver mtBackend: false numModules: 1 Cwd: "/home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z/sg00" +2025-08-07T13:52:22Z INFO 47306 [BackendDriver]: DynamicDMA is enabled +2025-08-07T13:52:22Z INFO 47306 [BackendDriver]: DynamicDMA levels being enabled: io, scalar_dynamic_offset, vector_dynamic_offsets, +2025-08-07T13:52:22Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 
[ModuleForkPass]: Running do_nothing +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [ModuleForkPass]: do_nothing finished after 0.001 seconds +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: curr_vmrss: 211mb, ru_maxrss: 670mb (delta=0mb) +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 7295 memory location(s), 1 block(s), and 7257 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [ModuleForkPass]: Running birverifier +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z WARNING 47306 [birverifier::InstVisitor]: (module) Non - output memory location with no reader: {convert.357.56865}@SB<0,0>(1x2)#Internal DebugInfo: +2025-08-07T13:52:22Z USER 47306 [ModuleForkPass]: birverifier finished after 0.272 seconds +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1002mb, ru_maxrss: 1002mb (delta=332mb) +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 7295 memory location(s), 1 block(s), and 7257 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 0.278 seconds +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: curr_vmrss: 994mb, ru_maxrss: 1002mb (delta=332mb) +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 7295 memory location(s), 1 block(s), and 7257 instruction(s). 
Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:52:22Z INFO 47306 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-08-07T13:52:22Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 994mb, ru_maxrss: 1002mb (delta=0mb) +2025-08-07T13:52:22Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 7295 memory location(s), 1 block(s), and 7257 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [BackendPassManager]: subgraph_parallel_pass finished after 0.002 seconds +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: curr_vmrss: 994mb, ru_maxrss: 1002mb (delta=0mb) +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 7295 memory location(s), 1 block(s), and 7257 instruction(s). 
Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:52:22Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [ModuleForkPass]: Running expand_replication +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z INFO 47306 [ExpandReplication]: Found 0 replicated matmults +2025-08-07T13:52:22Z USER 47306 [ModuleForkPass]: expand_replication finished after 0.001 seconds +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: curr_vmrss: 994mb, ru_maxrss: 1002mb (delta=0mb) +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 7295 memory location(s), 1 block(s), and 7257 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z USER 47306 [ModuleForkPass]: Running unroll +2025-08-07T13:52:22Z INFO 47306 [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=7295 blocks=1 instructions=7257 Max writers: 191 Max Readers: 475 +2025-08-07T13:52:22Z INFO 47306 [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:52:22 2025 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:52:22 2025 + +2025-08-07T13:52:25Z INFO 47306 [Unroll]: sg0000 Instruction count after Unroll: +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Total count: 277097 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Matmult: 251660 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: GenericCopy: 11565 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Load: 8259 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: TensorScalarPtr: 1482 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: TensorTensor: 1125 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Save: 682 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Activation: 
545 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Memset: 299 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Max: 224 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: MaxIndex: 224 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: StreamShuffle: 222 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: MatchReplace: 217 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: TensorReduce: 151 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: CollectiveCompute: 75 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Reciprocal: 75 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: DMACopy: 74 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Iota: 73 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: StreamTranspose: 72 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Select: 38 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Gather: 35 +2025-08-07T13:52:25Z INFO 47306 [Unroll]: Unrolled DGE count with Dynamic AP: 73 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: unroll finished after 2.563 seconds +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2379mb, ru_maxrss: 2379mb (delta=1377mb) +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 28306 memory location(s), 1 block(s), and 277097 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 2.613 seconds +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: curr_vmrss: 1481mb, ru_maxrss: 2379mb (delta=1377mb) +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 28306 memory location(s), 1 block(s), and 277097 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=28306 blocks=1 instructions=277097 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:52:25Z INFO 47306 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=28306 blocks=1 instructions=277097 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z INFO 47306 [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:52:25Z INFO 47306 [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:25Z INFO 47306 [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:25Z INFO 47306 [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:52:25Z USER 47306 [SubgraphForkPass]: dead_code_elim finished after 0.288 seconds +2025-08-07T13:52:25Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 1489mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: subgraph_parallel_pass finished after 0.291 seconds +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: curr_vmrss: 1489mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: Running birverifier +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: birverifier finished after 0.259 seconds +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 0.263 seconds +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:52:25Z INFO 47306 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-08-07T13:52:25Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: subgraph_parallel_pass finished after 0.003 seconds +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:52:25Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: Running instruction_reorder +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: instruction_reorder finished after 0.045 seconds +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: Running psum_legalization +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: psum_legalization finished after 0.022 seconds +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:25Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:25Z USER 47306 [ModuleForkPass]: Running legalize_cce_dma +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: legalize_cce_dma finished after 0.024 seconds +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: Running error_injector +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z WARNING 47306 [ErrorInjector]: Unrecognized injected error value "0" +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: error_injector finished after 0.001 seconds +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1501mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: Running vn_splitter +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z INFO 47306 [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 14 +2025-08-07T13:52:26Z INFO 47306 [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-08-07T13:52:26Z INFO 47306 [ShrinkDN]: INFO (ShrinkDN): Shrunk 2 nodes. 
Total savings 14336 bytes/partition +2025-08-07T13:52:26Z INFO 47306 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-08-07T13:52:26Z INFO 47306 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-08-07T13:52:26Z INFO 47306 [VNSplitterPass]: INFO (VNSplitter) Time: 0.001 seconds +2025-08-07T13:52:26Z INFO 47306 [VNSplitterPass]: INFO (VerticalFusion) Time: 0.032 seconds +2025-08-07T13:52:26Z INFO 47306 [VNSplitterPass]: INFO (ShrinkDN) Time: 0.044 seconds +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: vn_splitter finished after 0.123 seconds +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1505mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: Running constant_propagate +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: 
remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:26Z INFO 47306 [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: constant_propagate finished after 0.610 seconds +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1507mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: Running lower_ac +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z INFO 47306 [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: lower_ac finished after 0.040 seconds +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1507mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: Running input_dma_coalescing +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z INFO 47306 [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: input_dma_coalescing finished after 0.077 seconds +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1507mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:26Z USER 47306 [ModuleForkPass]: Running remat_optimization +2025-08-07T13:52:26Z INFO 47306 [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z INFO 47306 [RematOpt]: Removed 0 remat instructions +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: remat_optimization finished after 0.142 seconds +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1509mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: Running early_peephole_opts +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z INFO 47306 [EarlyPeepholeOpts]: PeepholeOpts enabled? 
ActivationAccumulate: true +2025-08-07T13:52:27Z INFO 47306 [EarlyPeepholeOpts]: Activation Accumulate: 0 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: early_peephole_opts finished after 0.085 seconds +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1509mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.021 seconds +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1509mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: Running infer_stream_ids +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: infer_stream_ids finished after 0.021 seconds +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1509mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27829 memory location(s), 1 block(s), and 277096 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z USER 47306 [ModuleForkPass]: Running pre_sched +2025-08-07T13:52:27Z INFO 47306 [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=27829 blocks=1 instructions=277096 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [LayerSpiller]: LayerSpill: Start... +2025-08-07T13:52:27Z INFO 47306 [LayerSpiller]: LayerSpill: Found 72 Splits CCs +2025-08-07T13:52:27Z INFO 47306 [LayerSpiller]: Grouped CCs to 72 clusters. +2025-08-07T13:52:27Z INFO 47306 [LayerSpiller]: LayerSpill: To Spill 60 multi-layer tensors +2025-08-07T13:52:27Z INFO 47306 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-08-07T13:52:27Z INFO 47306 [LayerSpiller]: LayerSpill: Done. +2025-08-07T13:52:27Z INFO 47306 [PreSched]: Start split live ranges Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: Num_Splits: 0 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: End split live ranges Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: Strt remove redundncies Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: remove_redundant_memsets +2025-08-07T13:52:27Z INFO 47306 [PreSched]: remove_redundant_memsets: 0 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: remove_redundant_loads +2025-08-07T13:52:27Z INFO 47306 [PreSched]: remove_redundant_loads: 0 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: End remove redundncies Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: Start DCE Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: eliminateDeadStore removed 0 instructions +2025-08-07T13:52:27Z INFO 47306 [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:27Z INFO 47306 [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:27Z INFO 47306 [PreSched]: 
remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:52:27Z INFO 47306 [PreSched]: End DCE Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [PreSched]: Start build flow dependencies Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [build_flow_deps]: Start build fdeps. Invocation: 1Thu Aug 7 13:52:27 2025 +2025-08-07T13:52:27Z INFO 47306 [build_flow_deps]: Allocs: 27949 instructions: 277216 +2025-08-07T13:52:28Z INFO 47306 [build_flow_deps]: Build fdeps inserted 816554 edges +2025-08-07T13:52:28Z INFO 47306 [build_flow_deps]: Done build fdeps 816554 Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: End build flow dependencies Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: Start remove useless insts Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: remove_useless_insts +2025-08-07T13:52:28Z INFO 47306 [PreSched]: remove Useless Instructions: 0 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: End remove useless insts Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: Start scratchpad optimization Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: End scratchpad optimization Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z INFO 47306 [PreSched]: DONE PRE scheduling Thu Aug 7 13:52:28 2025 +2025-08-07T13:52:28Z USER 47306 [ModuleForkPass]: pre_sched finished after 1.701 seconds +2025-08-07T13:52:28Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1658mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:28Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277216 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:28Z USER 47306 [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:52:28Z INFO 47306 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=27949 blocks=1 instructions=277216 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:28Z INFO 47306 [TensorCopyElim]: Tensor CP elimination: 1 +2025-08-07T13:52:29Z INFO 47306 [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:52:29Z INFO 47306 [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:29Z INFO 47306 [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:52:29Z INFO 47306 [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:52:29Z USER 47306 [ModuleForkPass]: tensor_copy_elim finished after 0.380 seconds +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1658mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27948 memory location(s), 1 block(s), and 277215 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:29Z USER 47306 [ModuleForkPass]: Running dynamic_dma_setup +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=27948 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:29Z USER 47306 [ModuleForkPass]: dynamic_dma_setup finished after 0.001 seconds +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1658mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:29Z USER 47306 [ModuleForkPass]: Running runtime_memory_reservation +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:29Z USER 47306 [ModuleForkPass]: runtime_memory_reservation finished after 0.001 seconds +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1658mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:29Z USER 47306 [ModuleForkPass]: Running coloring_allocator_psum +2025-08-07T13:52:29Z INFO 47306 [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:29Z INFO 47306 [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:52:29Z INFO 47306 [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: allocating PSUM +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: main loop +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: renumber locations +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: size = 11741 +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: build_no_bitmap start +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: 100% PSUM demand before spilling +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: found 22033 edges +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: mean: 3.75317 +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: median: 2.28766 +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: adjacency vectors require 176264 bytes +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: 
build_no_bitmap done +2025-08-07T13:52:29Z INFO 47306 [PSUM_Allocator]: find costs +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: simplify interference graph +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: initialize low and high +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: lo = 11741 +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: hi = 0 +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: inf = 0 +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: total = 11741 +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: simplify +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: new candidates = 0 +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: select ranges +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: no more spills +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-08-07T13:52:34Z INFO 47306 [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-08-07T13:52:34Z USER 47306 [ModuleForkPass]: coloring_allocator_psum finished after 5.706 seconds +2025-08-07T13:52:34Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1661mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:34Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:34Z USER 47306 [ModuleForkPass]: Running dma_optimization_psum +2025-08-07T13:52:34Z INFO 47306 [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:35Z INFO 47306 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-08-07T13:52:35Z INFO 47306 [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-08-07T13:52:35Z USER 47306 [ModuleForkPass]: dma_optimization_psum finished after 0.174 seconds +2025-08-07T13:52:35Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1661mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:35Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:35Z USER 47306 [ModuleForkPass]: Running address_rotation_psum +2025-08-07T13:52:35Z INFO 47306 [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:35Z INFO 47306 [DMAOptimizationBase]: PSUM Rotation rotated 832 PSUM Banks +2025-08-07T13:52:35Z INFO 47306 [DMAOptimizationBase]: PSUM Rotation rotated 202 PSUM Banks +2025-08-07T13:52:36Z INFO 47306 [DMAOptimizationBase]: PSUM Rotation rotated 335 PSUM Banks +2025-08-07T13:52:36Z USER 47306 [ModuleForkPass]: address_rotation_psum finished after 1.049 seconds +2025-08-07T13:52:36Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1667mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:52:36Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:36Z USER 47306 [ModuleForkPass]: Running coloring_allocator_sb +2025-08-07T13:52:36Z INFO 47306 [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 7583364294 +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 7312 bytes +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 2812042 +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 397 bytes +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 78980 +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 136 bytes +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:52:36Z INFO 47306 [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: allocating SB +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: main loop +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: renumber locations +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: size = 15358 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: find partners +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: found 11522 accumulation groups +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: largest = _dot.9359-t36549 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: tensors = 49 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: requires 393280 bytes/partition +2025-08-07T13:52:36Z WARNING 47306 [SB_Allocator]: accumulation group is too large for SB +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: expanding partners +2025-08-07T13:52:36Z INFO 47306 []: find first defs for local +2025-08-07T13:52:36Z INFO 47306 []: find first defs for global 
+2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: find loads +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: 1 pin count +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: 8233 remat count +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: build interference graph +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: pass 1 int-tree +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Num intervals 15358 Num locations 15358 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: info.neighbors init Done +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: info.neighbors partners Done +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: edge: 141619 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: mean: 18.4424 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: median: 10.3191 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: find costs +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: simplify interference graph +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: safe = 14854 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: unsafe = 326 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: inf = 177 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: total = 15357 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: simplify +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: simplify_step3_sorted2 #Unsafe 106 #Pinned 0 #Safe 0 minCost 0.00302294 maxCost 2.36906 locations 15358 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: new candidates = 8 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: select ranges +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Total: 15357 +2025-08-07T13:52:36Z INFO 47306 
[SB_Allocator]: Spilled: 0.000 (0) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Allocated: 1.000 (15357) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Rover zone: 0.958 (14715) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Pre-rover zone: 0.033 (512) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Post-rover zone: 0.008 (126) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Slice zone: 0.000 (4) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Blocks nothing: 0.041 (633) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Blocks medium: 0.008 (120) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Visited until medium blocking (mean): 0.444 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Visited until medium blocking (median): 0.456 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Visited until medium blocking (p95): 0.818 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Blocks tall: 0.951 (14604) +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Visited until tall blocking (mean): 0.894 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:52:36Z INFO 47306 [SB_Allocator]: Success +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: SB spills = 0 tensors +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: remats = 0 tensors +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: SB score = 0 +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: spilling from SB cost about 0 cycles +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: pinning saved approximately 9010 cycles +2025-08-07T13:53:01Z INFO 47306 [SB_Allocator]: 0% SB 
utilization after allocation +2025-08-07T13:53:01Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 7583364294 +2025-08-07T13:53:01Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 7312 bytes +2025-08-07T13:53:01Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 2812042 +2025-08-07T13:53:01Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 397 bytes +2025-08-07T13:53:01Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 78980 +2025-08-07T13:53:01Z INFO 47306 [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 136 bytes +2025-08-07T13:53:01Z USER 47306 [ModuleForkPass]: coloring_allocator_sb finished after 25.641 seconds +2025-08-07T13:53:01Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1675mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:01Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:01Z USER 47306 [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:53:01Z INFO 47306 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:53:02Z USER 47306 [ModuleForkPass]: address_rotation_sb finished after 0.360 seconds +2025-08-07T13:53:02Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1677mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:02Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27949 memory location(s), 1 block(s), and 277215 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:02Z USER 47306 [ModuleForkPass]: Running dma_optimization_sb +2025-08-07T13:53:02Z INFO 47306 [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=27949 blocks=1 instructions=277215 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 7586176336, 99.9258% input load, 5.27275e-08% output write, 0.0742169% spill/reload [sg0000] +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: removed 0 identical load +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [IO to internal DMACopy Insertion]: inserted 0 DMACopy instructions +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 168, 2.21455e-06% out of total dma traffic(7.58055e+09) +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [spill optimization round 0]: removed 6 spill/reload instructions +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [spill optimization round 0]: removed 6 spill/reload memory locations +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload instructions +2025-08-07T13:53:02Z INFO 47306 [DMAOptimizationBase]: [spill optimization round 1]: removed 0 spill/reload memory locations +2025-08-07T13:53:02Z INFO 
47306 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 4100, 0.0728213% out of total spill/reload dma traffic +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 116 SpillSaves and Reloads +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: average loaded DMA size 7326 bytes +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: average saved DMA size 539 bytes +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: DMA SpillSave Coalescing Round 1 combined 56 SpillSaves and Reloads +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: average loaded DMA size 7332 bytes +2025-08-07T13:53:03Z INFO 47306 [DMAOptimizationBase]: average saved DMA size 650 bytes +2025-08-07T13:53:04Z INFO 47306 
[DMAOptimizationBase]: DMA SpillSave Coalescing Round 2 combined 0 SpillSaves and Reloads +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: average loaded DMA size 7332 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: average saved DMA size 650 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 7583362076 +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 7332 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 2809992 +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 650 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 4268, 5.62602e-05% out of total dma traffic +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 7586172068, 99.9258% input load, 5.27275e-08% output write, 0.0741629% spill/reload [sg0000] +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 7583362076 +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 7332 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 2809992 +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 650 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 78980 +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed 
DMA size 136 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 7300 bytes +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-08-07T13:53:04Z USER 47306 [ModuleForkPass]: dma_optimization_sb finished after 2.243 seconds +2025-08-07T13:53:04Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1712mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:04Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277081 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:04Z USER 47306 [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:53:04Z INFO 47306 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=27771 blocks=1 instructions=277081 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 359 Sb address +2025-08-07T13:53:04Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 4462 Sb address +2025-08-07T13:53:05Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 856 Sb address +2025-08-07T13:53:05Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 445 Sb address +2025-08-07T13:53:05Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 1976 Sb address +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: address_rotation_sb finished after 1.694 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1712mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277081 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running coloring_allocator_dram +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=27771 blocks=1 instructions=277081 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z INFO 47306 [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:53:06Z INFO 47306 [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: reserved space = 8344433440 bytes +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: spill space = 3420292 bytes +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: aligned spill space = 3469312 bytes +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: dram space = 107374182400 bytes +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: renumber locations +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: size = 178 +2025-08-07T13:53:06Z INFO 47306 []: find first defs for local +2025-08-07T13:53:06Z INFO 47306 []: find first defs for global +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: Num intervals 178 Num locations 178 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: IntervalTree Build Done +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: info.neighbors init Done +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: IntervalTree readback Done +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: simplify interference graph +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: initialize low and high +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: lo = 178 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: hi = 0 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: total = 178 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: simplify +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: new candidates = 0 +2025-08-07T13:53:06Z INFO 47306 
[DRAM_Allocator]: select ranges +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: CC buffer size limit 524288000 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: allreduce_dram_hwm 1208320 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: Real CC buffer size 1208320 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: DRAM hwm after allocation: 3117056 +2025-08-07T13:53:06Z INFO 47306 [DRAM_Allocator]: DRAM allocation successful +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: coloring_allocator_dram finished after 0.381 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1715mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277081 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running address_rotation_dram +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=27771 blocks=1 instructions=277081 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: Runtime page size at 512MB +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: DRAM hwm before rotation 3117056 +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: allreduce hwm 1208320 +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: Real CC buffer size 1208320 +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: DRAM hwm after rotation 3117056 +2025-08-07T13:53:06Z INFO 47306 [DMAOptimizationBase]: DRAM Rotation rotated 9 Dram address +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: address_rotation_dram finished after 0.186 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1717mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 
memory location(s), 1 block(s), and 277081 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running tensorcopy_accel +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=27771 blocks=1 instructions=277081 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z INFO 47306 [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-08-07T13:53:06Z INFO 47306 [TensorCopyAccel::Impl]: Accelerated 72 out of 11862 tensorcopy in Function: sg0000 average acceleration factor: 1 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: tensorcopy_accel finished after 0.024 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1717mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277081 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running peephole_opts +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=27771 blocks=1 instructions=277081 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z INFO 47306 [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: peephole_opts finished after 0.100 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1717mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running lower_kernel +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z INFO 47306 [LowerKernel]: Started running LowerKernel +2025-08-07T13:53:06Z INFO 47306 [LowerKernel]: Start of kernel lowering pass, number of insts: 277119, number of allocs: 27771 +2025-08-07T13:53:06Z INFO 47306 [LowerKernel]: Scan BKs time (s): 0.019819 +2025-08-07T13:53:06Z INFO 47306 [LowerKernel]: Lower BKs time (s): 6e-06 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: lower_kernel finished after 0.022 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1717mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running lower_nki_kernel +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: lower_nki_kernel finished after 0.021 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1717mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running dynamic_dma_cleanup +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: dynamic_dma_cleanup finished after 0.032 seconds +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1719mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:06Z USER 47306 [ModuleForkPass]: Running birverifier +2025-08-07T13:53:06Z INFO 47306 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: birverifier finished after 0.203 seconds +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1719mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: Running dynamic_dma_scan +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: dynamic_dma_scan finished after 0.032 seconds +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1719mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: Running build_fdeps +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z INFO 47306 [build_flow_deps]: Start build fdeps. Invocation: 2Thu Aug 7 13:53:07 2025 +2025-08-07T13:53:07Z INFO 47306 [build_flow_deps]: Allocs: 27771 instructions: 277119 +2025-08-07T13:53:07Z INFO 47306 [build_flow_deps]: Build fdeps inserted 816543 edges +2025-08-07T13:53:07Z INFO 47306 [build_flow_deps]: Done build fdeps 816543 Thu Aug 7 13:53:07 2025 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: build_fdeps finished after 0.601 seconds +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1731mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: Running remove_redundancies +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z INFO 47306 [RemoveRedundancies]: remove_clobbered_writes +2025-08-07T13:53:07Z INFO 47306 [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-08-07T13:53:07Z INFO 47306 [RemoveRedundancies]: remove_useless_insts +2025-08-07T13:53:07Z INFO 47306 [RemoveRedundancies]: remove Useless Instructions: 0 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: remove_redundancies finished after 0.092 seconds +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1731mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z USER 47306 [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:53:07Z INFO 47306 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:07Z INFO 47306 [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:53:07Z INFO 47306 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:53:07Z INFO 47306 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:53:09Z USER 47306 [ModuleForkPass]: anti_dependency_analyzer finished after 1.184 seconds +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2167mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:09Z USER 47306 [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:09Z INFO 47306 [TensorCopyElim]: Tensor CP elimination: 0 +2025-08-07T13:53:09Z INFO 47306 [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:53:09Z USER 47306 [ModuleForkPass]: tensor_copy_elim finished after 0.266 seconds +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1829mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). 
Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:09Z USER 47306 [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:09Z USER 47306 [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.001 seconds +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1829mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277119 instruction(s). Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:09Z USER 47306 [ModuleForkPass]: Running post_sched +2025-08-07T13:53:09Z INFO 47306 [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=27771 blocks=1 instructions=277119 Max writers: 1536 Max Readers: 20035 +2025-08-07T13:53:09Z INFO 47306 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:53:09 2025 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.395-t35739 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.727-t35769 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.1059-t35799 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.1391-t35829 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.1723-t35859 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.2055-t35889 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.2387-t35919 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.2719-t35949 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.3051-t35979 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for 
_dot.3383-t36009 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.3715-t36039 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.4047-t36069 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.4379-t36099 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.4711-t36129 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.5043-t36159 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.5375-t36189 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.5707-t36219 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.6039-t36249 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.6371-t36279 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.6703-t36309 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.7035-t36339 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.7367-t36369 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.7699-t36399 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.8031-t36429 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.8363-t36459 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.8695-t36489 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.9027-t36519 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.9359-t36549 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.9691-t36579 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.10023-t36609 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.10355-t36639 +2025-08-07T13:53:09Z 
WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.10687-t36669 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.11019-t36699 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.11351-t36729 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.11683-t36759 +2025-08-07T13:53:09Z WARNING 47306 [post_scheduler]: Inserted memset 0 for _dot.12015-t36789 +2025-08-07T13:53:21Z INFO 47306 [post_scheduler]: Time-aware hwm post-sched +2025-08-07T13:53:27Z INFO 47306 [post_scheduler]: Time-aware simulation time: 34821961 +2025-08-07T13:53:28Z INFO 47306 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:53:28 2025 +2025-08-07T13:53:28Z USER 47306 [ModuleForkPass]: post_sched finished after 19.659 seconds +2025-08-07T13:53:28Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2327mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:29Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:29Z USER 47306 [ModuleForkPass]: Running expand_scheduling_units +2025-08-07T13:53:29Z INFO 47306 [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:29Z USER 47306 [ModuleForkPass]: expand_scheduling_units finished after 0.029 seconds +2025-08-07T13:53:29Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2211mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:29Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:29Z USER 47306 [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:53:29Z INFO 47306 [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:32Z INFO 47306 [DMAOptimizationBase]: PSUM Rotation rotated 6277 PSUM Banks +2025-08-07T13:53:32Z INFO 47306 [DMAOptimizationBase]: PSUM Rotation rotated 7380 PSUM Banks +2025-08-07T13:53:33Z INFO 47306 [DMAOptimizationBase]: PSUM Rotation rotated 305 PSUM Banks +2025-08-07T13:53:33Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 420 Sb address +2025-08-07T13:53:34Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 4686 Sb address +2025-08-07T13:53:34Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 389 Sb address +2025-08-07T13:53:34Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 210 Sb address +2025-08-07T13:53:34Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 183 Sb address +2025-08-07T13:53:35Z INFO 47306 [DMAOptimizationBase]: moved 0 MM forward +2025-08-07T13:53:35Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 7 Sb address +2025-08-07T13:53:35Z INFO 47306 [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:53:35Z USER 47306 [ModuleForkPass]: address_rotation_sb finished after 6.667 seconds +2025-08-07T13:53:35Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2230mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:35Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:35Z USER 47306 [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:53:35Z INFO 47306 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:35Z INFO 47306 [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:53:35Z INFO 47306 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:53:35Z INFO 47306 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:53:36Z USER 47306 [ModuleForkPass]: anti_dependency_analyzer finished after 1.115 seconds +2025-08-07T13:53:36Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2340mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:36Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:36Z USER 47306 [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:53:36Z INFO 47306 [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:36Z INFO 47306 [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:53:36Z INFO 47306 [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-08-07T13:53:36Z INFO 47306 [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:53:37Z USER 47306 [ModuleForkPass]: anti_dependency_analyzer finished after 0.170 seconds +2025-08-07T13:53:37Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1979mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:37Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:37Z USER 47306 [ModuleForkPass]: Running dep_opt +2025-08-07T13:53:37Z INFO 47306 [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:37Z INFO 47306 [build_flow_deps]: Start build fdeps. Invocation: 3Thu Aug 7 13:53:37 2025 +2025-08-07T13:53:37Z INFO 47306 [build_flow_deps]: Allocs: 27771 instructions: 277155 +2025-08-07T13:53:37Z INFO 47306 [build_flow_deps]: Build fdeps inserted 806528 edges +2025-08-07T13:53:37Z INFO 47306 [build_flow_deps]: Done build fdeps 806528 Thu Aug 7 13:53:37 2025 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: dep_opt finished after 1.062 seconds +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2007mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: Running report_stats +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z INFO 47306 [ReportStats]: Data Movement Statistics: sg0000 +┌─────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├─────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 1 │ 622329856 │ +│ DMACopy │ Internal │ 1 │ 24576 │ +│ DMACopy │ Internal -> ExternalOutput │ 72 │ 75497472 │ +│ Load │ Const -> Internal │ 77 │ 2394376 │ +│ Load │ ExternalInput -> Internal │ 8047 │ 7578151564 │ +│ Load │ Internal │ 107 │ 2816136 │ +│ Save │ Internal │ 695 │ 2809988 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +└─────────────┴────────────────────────────┴───────┴────────────┘ + 
+2025-08-07T13:53:38Z INFO 47306 [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 1 │ 2 │ +│ 2 │ 72 │ +│ 4 │ 45 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 64 │ 73 │ +│ 256 │ 147 │ +│ 512 │ 665 │ +│ 1024 │ 88 │ +│ 2048 │ 30 │ +│ 4096 │ 2 │ +│ 6144 │ 2304 │ +│ 8192 │ 5493 │ +│ 60768 │ 1 │ +│ 60776 │ 4 │ +│ 262144 │ 72 │ +└─────────────────────┴───────┘ + +2025-08-07T13:53:38Z INFO 47306 [ReportStats]: MM Stats: #MatMults 251660 #MatMult-Transposes 20035 +2025-08-07T13:53:38Z INFO 47306 [ReportStats]: IO Tensor size combined: 8342039064 +2025-08-07T13:53:38Z INFO 47306 [ReportStats]: IO Tensor Statistics: +┌────────────────────┬───────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼───────────────┼──────────┼──────────────┤ +│ input473 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input76 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input85 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input106 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input96 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input84 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input98 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input109 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input107 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input95 │ ExternalInput │ bfloat16 │ 50331648 │ +└────────────────────┴───────────────┴──────────┴──────────────┘ + +2025-08-07T13:53:38Z INFO 47306 [ReportStats]: Large (Internal) Tensor Statistics: +┌────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────────┼──────────┼──────────┼──────────────┤ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ input83_local_32950_i3 │ Internal │ bfloat16 │ 1048576 │ +│ -t62398 │ Internal │ float32 │ 1048576 │ +│ input83_local_32950_i1 │ Internal │ bfloat16 │ 1048576 │ +│ -t62392 │ Internal │ float32 │ 1048576 │ +│ input83_local_32950_i2 │
Internal │ bfloat16 │ 1048576 │ +│ -t62387 │ Internal │ float32 │ 1048576 │ +│ input83_local_32950_i5 │ Internal │ bfloat16 │ 1048576 │ +│ input83_local_32950_i4 │ Internal │ bfloat16 │ 1048576 │ +│ input83_local_32950_i0 │ Internal │ bfloat16 │ 1048576 │ +└────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: report_stats finished after 0.060 seconds +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2007mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 72.255 seconds +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: curr_vmrss: 2007mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: Running assign_trigger_engine +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Inputs to assign_trigger_engine: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z INFO 47306 [AssignTriggerEngine]: Assigned trigger engine for 771 DMA instructions. Moved 76 DMA instructions to CC's engines. +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: assign_trigger_engine finished after 0.098 seconds +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: curr_vmrss: 2008mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [SubgraphForkPass]: Running lower_local_collectives +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [SubgraphForkPass]: lower_local_collectives finished after 0.001 seconds +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 2008mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [SubgraphForkPass]: Running extend_shared_lifetimes +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [SubgraphForkPass]: extend_shared_lifetimes finished after 0.001 seconds +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 2008mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z INFO 47306 [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:53:38Z USER 47306 [SubgraphForkPass]: dead_code_elim finished after 0.169 seconds +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: subgraph_parallel_pass finished after 0.177 seconds +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: Running assign_hwdge_engine +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: assign_hwdge_engine finished after 0.029 seconds +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: Running alloc_queues +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z INFO 47306 [AllocQueues]: DMACopy transpose will be triggered from multiple engines +2025-08-07T13:53:38Z INFO 47306 [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 35 │ +│ qPoolIO0 │ input │ Pool │ 16 │ 1 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 109 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 95 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 671 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 5 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 8085 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: alloc_queues finished after 0.030 seconds +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: Running chain_dma_transposes +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: chain_dma_transposes finished after 0.001 seconds +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.001 seconds +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: Running lower_control +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z INFO 47306 [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-08-07T13:53:38Z USER 47306 [ModuleForkPass]: lower_control finished after 0.337 seconds +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 0.377 seconds +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: curr_vmrss: 2009mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [BackendPassManager]: Running nc_parallel_pass +2025-08-07T13:53:38Z INFO 47306 [BackendPassManager]: Inputs to nc_parallel_pass: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z USER 47306 [CoreForkPass]: Running dep_reduction +2025-08-07T13:53:38Z INFO 47306 [CoreForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:38Z INFO 47306 [DepReduction]: Start Dependency Reduction +2025-08-07T13:53:39Z INFO 47306 [DepReduction]: Processing async instrs... +2025-08-07T13:53:39Z INFO 47306 [DepReduction]: Processing secondary edges per engine... 
+2025-08-07T13:53:39Z INFO 47306 [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 253054 +2025-08-07T13:53:39Z INFO 47306 [DepReduction]: Processing redundant descendants, Done. Num edges removed 262102 +2025-08-07T13:53:39Z INFO 47306 [DepReduction]: Processing async instrs, Done. Num edges removed 262102 +2025-08-07T13:53:42Z INFO 47306 [DepReduction]: Num Async removed: 0 +2025-08-07T13:53:42Z INFO 47306 [DepReduction]: Finished dependency reduction: 1848725 removed, new total 38899 +2025-08-07T13:53:42Z INFO 47306 [DepReduction]: Finished Dependency Reduction +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: dep_reduction finished after 3.228 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2254mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running lower_dynamic_dma +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: lower_dynamic_dma finished after 0.044 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2228mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running legalize_dynamic_dma +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z INFO 47306 [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions +2025-08-07T13:53:42Z INFO 47306 [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned +2025-08-07T13:53:42Z INFO 47306 [LegalizeDynamicDMA]: +┌───────────┬───────────────────────────────┬────────────────────────────┐ +│ Sub-Pass │ Illegal Instructions Detected │ New Instructions Generated │ +├───────────┼───────────────────────────────┼────────────────────────────┤ +│ Peeling │ 0 │ 0 │ +│ Unrolling │ 0 │ 0 │ +│ Splitting │ 0 │ 0 │ +└───────────┴───────────────────────────────┴────────────────────────────┘ + +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: legalize_dynamic_dma finished after 0.108 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2228mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277155 instruction(s).
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running lower_dma +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to lower_dma: modules=1 functions=1 allocs=27771 blocks=1 instructions=277155 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z INFO 47306 [LowerDMA]: lower_dma metrics start + IO + Copy (DGE/DMA) + 128 partition : 7939/7939 (100% DGE) + power-of-2 partition : 7940/7976 (99.5486% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 7940/7976 (99.5486% DGE) + Cast (DGE/DMA) + 128 partition : 72/72 (100% DGE) + power-of-2 partition : 72/72 (100% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 72/72 (100% DGE) + Spill/Reload + Copy (DGE/DMA) + 128 partition : 0/8 (0% DGE) + power-of-2 partition : 0/879 (0% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/879 (0% DGE) + Cast (DGE/DMA) + 128 partition : 0/0 + power-of-2 partition : 0/0 + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/0 + CopyMode + CCE : 1 + Transpose : 0 + Replicate : 0 + Dynamic (DGE/DMA) + scalar : 1/1 (100% DGE) + vector : 72/72 (100% DGE) + Opcode + ReadVarAddr : 0 + IndirectLoad : 0 + IndirectSave : 0 + IndirectSaveAccumulate : 0 + DstReduceDGE : 0 +lower_dma metrics end +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: lower_dma finished after 0.140 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2228mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277157 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running coalesce_dma_blocks +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to coalesce_dma_blocks: modules=1 functions=1 allocs=27771 blocks=1 instructions=277157 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z INFO 47306 [CoalesceDmaBlocks]: Coaleseced 50 DMA triggers +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: coalesce_dma_blocks finished after 0.111 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2232mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277107 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running expand_all_engine +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=1 allocs=27771 blocks=1 instructions=277107 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: expand_all_engine finished after 0.041 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2227mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277107 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running alloc_semaphores +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=1 allocs=27771 blocks=1 instructions=277107 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: alloc_semaphores finished after 0.332 seconds +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: curr_vmrss: 2227mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277107 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:42Z USER 47306 [CoreForkPass]: Running expand_inst_late +2025-08-07T13:53:42Z INFO 47306 [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=1 allocs=27771 blocks=1 instructions=277107 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: expand_inst_late finished after 0.395 seconds +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: curr_vmrss: 2227mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277182 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: Running seq_inst_opt +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=1 allocs=27771 blocks=1 instructions=277182 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z INFO 47306 [SeqInstOpt]: Removing 71 unnecessary InstRegisterMove instruction(s) from Block1 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: seq_inst_opt finished after 0.031 seconds +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: curr_vmrss: 2227mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 277111 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: Running lower_sync +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Inputs to lower_sync: modules=1 functions=1 allocs=27771 blocks=1 instructions=277111 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: lower_sync finished after 0.086 seconds +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: curr_vmrss: 2234mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 285858 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: Running lower_act +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Inputs to lower_act: modules=1 functions=1 allocs=27771 blocks=1 instructions=285858 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: lower_act finished after 0.030 seconds +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: curr_vmrss: 2234mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: Running lower_dve +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Inputs to lower_dve: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z INFO 47306 [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: lower_dve finished after 0.332 seconds +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: curr_vmrss: 2278mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: Running lower_ap +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Inputs to lower_ap: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: lower_ap finished after 0.046 seconds +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: curr_vmrss: 2236mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z USER 47306 [CoreForkPass]: Running coloring_allocator_reg +2025-08-07T13:53:43Z INFO 47306 [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:43Z INFO 47306 [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:53:43Z INFO 47306 [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:53:43Z INFO 47306 [REG_Allocator]: allocating REG +2025-08-07T13:53:43Z INFO 47306 [REG_Allocator]: main loop iteration 1 +2025-08-07T13:53:43Z INFO 47306 [REG_Allocator]: renumber registers +2025-08-07T13:53:43Z INFO 47306 [REG_Allocator]: size = 5 +2025-08-07T13:53:43Z INFO 47306 []: find first defs for local reg +2025-08-07T13:53:43Z INFO 47306 []: find first defs for global reg +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: live range analysis +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: find costs +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: simplify interference graph +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: initialize low and high +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: lo = 5 +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: hi = 0 +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: inf = 0 +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: total = 5 +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: simplify +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: new candidates = 0 +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: select ranges +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: no more spills +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: REG score = 0 (lower is better) +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-08-07T13:53:44Z INFO 47306 [REG_Allocator]: 0% REG utilization after allocation +2025-08-07T13:53:44Z USER 47306 [CoreForkPass]: coloring_allocator_reg finished after 0.389 seconds 
+2025-08-07T13:53:44Z INFO 47306 [CoreForkPass]: curr_vmrss: 2281mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:44Z INFO 47306 [CoreForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [BackendPassManager]: nc_parallel_pass finished after 5.655 seconds +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: curr_vmrss: 2236mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [ModuleForkPass]: Running birverifier +2025-08-07T13:53:44Z INFO 47306 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [ModuleForkPass]: birverifier finished after 0.226 seconds +2025-08-07T13:53:44Z INFO 47306 [ModuleForkPass]: curr_vmrss: 1983mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:44Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 0.232 seconds +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: curr_vmrss: 1983mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:53:44Z INFO 47306 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-08-07T13:53:44Z INFO 47306 [SubgraphForkPass]: curr_vmrss: 1983mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:44Z INFO 47306 [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [BackendPassManager]: subgraph_parallel_pass finished after 0.004 seconds +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: curr_vmrss: 1983mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:53:44Z INFO 47306 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z USER 47306 [ModuleForkPass]: Running codegen +2025-08-07T13:53:44Z INFO 47306 [ModuleForkPass]: Inputs to codegen: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:44Z INFO 47306 [Codegen]: Total compiler allocated DRAM tensors: 0.00290298 GB +2025-08-07T13:53:44Z INFO 47306 [Codegen]: Total un-allocated DRAM tensors by kind: +2025-08-07T13:53:44Z INFO 47306 [Codegen]: +┌────────────────┬─────────────┐ +│ TensorKind │ Size (GB) │ +├────────────────┼─────────────┤ +│ ExternalInput │ 7.69882 │ +│ ExternalOutput │ 3.72529e-09 │ +│ Const │ 0.00222994 │ +└────────────────┴─────────────┘ + +2025-08-07T13:53:44Z INFO 47306 [Codegen]: Total runtime managed DRAM tensors: 7.70105 GB +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Instruction Stats: +2025-08-07T13:53:45Z INFO 47306 [Codegen]: +┌─────────────────────┬────────┐ +│ Opcode │ Count │ +├─────────────────────┼────────┤ +│ MATMUL │ 251660 │ +│ LDWEIGHTS │ 251660 │ +│ ACTIVATE │ 12647 │ +│ EVENT_SEMAPHORE │ 8747 │ +│ UNKNOWN(0xd4) │ 8085 │ +│ TENSOR_TENSOR │ 1125 │ +│ PSEUDO_DMA_TRIGGER │ 866 │ +│ MATCH_VALUE_LOAD │ 441 │ +│ TENSOR_SCALAR_ADDR │ 345 │ +│ MEMSET │ 333 │ +│ TENSOR_SCALAR │ 332 │ +│ LOAD_MASK_SELECT │ 294 │ +│ ACT_TABLE_LOAD │ 233 │ +│ CAST │ 230 │ +│ MAX8 │ 224 │ +│ FIND_INDEX8 │ 224 │ +│ STREAM_SHUFFLE │ 222 │ +│ MATCH_REPLACE8 │ 217 │ +│ TENSOR_REDUCE │ 151 │ +│ UNKNOWN(0xda) │ 148 │ +│ GATHER │ 99 │ +│ POOL_BUFFER_LOAD │ 99 │ +│ RECIPROCAL │ 75 │ +│ UNKNOWN(0xd9) │ 75 │ +│ IOTA │ 73 │ +│ STREAM_TRANSPOSE │ 72 │ +│ COPY │ 72 │ +│ UNKNOWN(0xe8) │ 38 │ +│ PSEUDO_BRANCH_LABEL │ 5 │ +│ ALU_OP │ 2 │ +│ UNKNOWN(0xe5) │ 2
│ +│ PSEUDO_TENSOR_LOAD │ 1 │ +│ MOVE │ 1 │ +│ NOP │ 1 │ +│ RNG │ 1 │ +│ TENSOR_SCALAR │ 1 │ +└─────────────────────┴────────┘ + +2025-08-07T13:53:45Z INFO 47306 [Codegen]: +┌────────────┬────────┐ +│ Engine │ Count │ +├────────────┼────────┤ +│ Unassigned │ 0 │ +│ GPSIMD │ 13440 │ +│ Scalar │ 14578 │ +│ Tensor │ 506363 │ +│ SyncDMA │ 0 │ +│ Vector │ 4280 │ +│ Sync │ 145 │ +│ All │ 0 │ +└────────────┴────────┘ + +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Total instructions: 538806 (0.0321153 GB) +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Total DynamicDMA instruction count: 8085 +2025-08-07T13:53:45Z USER 47306 [Codegen]: isa_gen finished after 1.162 seconds +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Number of DMA descriptors on each queue instance: +┌───────────────────┬────────────────┐ +│ Queue Instance │ RT Descriptors │ +├───────────────────┼────────────────┤ +│ qActSpillReload0 │ 5932 │ +│ qDVESpillReload0 │ 138 │ +│ qPoolIO0 │ 2 │ +│ qPoolSpillReload0 │ 7308 │ +│ qSPIO0 │ 70 │ +│ qSPSpillReload0 │ 12384 │ +└───────────────────┴────────────────┘ + +Total descriptors: 25834 (0.000384957 GB) +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Number of DMA engines used by each queue: +┌───────────────────┬──────────────────────┐ +│ Queue │ DMA Engines │ +├───────────────────┼──────────────────────┤ +│ qSPIO0 │ 16 │ +│ qSPSpillReload0 │ 16 │ +│ qPoolDynamic │ 16 │ +│ qPoolSpillReload0 │ 16 │ +│ qActSpillReload0 │ 16 │ +│ qDVESpillReload0 │ 16 │ +│ qPoolIO0 │ 16 │ +├───────────────────┼──────────────────────┤ +│ TOTAL │ 112 (must be <= 176) │ +└───────────────────┴──────────────────────┘ + +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Tensors with largest descriptor count: +┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────┬──────────┬──────────────────┐ +│ Tensor Name │ Kind │ Src Type │ Descriptor Count │ 
+├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────┼──────────┼──────────────────┤ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.50071--cosine.152.50067_24--Coalesced_memloc_cosine.152.50058--cosine.152.50054_27_99 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.50175--cosine.152.50171_0--Coalesced_memloc_cosine.152.50162--cosine.152.50158_3_87 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.50149--cosine.152.50145_6--Coalesced_memloc_cosine.152.50136--cosine.152.50132_9_90 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.49993--cosine.152.49989_42--Coalesced_memloc_cosine.152.49980--cosine.152.49976_45_108 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.49889--cosine.152.49885_66--Coalesced_memloc_cosine.152.49876--cosine.152.49872_69_120 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.49863--cosine.152.49859_72--Coalesced_memloc_cosine.152.49850--cosine.152.49846_75_123 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.50045--cosine.152.50041_30--Coalesced_memloc_cosine.152.50032--cosine.152.50028_33_102 │ Internal │ float32 │ 5 │ +│ Coalesced_memloc_Coalesced_memloc_cosine.152.49915--cosine.152.49911_60--Coalesced_memloc_cosine.152.49902--cosine.152.49898_63_117 │ Internal │ float32 │ 5 │ +│ input2 │ ExternalInput │ int32 │ 31 │ +│ convert.840 │ Internal │ float32 │ 599 │ +└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────┴──────────┴──────────────────┘ + +2025-08-07T13:53:45Z USER 47306 [Codegen]: dma_desc_gen finished after 0.014 seconds +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Estimated peak DRAM usage: 7.73645 GB +2025-08-07T13:53:45Z INFO 47306 [Codegen]: Generating debug info +2025-08-07T13:53:46Z 
WARNING 47306 [Codegen]: Found 163 instructions with more than 100 dependencies. For each such instruction, skipping writing more than 100 dependencies into the built-in NEFF debug info to prevent excessive compile time and NEFF size. For those instructions, the Neuron profiler will not display the skipped dependencies. +2025-08-07T13:53:46Z USER 47306 [Codegen]: debug_info_gen finished after 0.550 seconds +2025-08-07T13:53:46Z USER 47306 [ModuleForkPass]: codegen finished after 1.778 seconds +2025-08-07T13:53:46Z INFO 47306 [ModuleForkPass]: curr_vmrss: 2216mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:46Z INFO 47306 [ModuleForkPass]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:46Z USER 47306 [BackendPassManager]: mod_parallel_pass finished after 1.803 seconds +2025-08-07T13:53:46Z INFO 47306 [BackendPassManager]: curr_vmrss: 2022mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:46Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). 
Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:46Z USER 47306 [BackendPassManager]: Running neff_packager +2025-08-07T13:53:46Z INFO 47306 [BackendPassManager]: Inputs to neff_packager: modules=1 functions=1 allocs=27771 blocks=1 instructions=286091 Max writers: 1537 Max Readers: 20035 +2025-08-07T13:53:46Z WARNING 47306 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.169490.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json +2025-08-07T13:53:46Z INFO 47306 [NeffFileWriter]: Neff will be written to: /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.neff +2025-08-07T13:53:46Z INFO 47306 [NeffFileWriter]: IR signature: b10f509ebdeafba6769739af0b92c8e2 for neff artifacts +2025-08-07T13:53:46Z USER 47306 [BackendPassManager]: neff_packager finished after 0.311 seconds +2025-08-07T13:53:46Z INFO 47306 [BackendPassManager]: curr_vmrss: 2022mb, ru_maxrss: 2379mb (delta=0mb) +2025-08-07T13:53:46Z INFO 47306 [BackendPassManager]: Output has 1 module(s), 1 function(s), 27771 memory location(s), 1 block(s), and 286091 instruction(s). 
Max writers: 1537 Max Readers: 20035
+2025-08-07T13:53:46Z INFO 47306 [BackendDriver]: HBM scratchpad usage summary (post-allocation):
+┌──────┬───────────┬────────────────────────────────────────────────────────────┬─────────────┐
+│ Core │ Subgraph  │ Description                                                │ Value       │
+├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤
+│ nc00 │ module    │ Peak scratchpad usage: local                               │ 0.002903 GB │
+│ nc00 │ module    │ Total size of allocated tensors: local                     │ 0.003231 GB │
+│ nc00 │ Max       │ Peak scratchpad usage: local                               │ 0.002903 GB │
+│ nc00 │ Post-link │ Peak scratchpad usage after intermediate tensor allocation │ 0.000000 GB │
+│ nc00 │ Post-link │ Total size of allocated intermediate tensors               │ 0.000000 GB │
+├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤
+│ Max  │ Max       │ Peak scratchpad usage                                      │ 0.002903 GB │
+│ Max  │ Max       │ Peak scratchpad usage (page-aligned)                       │ 0.500000 GB │
+└──────┴───────────┴────────────────────────────────────────────────────────────┴─────────────┘
+
+2025-08-07T13:53:46Z INFO 47306 [BackendDriver]: Backend completed successfully, tearing down.
+2025-08-07T13:53:47Z INFO 47058 [job.WalrusDriver.0]: Job #0 finished +2025-08-07T13:53:47Z INFO 47058 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 +2025-08-07T13:53:47Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 +2025-08-07T13:53:47Z INFO 47058 [job.BIRLinker.0]: Replay this job by calling: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework XLA --state '{"model": ["/home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "bir.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "state_dir": "/home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z/sg00", "state_id": "sg00"}' --pipeline BIRLinker +2025-08-07T13:53:47Z INFO 47058 [job.BIRLinker.0]: BIRLinker cwd: /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z +2025-08-07T13:53:47Z INFO 47058 [job.BIRLinker.0]: Linking not needed. 
Netlist doesnt exist +2025-08-07T13:53:47Z INFO 47058 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 +2025-08-07T13:53:47Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.Kelper.0 +2025-08-07T13:53:47Z INFO 47058 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager +2025-08-07T13:53:47Z INFO 47058 [pipeline.Pipeline.0]: Finished job job.Kelper.0 +2025-08-07T13:53:47Z INFO 47058 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 +2025-08-07T13:53:47Z INFO 47058 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 +2025-08-07T13:53:47Z INFO 47058 [job.NeffWrapper.0]: Processing input #0 +2025-08-07T13:53:47Z INFO 47058 [job.NeffWrapper.0]: Start NeffWrapper +2025-08-07T13:53:47Z INFO 47058 [job.NeffWrapper.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.hlo_module.pb --neff /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/model.MODULE_6ef5ba8b41fbbe77f080+74ae8282.neff --io_transposes /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z/io_transposes.json --output /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/wrapped_neff.hlo --netlist /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z/hlo_netlist.json +2025-08-07T13:53:48Z INFO 47058 [job.NeffWrapper.0]: Could not open file: /home/ubuntu/qwen3/token_generation_model/_tp0_bk0/neuronxcc-ykq_7n9z/hlo_netlist.json +Hlo neff wrapper finished successfully. 
Have a wonderful day :D + +2025-08-07T13:53:48Z INFO 47058 [job.NeffWrapper.0]: Job #0 finished +2025-08-07T13:53:48Z INFO 47058 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 +2025-08-07T13:53:48Z INFO 47058 [pipeline.Pipeline.0]: Finished pipeline Pipeline +2025-08-07T13:53:48Z INFO 47058 [pipeline.Pipeline.0]: Job #0 finished +2025-08-07T13:53:48Z INFO 46994 [root]: Subcommand returned with exitcode=0