I got this message below
Sizes of tensors must match except in dimension 0. Expected size 128 but got size 256 for tensor number 1 in the list.
## Simple Fix - Height: 832
Actual Grok reply:
Let’s dive deeper into your workflow to resolve the error in the `PyramidFlowSampler` node. The error message indicates a shape mismatch: "can't divide axis of length 23 in chunks of 2," and the tensor shape is `[2, 1, 23, 40, 16]`. This tensor is being processed in a `rearrange` operation with the pattern `b t (h p1) (w p2) c -> b (t h w) (p1 p2 c)`, where `p1 = 2` and `p2 = 2`. Let’s break this down together.
First, let’s interpret the tensor shape `[2, 1, 23, 40, 16]` in the context of the rearrange pattern:
- `b` (batch size) = 2
- `t` (time or frames) = 1
- `h p1` (height dimension) = 23, with `p1 = 2`
- `w p2` (width dimension) = 40, with `p2 = 2`
- `c` (channels) = 16
The error occurs because the operation expects to split the height dimension (`h p1 = 23`) into chunks of `p1 = 2`. This means `h * p1 = 23`, so `h` would have to be `23 / 2 = 11.5`. Since `h` must be an integer, 23 is not divisible by 2, causing the shape mismatch. The width dimension (`w p2 = 40`) must similarly be divisible by `p2 = 2`, which it is (`40 / 2 = 20`).
Now, let’s trace back to where this tensor is coming from. The `PyramidFlowSampler` (Node 50) has an `input_latent` coming from the `PyramidFlowVAEEncode` (Node 54). The `PyramidFlowVAEEncode` takes an image from the `ImageScale` node (Node 57), which resizes the input image to 1280x768 pixels, and then encodes this image into a latent representation.
Here’s the key question: when the image of size 1280x768 is encoded into the latent space, how does the VAE transform the dimensions? Typically, a VAE downscales the spatial dimensions by a factor related to its architecture (often a power of 2, like 8 or 16). Let’s assume a downscaling factor of 8 (a common choice in many VAEs):
- Height: 768 / 8 = 96
- Width: 1280 / 8 = 160
So, the latent shape might be something like `[batch, channels, 96, 160]`. However, the error shows a shape of `[2, 1, 23, 40, 16]`. Notice that:
- 96 / 4 = 24 (close to 23)
- 160 / 4 = 40
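A quick arithmetic check (an assumption, not a reading of the actual VAE code) shows which overall downscale factor best explains the observed latent spatial dims:

```python
# For a 1280x768 input, compute the latent spatial size for a few
# common VAE downscale factors and compare against the observed 23x40.
for factor in (8, 16, 32):
    h, w = 768 // factor, 1280 // factor
    print(f"factor {factor}: latent {h}x{w}")
# factor 32 gives 24x40, closest to the observed 23x40 -- consistent
# with an off-by-one on the height from padding or cropping.
```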
This suggests there might be an additional downscaling or reshaping happening, possibly within the `PyramidFlowVAEEncode` or the sampler itself. The `PyramidFlowVAEEncode` has a widget value `overlap_factor = 0.25`, which might affect how the image is processed, but the main issue seems to be the height dimension (23) not being divisible by 2.
Let’s think about the image dimensions. The `PyramidFlowSampler` also takes `width` and `height` inputs (1280 and 768), which match the `ImageScale` node. However, the latent tensor’s spatial dimensions (23 and 40) suggest that the VAE encoding process has produced a shape that doesn’t align with the sampler’s expectations: the sampler expects the latent height and width to be divisible by the patch size (`p1 = 2`, `p2 = 2`).
Here’s a hypothesis: the VAE encoding process might be producing a latent height of 23 due to the specific downscaling and padding behavior of the `PyramidFlowVAEEncode`. To fix this, we need the latent dimensions to be divisible by 2. Since 23 is odd, we need to adjust the input image dimensions so that, after downscaling, the latent height and width are even numbers.
Let’s calculate what the input image dimensions should be:
- Assume the VAE downscales by a factor of 32 (since 768 / 32 = 24 and 1280 / 32 = 40, but we’re seeing 23 instead of 24 for height, possibly due to padding or cropping).
- For the latent height to be divisible by 2, the original height should be such that `height / 32` is even. For example:
  - If height = 768, then 768 / 32 = 24 (even, but we’re getting 23, possibly due to how the VAE handles boundaries).
  - If height = 832, then 832 / 32 = 26 (even).
- For the width: 1280 / 32 = 40, which is already even.

So, the issue might be with the height. Let’s try adjusting the height to 832 (the next multiple of 32 above 768 that still gives an even latent dimension). This would make the latent height 26 (832 / 32), which is divisible by 2.
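The candidate heights above can be sanity-checked with a short script (again assuming a downscale factor of 32; 800 is included to show why the *next even* multiple matters, not just the next multiple):

```python
# Check which candidate image heights yield an even latent height
# under an assumed VAE downscale factor of 32.
for height in (768, 800, 832):
    latent_h = height // 32
    print(f"height {height} -> latent height {latent_h}, even: {latent_h % 2 == 0}")
```

800 / 32 = 25 is odd, so 832 is the first height above 768 whose latent height (26) divides cleanly by the patch size of 2.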
What do you think about adjusting the height in the `ImageScale` node (Node 57) and the `PyramidFlowSampler` node (Node 50) to 832 instead of 768? You can do this by changing the `height` value in the `PrimitiveNode` (Node 59) from 768 to 832. Does that sound like a reasonable step to try? What might be the implications of changing the image dimensions like this?
https://grok.com/share/bGVnYWN5_701c5593-7224-471a-b05b-1b589e6e71db