OPPOer/Qwen-Image-Pruning

@@ -14,12 +14,16 @@ pipeline_tag: text-to-image
 </div>
 ## Introduction
-This open-source project is based on Qwen-Image and has attempted model pruning, removing 20 layers while retaining the weights of 40 layers, resulting in a model size of 13.3B parameters. The pruned model has experienced a slight drop in objective metrics. The pruned version will continue to be iterated upon. Additionally, the pruned version supports the adaptation and loading of community models such as LoRA and ControlNet. Please stay tuned. For the relevant inference scripts, please refer to https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning.
 <div align="center">
   <img src="bench.png">
 </div>
 ## Quick Start
 Install the latest version of diffusers and pytorch
@@ -33,32 +37,26 @@ pip install git+https://github.com/huggingface/diffusers
 import torch
 import os
 from diffusers import DiffusionPipeline
 model_name = "OPPOer/Qwen-Image-Pruning"
 if torch.cuda.is_available():
     torch_dtype = torch.bfloat16
     device = "cuda"
 else:
     torch_dtype = torch.bfloat16
     device = "cpu"
 pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
 pipe = pipe.to(device)
 # Generate image
 positive_magic = {"en": ", Ultra HD, 4K, cinematic composition.", # for english prompt,
 "zh": "，超清，4K，电影级构图。" # for chinese prompt,
 }
 negative_prompt = " "
 prompts = [
     '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 "一、Qwen-Image的技术路线： 探索视觉生成基础模型的极限，开创理解与生成一体化的未来。二、Qwen-Image的模型特色：1、复杂文字渲染。支持中英渲染、自动布局； 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景：赋能专业内容创作、助力生成式AI发展。"',
     '海报，温馨家庭场景，柔和阳光洒在野餐布上，色彩温暖明亮，主色调为浅黄、米白与淡绿，点缀着鲜艳的水果和野花，营造轻松愉快的氛围，画面简洁而富有层次，充满生活气息，传达家庭团聚与自然和谐的主题。文字内容：“共享阳光，共享爱。全家一起野餐，享受美好时光。让每一刻都充满欢笑与温暖。”',
     '一个穿着校服的年轻女孩站在教室里，在黑板上写字。黑板中央用整洁的白粉笔写着“Introducing Qwen-Image, a foundational image generation model that excels in complex text rendering and precise image editing”。柔和的自然光线透过窗户，投下温柔的阴影。场景以写实的摄影风格呈现，细节精细，景深浅，色调温暖。女孩专注的表情和空气中的粉笔灰增添了动感。背景元素包括课桌和教育海报，略微模糊以突出中心动作。超精细32K分辨率，单反质量，柔和的散景效果，纪录片式的构图。',
     '一个台球桌上放着两排台球，每排5个，第一行的台球上面分别写着"Qwen""Image" "将 "于" "8" ，第二排台球上面分别写着"月" "正" "式" "发" "布" 。',
 ]
 output_dir = 'examples_Pruning'
 os.makedirs(output_dir, exist_ok=True)
 for prompt in prompts:
@@ -80,34 +78,28 @@ for prompt in prompts:
 import torch
 import os
 from diffusers import DiffusionPipeline
 model_name = "OPPOer/Qwen-Image-Pruning"
 lora_name = 'flymy_realism.safetensors'
 if torch.cuda.is_available():
     torch_dtype = torch.bfloat16
     device = "cuda"
 else:
     torch_dtype = torch.bfloat16
     device = "cpu"
 pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
 pipe = pipe.to(device)
 pipe.load_lora_weights(lora_name, adapter_name="lora")
 # Generate image
 positive_magic = {"en": ", Ultra HD, 4K, cinematic composition.", # for english prompt,
 "zh": "，超清，4K，电影级构图。" # for chinese prompt,
 }
 negative_prompt = " "
 prompts = [
     '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 "一、Qwen-Image的技术路线： 探索视觉生成基础模型的极限，开创理解与生成一体化的未来。二、Qwen-Image的模型特色：1、复杂文字渲染。支持中英渲染、自动布局； 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景：赋能专业内容创作、助力生成式AI发展。"',
     '海报，温馨家庭场景，柔和阳光洒在野餐布上，色彩温暖明亮，主色调为浅黄、米白与淡绿，点缀着鲜艳的水果和野花，营造轻松愉快的氛围，画面简洁而富有层次，充满生活气息，传达家庭团聚与自然和谐的主题。文字内容：“共享阳光，共享爱。全家一起野餐，享受美好时光。让每一刻都充满欢笑与温暖。”',
     '一个穿着校服的年轻女孩站在教室里，在黑板上写字。黑板中央用整洁的白粉笔写着“Introducing Qwen-Image, a foundational image generation model that excels in complex text rendering and precise image editing”。柔和的自然光线透过窗户，投下温柔的阴影。场景以写实的摄影风格呈现，细节精细，景深浅，色调温暖。女孩专注的表情和空气中的粉笔灰增添了动感。背景元素包括课桌和教育海报，略微模糊以突出中心动作。超精细32K分辨率，单反质量，柔和的散景效果，纪录片式的构图。',
     '一个台球桌上放着两排台球，每排5个，第一行的台球上面分别写着"Qwen""Image" "将 "于" "8" ，第二排台球上面分别写着"月" "正" "式" "发" "布" 。',
 ]
 output_dir = 'examples_Pruning+Realism_LoRA'
 os.makedirs(output_dir, exist_ok=True)
 for prompt in prompts:
@@ -128,16 +120,12 @@ for prompt in prompts:
 ```python
 import os
 import glob
 import torch
 from diffusers import DiffusionPipeline
 from diffusers.utils import load_image
 from diffusers import QwenImageControlNetPipeline, QwenImageControlNetModel
 model_name = "OPPOer/Qwen-Image-Pruning"
 controlnet_name = "InstantX/Qwen-Image-ControlNet-Union"
 # Load the pipeline
 if torch.cuda.is_available():
     torch_dtype = torch.bfloat16
@@ -145,14 +133,11 @@ if torch.cuda.is_available():
 else:
     torch_dtype = torch.bfloat16
     device = "cpu"
 controlnet = QwenImageControlNetModel.from_pretrained(controlnet_name, torch_dtype=torch.bfloat16)
 pipe = QwenImageControlNetPipeline.from_pretrained(
     model_name, controlnet=controlnet, torch_dtype=torch.bfloat16
 )
 pipe = pipe.to(device)
 # Generate image
 prompt_dict = {
     "soft_edge.png": "Photograph of a young man with light brown hair jumping mid-air off a large, reddish-brown rock. He's wearing a navy blue sweater, light blue shirt, gray pants, and brown shoes. His arms are outstretched, and he has a slight smile on his face. The background features a cloudy sky and a distant, leafless tree line. The grass around the rock is patchy.",
@@ -161,10 +146,8 @@ prompt_dict = {
     "pose.png": "Photograph of a young man with light brown hair and a beard, wearing a beige flat cap, black leather jacket, gray shirt, brown pants, and white sneakers. He's sitting on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall.",
 }
 controlnet_conditioning_scale = 1.0
 output_dir = 'examples_Pruning+ControlNet'
 os.makedirs(output_dir, exist_ok=True)
 for path in glob.glob('conds/*'):
     control_image = load_image(path)
     image_name = path.split('/')[-1]

 </div>
 ## Introduction
+This open-source project prunes Qwen-Image by removing 20 layers and retaining the weights of the remaining 40, which yields a 13.3B-parameter model. The pruned model shows a slight drop in objective metrics, and we will continue to iterate on it. The pruned version also supports loading community adapters such as LoRA and ControlNet. For the inference scripts, see **[Qwen-Image-13.3B](https://github.com/OPPO-Mente-Lab/Qwen-Image-Pruning)**.
 <div align="center">
   <img src="bench.png">
 </div>
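As a rough illustration of what "removing 20 layers while retaining the weights of 40 layers" can mean at the checkpoint level, the sketch below filters a state dict by block index and renumbers the surviving blocks. The key prefix `transformer_blocks.`, the choice of which blocks to keep, and the helper name `prune_blocks` are assumptions made for illustration only; the project does not state here which layers were removed or how they were selected.

```python
import re

# Hypothetical key layout: "transformer_blocks.<index>.<parameter name>".
BLOCK_KEY = re.compile(r"^transformer_blocks\.(\d+)\.(.+)$")


def prune_blocks(state_dict, keep):
    """Keep only the blocks whose index is in `keep`, renumbered 0..len(keep)-1."""
    new_index = {old: new for new, old in enumerate(sorted(keep))}
    pruned = {}
    for key, value in state_dict.items():
        match = BLOCK_KEY.match(key)
        if match is None:
            # Parameters outside the block stack are copied unchanged.
            pruned[key] = value
            continue
        old = int(match.group(1))
        if old in new_index:
            pruned[f"transformer_blocks.{new_index[old]}.{match.group(2)}"] = value
    return pruned


# Toy checkpoint: 60 blocks with one placeholder "weight" each, plus one shared parameter.
toy = {f"transformer_blocks.{i}.attn.weight": f"w{i}" for i in range(60)}
toy["proj_out.weight"] = "w_out"

# Illustrative selection only: drop every third block, leaving 40 of 60.
keep = [i for i in range(60) if i % 3 != 2]
pruned = prune_blocks(toy, keep)

num_blocks = len({k.split(".")[1] for k in pruned if k.startswith("transformer_blocks.")})
print(num_blocks)  # 40
```

The stated size is consistent with this picture: Qwen-Image is commonly described as a 20B-parameter model, and if nearly all of those parameters sat in 60 equally sized blocks, keeping 40 would give roughly 20 × 40 / 60 ≈ 13.3B.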
+## Update
+- 2025/09/24: We released an open-source pruned 12B model, **[Qwen-Image-12B](https://huggingface.co/OPPOer/Qwen-Image-12B)**. Its performance is comparable, both subjectively and objectively, to the previous 13.3B version, which was obtained by pruning 20 layers. We will continue to optimize its performance.
 ## Quick Start
 Install the latest versions of diffusers and PyTorch:
 pip install git+https://github.com/huggingface/diffusers
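Before running the scripts, it may help to confirm which versions are actually installed. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and `torch` and `diffusers` are assumed to be the installed distribution names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "diffusers"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

The ControlNet example further down imports `QwenImageControlNetPipeline`, so a diffusers build that predates that class would fail at import time.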
 import torch
 import os
 from diffusers import DiffusionPipeline
 model_name = "OPPOer/Qwen-Image-Pruning"
 if torch.cuda.is_available():
     torch_dtype = torch.bfloat16
     device = "cuda"
 else:
     torch_dtype = torch.bfloat16
     device = "cpu"
 pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
 pipe = pipe.to(device)
 # Generate image
 positive_magic = {"en": ", Ultra HD, 4K, cinematic composition.", # for english prompt,
 "zh": "，超清，4K，电影级构图。" # for chinese prompt,
 }
 negative_prompt = " "
 prompts = [
     '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 "一、Qwen-Image的技术路线： 探索视觉生成基础模型的极限，开创理解与生成一体化的未来。二、Qwen-Image的模型特色：1、复杂文字渲染。支持中英渲染、自动布局； 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景：赋能专业内容创作、助力生成式AI发展。"',
     '海报，温馨家庭场景，柔和阳光洒在野餐布上，色彩温暖明亮，主色调为浅黄、米白与淡绿，点缀着鲜艳的水果和野花，营造轻松愉快的氛围，画面简洁而富有层次，充满生活气息，传达家庭团聚与自然和谐的主题。文字内容：“共享阳光，共享爱。全家一起野餐，享受美好时光。让每一刻都充满欢笑与温暖。”',
     '一个穿着校服的年轻女孩站在教室里，在黑板上写字。黑板中央用整洁的白粉笔写着“Introducing Qwen-Image, a foundational image generation model that excels in complex text rendering and precise image editing”。柔和的自然光线透过窗户，投下温柔的阴影。场景以写实的摄影风格呈现，细节精细，景深浅，色调温暖。女孩专注的表情和空气中的粉笔灰增添了动感。背景元素包括课桌和教育海报，略微模糊以突出中心动作。超精细32K分辨率，单反质量，柔和的散景效果，纪录片式的构图。',
     '一个台球桌上放着两排台球，每排5个，第一行的台球上面分别写着"Qwen""Image" "将 "于" "8" ，第二排台球上面分别写着"月" "正" "式" "发" "布" 。',
 ]
 output_dir = 'examples_Pruning'
 os.makedirs(output_dir, exist_ok=True)
 for prompt in prompts:
 import torch
 import os
 from diffusers import DiffusionPipeline
 model_name = "OPPOer/Qwen-Image-Pruning"
 lora_name = 'flymy_realism.safetensors'
 if torch.cuda.is_available():
     torch_dtype = torch.bfloat16
     device = "cuda"
 else:
     torch_dtype = torch.bfloat16
     device = "cpu"
 pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
 pipe = pipe.to(device)
 pipe.load_lora_weights(lora_name, adapter_name="lora")
 # Generate image
 positive_magic = {"en": ", Ultra HD, 4K, cinematic composition.", # for english prompt,
 "zh": "，超清，4K，电影级构图。" # for chinese prompt,
 }
 negative_prompt = " "
 prompts = [
     '一个穿着"QWEN"标志的T恤的中国美女正拿着黑色的马克笔面相镜头微笑。她身后的玻璃板上手写体写着 "一、Qwen-Image的技术路线： 探索视觉生成基础模型的极限，开创理解与生成一体化的未来。二、Qwen-Image的模型特色：1、复杂文字渲染。支持中英渲染、自动布局； 2、精准图像编辑。支持文字编辑、物体增减、风格变换。三、Qwen-Image的未来愿景：赋能专业内容创作、助力生成式AI发展。"',
     '海报，温馨家庭场景，柔和阳光洒在野餐布上，色彩温暖明亮，主色调为浅黄、米白与淡绿，点缀着鲜艳的水果和野花，营造轻松愉快的氛围，画面简洁而富有层次，充满生活气息，传达家庭团聚与自然和谐的主题。文字内容：“共享阳光，共享爱。全家一起野餐，享受美好时光。让每一刻都充满欢笑与温暖。”',
     '一个穿着校服的年轻女孩站在教室里，在黑板上写字。黑板中央用整洁的白粉笔写着“Introducing Qwen-Image, a foundational image generation model that excels in complex text rendering and precise image editing”。柔和的自然光线透过窗户，投下温柔的阴影。场景以写实的摄影风格呈现，细节精细，景深浅，色调温暖。女孩专注的表情和空气中的粉笔灰增添了动感。背景元素包括课桌和教育海报，略微模糊以突出中心动作。超精细32K分辨率，单反质量，柔和的散景效果，纪录片式的构图。',
     '一个台球桌上放着两排台球，每排5个，第一行的台球上面分别写着"Qwen""Image" "将 "于" "8" ，第二排台球上面分别写着"月" "正" "式" "发" "布" 。',
 ]
 output_dir = 'examples_Pruning+Realism_LoRA'
 os.makedirs(output_dir, exist_ok=True)
 for prompt in prompts:
 ```python
 import os
 import glob
 import torch
 from diffusers import DiffusionPipeline
 from diffusers.utils import load_image
 from diffusers import QwenImageControlNetPipeline, QwenImageControlNetModel
 model_name = "OPPOer/Qwen-Image-Pruning"
 controlnet_name = "InstantX/Qwen-Image-ControlNet-Union"
 # Load the pipeline
 if torch.cuda.is_available():
     torch_dtype = torch.bfloat16
     device = "cuda"
 else:
     torch_dtype = torch.bfloat16
     device = "cpu"
 controlnet = QwenImageControlNetModel.from_pretrained(controlnet_name, torch_dtype=torch.bfloat16)
 pipe = QwenImageControlNetPipeline.from_pretrained(
     model_name, controlnet=controlnet, torch_dtype=torch.bfloat16
 )
 pipe = pipe.to(device)
 # Generate image
 prompt_dict = {
     "soft_edge.png": "Photograph of a young man with light brown hair jumping mid-air off a large, reddish-brown rock. He's wearing a navy blue sweater, light blue shirt, gray pants, and brown shoes. His arms are outstretched, and he has a slight smile on his face. The background features a cloudy sky and a distant, leafless tree line. The grass around the rock is patchy.",
     "pose.png": "Photograph of a young man with light brown hair and a beard, wearing a beige flat cap, black leather jacket, gray shirt, brown pants, and white sneakers. He's sitting on a concrete ledge in front of a large circular window, with a cityscape reflected in the glass. The wall is cream-colored, and the sky is clear blue. His shadow is cast on the wall.",
 }
 controlnet_conditioning_scale = 1.0
 output_dir = 'examples_Pruning+ControlNet'
 os.makedirs(output_dir, exist_ok=True)
 for path in glob.glob('conds/*'):
     control_image = load_image(path)
     image_name = path.split('/')[-1]