metadata

pipeline_tag: text-to-image
library_name: diffusers
license: apache-2.0
base_model:
  - neta-art/Neta-Lumina
  - Alpha-VLLM/Lumina-Image-2.0

1. Usage

import torch
from diffusers import Lumina2Pipeline

pipe = Lumina2Pipeline.from_pretrained("duongve/NetaYume-Lumina-Image-2.0-Diffusers", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Saves VRAM by offloading the model to CPU. Remove this line if your GPU has enough memory.

prompt = "kita ikuyo (Bocchi the Rock!), 1girl, anime style, vibrant colors, red hair, medium hair with one side up, green eyes, bangs, hair between eyes, school uniform (white shirt, grey serafuku sailor collar, red neckerchief, pleated skirt), sitting upper body close-up, holding bouquet with white lily & pink flowers, indoors with depth of field, cherry blossom-like light particles, soft sunlight backlighting, bloom, chromatic aberration & lens flare abuse, light smile, closed mouth, one side hair up, transparent blurry foreground, warm cozy atmosphere, masterpiece, best quality"
image = pipe(
    prompt,
    height=1536,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=50,
    cfg_trunc_ratio=6,
    cfg_normalization=False,  # Important: keep CFG normalization disabled
    generator=torch.Generator("cuda").manual_seed(0),
    system_prompt="You are an assistant designed to generate anime images based on textual prompts.",
).images[0]
image.save("luminayume_demo.png")

2. Suggestion

System Prompt: A system prompt helps the model understand and align with your prompt, which makes it easier to generate the images you want.

For anime-style images using Danbooru tags:

 You are an advanced assistant designed to generate high-quality images from user prompts, utilizing danbooru tags to accurately guide the image creation process. 

 You are an assistant designed to generate high-quality images based on user prompts and  danbooru tags.

For general use:

 You are an assistant designed to generate superior images with the superior degree of image-text alignment based on textual prompts or user prompts.

 You are an assistant designed to generate high-quality images with the highest degree of image-text alignment based on textual prompts.
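
A minimal sketch of how one of the system prompts above could be selected and passed to the pipeline through the `system_prompt` argument shown in the Usage section. The `SYSTEM_PROMPTS` dictionary, its keys, and the `build_call_kwargs` helper are hypothetical names introduced for illustration only; they are not part of diffusers or of this model.

```python
# Hypothetical mapping from a style name to one of the system prompts listed above.
SYSTEM_PROMPTS = {
    "danbooru": (
        "You are an advanced assistant designed to generate high-quality images "
        "from user prompts, utilizing danbooru tags to accurately guide the image "
        "creation process."
    ),
    "general": (
        "You are an assistant designed to generate high-quality images with the "
        "highest degree of image-text alignment based on textual prompts."
    ),
}


def build_call_kwargs(prompt: str, style: str = "general") -> dict:
    """Return keyword arguments for a pipeline call using the chosen system prompt."""
    if style not in SYSTEM_PROMPTS:
        raise ValueError(
            f"Unknown style {style!r}; expected one of {sorted(SYSTEM_PROMPTS)}"
        )
    return {"prompt": prompt, "system_prompt": SYSTEM_PROMPTS[style]}


kwargs = build_call_kwargs("1girl, red hair, school uniform, masterpiece", style="danbooru")

# With `pipe` loaded as in the Usage section, the call could then look like:
# image = pipe(**kwargs, guidance_scale=4.0, num_inference_steps=50,
#              cfg_normalization=False).images[0]
```

Keeping the prompts in one place makes it easy to switch between the Danbooru-tag and general-purpose variants without retyping them.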

Recommended Settings

CFG: 4–7
Sampling Steps: 40–50
Sampler:
- Euler a (with scheduler: normal)
- res_multistep (with scheduler: linear_quadratic)
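
To compare values inside the recommended ranges, one option is to hold the seed fixed and vary one setting at a time. The sketch below only builds the argument sets. It assumes that CFG corresponds to `guidance_scale` and Sampling Steps to `num_inference_steps`, as the Usage example suggests. `settings_grid` is a hypothetical helper name, and the grid points are illustrative picks from the ranges above rather than tuned recommendations. Sampler and scheduler choices are not covered by this sketch.

```python
from itertools import product

# Illustrative grid points taken from the recommended ranges (CFG 4-7, 40-50 steps).
CFG_VALUES = (4.0, 5.5, 7.0)
STEP_VALUES = (40, 50)


def settings_grid(cfg_values=CFG_VALUES, step_values=STEP_VALUES):
    """Return one kwargs dict per (guidance_scale, num_inference_steps) pair."""
    return [
        {"guidance_scale": cfg, "num_inference_steps": steps}
        for cfg, steps in product(cfg_values, step_values)
    ]


grid = settings_grid()
for settings in grid:
    print(settings)

# With `pipe` and `prompt` defined as in the Usage section, each entry could be
# run with the same seed so that only the settings differ between images:
# for i, settings in enumerate(grid):
#     generator = torch.Generator("cuda").manual_seed(0)
#     image = pipe(prompt, generator=generator, cfg_normalization=False, **settings).images[0]
#     image.save(f"grid_{i}.png")
```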

3. Acknowledgments

narugo1992 – for the invaluable Danbooru dataset
Alpha-VLLM – for creating a wonderful model!
Neta.art and their team – for openly sharing an awesome model.