Chroma1-HD

Chroma1-HD is an 8.9B parameter text-to-image foundation model based on FLUX.1-schnell. It is fully Apache 2.0 licensed, ensuring that anyone can use, modify, and build upon it.

As a base model, Chroma1 is intentionally designed to be an excellent starting point for finetuning. It provides a strong, neutral foundation for developers, researchers, and artists to create specialized models.

For the fast CFG "baked" version, please see Chroma1-Flash.

Key Features

  • High-Performance Base: 8.9B parameters, built on the powerful FLUX.1 architecture.
  • Easily Finetunable: Designed as an ideal checkpoint for creating custom, specialized models.
  • Community-Driven & Open-Source: Fully transparent, with an Apache 2.0 license and a public training history.
  • Flexible by Design: Provides a flexible foundation for a wide range of generative tasks.

Special Thanks

A massive thank you to our supporters who make this project possible.

  • Anonymous donor whose incredible generosity funded the pretraining run and data collection. Your support has been transformative for open-source AI.
  • Fictional.ai for their fantastic support and for helping push the boundaries of open-source AI. You can try Chroma on their platform.


How to Use

diffusers Library

Install the requirements:

pip install transformers diffusers sentencepiece accelerate

import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigured, blurry, smudged, restricted palette, flat colors"]

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    generator=torch.Generator("cpu").manual_seed(433),
    num_inference_steps=40,
    guidance_scale=3.0,
    num_images_per_prompt=1,
).images[0]
image.save("chroma.png")
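
enable_model_cpu_offload() keeps model components on the CPU and moves them to the GPU only when they are needed, which saves VRAM at the cost of some speed. If your GPU has enough memory, you can keep the whole pipeline resident on the device instead:

pipe.to("cuda")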

Quantized inference using gemlite
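
This example additionally depends on the gemlite library for the quantized kernels. Assuming the standard PyPI package name, it can be installed with:

pip install gemlite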

import torch
from diffusers import ChromaPipeline

pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.float16)
#pipe.enable_model_cpu_offload()

#######################################################
import gemlite

device = 'cuda:0'

# Choose a quantization scheme; uncomment one of the alternatives to swap it in.
processor = gemlite.helper.A8W8_int8_dynamic
#processor = gemlite.helper.A8W8_fp8_dynamic
#processor = gemlite.helper.A16W4_MXFP

# Tag every submodule with its qualified name so failed conversions can be reported below.
for name, module in pipe.transformer.named_modules():
    module.name = name

def patch_linearlayers(model, fct):
    # Recursively replace every torch.nn.Linear in the model with fct(layer, name).
    for name, layer in model.named_children():
        if isinstance(layer, torch.nn.Linear):
            setattr(model, name, fct(layer, name))
        else:
            patch_linearlayers(layer, fct)

def patch_linear_to_gemlite(layer, name):
    # Move the layer to the GPU and convert it to a quantized gemlite layer,
    # falling back to the original layer if conversion is unsupported.
    layer = layer.to(device, non_blocking=True)
    try:
        return processor(device=device).from_linear(layer)
    except Exception as exception:
        print('Skipping gemlite conversion for: ' + str(layer.name), exception)
        return layer

# Quantize all linear layers in the transformer in place.
patch_linearlayers(pipe.transformer, patch_linear_to_gemlite)
torch.cuda.synchronize()
torch.cuda.empty_cache()

pipe.to(device)
# Compile the transformer and VAE forward passes for faster repeated inference.
pipe.transformer.forward = torch.compile(pipe.transformer.forward, fullgraph=True)
pipe.vae.forward = torch.compile(pipe.vae.forward, fullgraph=True)
#pipe.set_progress_bar_config(disable=True)
#######################################################

prompt = [
    "A high-fashion close-up portrait of a blonde woman in clear sunglasses. The image uses a bold teal and red color split for dramatic lighting. The background is a simple teal-green. The photo is sharp and well-composed, and is designed for viewing with anaglyph 3D glasses for optimal effect. It looks professionally done."
]
negative_prompt = ["low quality, ugly, unfinished, out of focus, deformed, disfigured, blurry, smudged, restricted palette, flat colors"]

import time
# The first pass includes torch.compile warm-up, so run the generation a few times.
for _ in range(3):
    t_start = time.time()
    image = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=torch.Generator("cpu").manual_seed(433),
        num_inference_steps=40,
        guidance_scale=3.0,
        num_images_per_prompt=1,
    ).images[0]
    t_end = time.time()
    print(f"Took: {t_end - t_start} secs.") #66.1242527961731 -> 27.72 sec

image.save("chroma.png")

ComfyUI

For advanced users and customized workflows, you can use Chroma with ComfyUI.

Requirements:

  • A working ComfyUI installation
  • The T5 XXL text encoder
  • The FLUX VAE
  • The Chroma1-HD checkpoint
  • A Chroma workflow file

Setup:

  1. Place the T5_xxl model in your ComfyUI/models/clip folder.
  2. Place the FLUX VAE in your ComfyUI/models/vae folder.
  3. Place the Chroma checkpoint in your ComfyUI/models/diffusion_models folder.
  4. Load the Chroma workflow file into ComfyUI and run.

Model Details

  • Architecture: An 8.9B parameter transformer derived from the 12B parameter FLUX.1-schnell model (see Summary of Architectural Modifications below).
  • Training Data: Trained on a 5M sample dataset curated from a 20M pool, including artistic, photographic, and niche styles.
  • Technical Report: A comprehensive technical paper detailing the architectural modifications and training process is forthcoming.

Intended Use

Chroma is intended to be used as a base model for researchers and developers to build upon. It is ideal for:

  • Finetuning on specific styles, concepts, or characters (see the LoRA sketch after this list).
  • Research into generative model behavior, alignment, and safety.
  • Use as a foundational component in larger AI systems.
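
If you want to finetune rather than only run inference, one common route is parameter-efficient training with LoRA adapters on the transformer. The sketch below is a minimal illustration using the peft library; the target module names, rank, and learning rate are assumptions chosen for demonstration rather than recommended settings, and the actual training loop (data loading, noise schedule, loss) is omitted.

import torch
from diffusers import ChromaPipeline
from peft import LoraConfig

# Load the pipeline and grab the transformer we want to finetune.
pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
transformer = pipe.transformer
transformer.requires_grad_(False)  # freeze the base weights

# Attach LoRA adapters to the attention projections.
# Module names and rank are illustrative assumptions, not official recommendations.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
transformer.add_adapter(lora_config)

# Only the LoRA parameters remain trainable.
trainable_params = [p for p in transformer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)
# ... flow-matching training loop over your dataset goes here ...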

Limitations and Bias Statement

Chroma is trained on a broad, filtered dataset from the internet. As such, it may reflect the biases and stereotypes present in its training data. The model is released as-is and has not been aligned with a specific safety filter.

Users are responsible for their own use of this model. It has the potential to generate content that may be considered harmful, explicit, or offensive. I encourage developers to implement appropriate safeguards and ethical considerations in their downstream applications.

Summary of Architectural Modifications

(For a full breakdown, tech report soon-ish.)

  • 12B → 8.9B Parameters:
    • TL;DR: I replaced a 3.3B parameter timestep-encoding layer with a more efficient 250M parameter FFN, as the original was vastly oversized for its task.
  • MMDiT Masking:
    • TL;DR: Masking T5 padding tokens enhanced fidelity and increased training stability by preventing the model from focusing on irrelevant <pad> tokens.
  • Custom Timestep Distributions:
    • TL;DR: I implemented a custom timestep sampling distribution (-x^2) to prevent loss spikes and ensure the model trains effectively on both high-noise and low-noise regions (see the sampling sketch after this list).
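
To make the last point concrete, here is a minimal sketch of how a custom timestep distribution can be plugged into a training loop via a discretized inverse-CDF sampler. The quadratic density below is a placeholder chosen only to demonstrate the mechanics; the exact curve used for Chroma will be detailed in the technical report.

import torch

def sample_timesteps(batch_size, density, n_bins=1024, device="cpu"):
    # Discretize [0, 1], build the CDF of the (unnormalized) density,
    # and invert it with a table lookup.
    t = torch.linspace(0.0, 1.0, n_bins, device=device)
    pdf = density(t).clamp(min=0.0)
    cdf = torch.cumsum(pdf, dim=0)
    cdf = cdf / cdf[-1]
    u = torch.rand(batch_size, device=device)
    idx = torch.searchsorted(cdf, u).clamp(max=n_bins - 1)
    return t[idx]

# Placeholder quadratic shape, used here only to show the mechanics;
# it keeps probability mass on both the high- and low-noise ends.
density = lambda t: (2.0 * t - 1.0) ** 2 + 0.5

timesteps = sample_timesteps(8, density)  # e.g. one timestep per training sample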

P.S.

Chroma1-HD is not the old Chroma-v.50; it has been retrained from v.48.

Citation

@misc{rock2025chroma,
  author = {Lodestone Rock},
  title = {Chroma1-HD},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/lodestones/Chroma1-HD}},
}