SD1.5 experiments with Huber and MSE loss. All models were trained for 4 epochs on approximately 250k images from a variety of sources: roughly half from LAION Aesthetics, plus a few thousand frames from 4K video rips captioned with CogVLM.
Trained using EveryDream2 Trainer (https://github.com/victorchall/EveryDream2trainer) on an RTX 6000 Ada (48 GB). Each epoch takes approximately 10 hours, for a total of about 40 hours per model.
- Multi-aspect-ratio training with a nominal size of <=768^2 pixels for each bucket
- Batch size 12 with gradient accumulation of 10 (effective batch size 120)
- AdamW 8-bit optimizer with standard betas of (0.9, 0.999) and weight decay of 0.010
- Automatic mixed precision FP16 (note: the grad scaler value was surprisingly identical across all runs)
- TF32 matmul and SDP Attention
- 3.0e-6 LR on a cosine schedule with a ~12-epoch decay target, ending around 2.3e-6 at the end of training (quick check after this list)
- Pyramid noise with discount 0.03
- Zero offset noise of 0.02 (both noise terms sketched after this list)
- Min SNR gamma of 5.0 (weighting sketched after this list)
- UNet-only training; text encoder left frozen
- Conditional dropout of 10%
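As a quick sanity check on the LR schedule above: with a half-cosine decay toward zero targeted at 12 epochs but stopped after 4, the final LR lands almost exactly on the reported value. A minimal sketch, assuming the half-cosine-to-zero form (EveryDream2's scheduler may differ):

```python
import math

lr_max, target_epochs, trained_epochs = 3.0e-6, 12, 4

# Half-cosine decay toward zero over the 12-epoch target, stopped at epoch 4.
lr_end = lr_max * 0.5 * (1 + math.cos(math.pi * trained_epochs / target_epochs))
print(f"{lr_end:.2e}")  # 2.25e-06, consistent with "around 2.3e-6"
```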
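The pyramid and offset noise terms modify the Gaussian noise added to the latents before the UNet forward pass. Below is a minimal PyTorch sketch, assuming the common pyramid-noise construction (downsampled noise re-upsampled and summed with a per-level discount) and per-channel offset noise; the function name is hypothetical and EveryDream2's implementation may differ.

```python
import torch
import torch.nn.functional as F

def make_noise(latents, pyramid_discount=0.03, offset_strength=0.02):
    """Hypothetical noise builder: base Gaussian noise plus pyramid noise
    (progressively lower-resolution noise, discounted per level) and a
    per-channel constant offset ("offset noise")."""
    b, c, h, w = latents.shape
    noise = torch.randn_like(latents)

    # Pyramid noise: halve the resolution each level, scale by discount^level,
    # upsample back to the latent size, and accumulate.
    level, cur_h, cur_w = 1, h // 2, w // 2
    while cur_h >= 1 and cur_w >= 1:
        low = torch.randn(b, c, cur_h, cur_w,
                          device=latents.device, dtype=latents.dtype)
        up = F.interpolate(low, size=(h, w), mode="bilinear", align_corners=False)
        noise = noise + (pyramid_discount ** level) * up
        if cur_h == 1 or cur_w == 1:
            break
        level, cur_h, cur_w = level + 1, cur_h // 2, cur_w // 2

    # Offset noise: shift each channel by a random constant, scaled by 0.02
    # here, so the model can learn to move overall brightness.
    offset = torch.randn(b, c, 1, 1, device=latents.device, dtype=latents.dtype)
    return noise + offset_strength * offset
```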
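Min-SNR weighting clamps the per-timestep loss weight so that low-noise timesteps do not dominate training. A sketch of the epsilon-prediction form from the Min-SNR paper (Hang et al., 2023), assuming access to the scheduler's alphas_cumprod; whether this matches EveryDream2's exact implementation is an assumption:

```python
import torch

def min_snr_weight(timesteps, alphas_cumprod, gamma=5.0):
    """Per-sample loss weight min(SNR, gamma) / SNR for epsilon prediction,
    where SNR(t) = alpha_bar_t / (1 - alpha_bar_t)."""
    alpha_bar = alphas_cumprod[timesteps]
    snr = alpha_bar / (1.0 - alpha_bar)
    return torch.clamp(snr, max=gamma) / snr
```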
The following models were produced:
- 768_huber.safetensors - Huber loss only
- 768_mse_plus_huberd1.5.safetensors - MSE Plus Huber (d=1.5) loss
- 768_ts0huber_ts999mse.safetensors - Huber loss at timestep 0 interpolated to MSE loss at timestep 999
- 768_ts0mse_ts999huber.safetensors - MSE loss at timestep 0 interpolated to Huber loss at timestep 999
Worth noting: timestep 0 is the least-noised timestep and 999 the most-noised; a sketch of the blending is below.
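A minimal sketch of how the timestep-interpolated variants could blend the two losses per sample. The linear interpolation, the per-sample reduction, and the delta value are assumptions inferred from the model names, not EveryDream2's exact implementation:

```python
import torch
import torch.nn.functional as F

def blended_loss(pred, target, timesteps, num_timesteps=1000,
                 delta=1.5, huber_at_t0=True):
    """Per-sample blend of Huber and MSE, linear in timestep.
    huber_at_t0=True  -> Huber at t=0 fading to MSE at t=999
    huber_at_t0=False -> MSE at t=0 fading to Huber at t=999"""
    mse = F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2, 3))
    huber = F.huber_loss(pred, target, delta=delta,
                         reduction="none").mean(dim=(1, 2, 3))
    w = timesteps.float() / (num_timesteps - 1)  # 0 at t=0 ... 1 at t=999
    if huber_at_t0:
        per_sample = (1 - w) * huber + w * mse
    else:
        per_sample = (1 - w) * mse + w * huber
    return per_sample.mean()
```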