Per-subject tags grouping.
Is it possible to isolate tags per subject so they don't "bleed" onto one another? Textual descriptions don't always work. For example:
2d videogame scene with A on the foreground to the left and B in the background to the right.
(A is a mature woman kusanagi motoko, she wears denim shorts and white parka.)
(B is a muscular man in sporty underwear, showing v sign.)
It works to some degree, but I'm sure it's because of the text encoder being T5 (flan in this case). Is there a more official way to group tags?
I've tried a similar approach and still have some bleeding issues. My prompts are more dynamic than yours, so you could look into regional prompting or regional conditioning to separate the two subjects more.
Have a reddit post running ( https://www.reddit.com/r/StableDiffusion/comments/1m6lz4h/ways_to_separate_different_persons_in_chroma_just/ ) but so far not much help on the pure prompt side, there are a lot of "do postwork and inpaint or fix in image editing"...
I also added an extra line break between the subjects but also did not help much...
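For readers unfamiliar with regional prompting/conditioning: the idea is to split the image into spatial regions and apply a different text conditioning to each. A minimal conceptual sketch with NumPy (the array shapes and the blending rule here are illustrative assumptions, not any specific UI's or node's implementation):

```python
import numpy as np

def regional_blend(cond_a, cond_b, mask):
    """Blend two conditioning maps spatially.

    cond_a, cond_b: (H, W, C) arrays standing in for per-region text
      conditioning broadcast over the latent grid.
    mask: (H, W) array in [0, 1]; 1.0 selects cond_a, 0.0 selects cond_b.
    """
    m = mask[..., None]  # (H, W, 1) so the mask broadcasts over channels
    return m * cond_a + (1.0 - m) * cond_b

# Left half of the canvas gets subject A's conditioning, right half subject B's.
H, W, C = 4, 8, 3
cond_a = np.full((H, W, C), 1.0)   # pretend embedding for subject A
cond_b = np.full((H, W, C), -1.0)  # pretend embedding for subject B
mask = np.zeros((H, W))
mask[:, : W // 2] = 1.0            # subject A on the left

blended = regional_blend(cond_a, cond_b, mask)
```

Real implementations blend attention or cross-attention conditioning rather than raw pixels, but the mask-selects-conditioning principle is the same.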
I haven't seen an official / working way to do it. I test each version against the previous one with a fixed set of prompts / seeds, and I have one prompt dedicated to this aspect (can't post it as it's NSFW): 2 women + POV. What I can say:
- I couldn't get positioning (left / right) to work reliably, but 90-95% of the time the 1st character will be on the left
- one character behind the other works pretty well for me
- mutual interaction between characters (kissing, holding hands, ...) mostly works
- interaction of one character toward the other doesn't work as well, but it's better when one character interacts with the viewer
- giving names didn't really work for me, even when using non-existing words (like only consonants and no vowels)
- I resorted to indicating ethnicity, but hair color or similar may work too. If you have a man and a woman, as in your example, it should work even better
- I write my prompt in multiple sentences, each on a new line. I found it works better if I group everything for a character into one sentence, or 2 sentences separated by a colon. Example: One woman is Asian: she is <doing stuff>, <describe hair style / color, clothing, ...>.
I still get character bleeding sometimes, but it also depends on Chroma's version; for example, with some of my problematic seeds, some character bleeding disappeared in v45-dc and v46-dc, but reappeared in v47-dc. Generally speaking, though, things are going in the right direction, and I'm eager to see how Chroma stabilizes in v48-v50.
POV did work quite nicely in the last test I did (somewhere around 41 or 43)
Character interaction still seems to be an issue, but describing the positioning and the relations between characters helps. I recently removed "positions" and instead described where each character is and what pose they are in (randomized in the prompt so not every image gets the same pose).
Yeah, I also noticed a slight step back in v47. I had a similar feeling going from v45 to v46, but then noticed that other things worked a lot better, as if the training had reached a set of concepts that wasn't quite there before. So keep testing and prompting to get the best results...
As for the names, I use stuff like "person1 interacts with person2", then a few lines down say "person1 is ...", and two lines later "person2 is ...".
Might try all three approaches later, namedChar in brackets with colon... will see...
The concept bleed is caused by the weird pseudo-logic language you invented, which the T5 doesn't understand. Just describe it like you are talking to a blind person:
2d videogame scene showing a man and a woman.
The man is in the foreground to the left. He is a muscular man in sporty underwear, showing v sign.
The woman is in the background to the right. She is a mature woman kusanagi motoko, she wears denim shorts and white parka.
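The "describe it to a blind person" pattern above can be assembled mechanically: scene first, then one line of sentences per subject so each subject's tags stay together. A small sketch (the helper name and field layout are my own, not part of any tool):

```python
def scene_prompt(scene, subjects):
    """Build a plain natural-language prompt: the scene first, then one
    line of sentences per subject, keeping each subject's tags together."""
    head = scene + " showing a " + " and a ".join(s["noun"] for s in subjects) + "."
    lines = [head]
    for s in subjects:
        lines.append("The " + s["noun"] + " is " + s["position"] + ". " + s["description"])
    return "\n".join(lines)

prompt = scene_prompt(
    "2d videogame scene",
    [
        {"noun": "man",
         "position": "in the foreground to the left",
         "description": "He is a muscular man in sporty underwear, showing v sign."},
        {"noun": "woman",
         "position": "in the background to the right",
         "description": "She is a mature woman, she wears denim shorts and white parka."},
    ],
)
print(prompt)
```

Useful when the prompts are generated dynamically, as mentioned earlier in the thread.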
When I tried A with B, I got lots of C.
(A is your original prompt)
(B is the same seed and other parameters)
(C is concept bleed)
This seems to work best, thank you. I use flan_t5_xxl_TE-only_FP16 with Chroma; it sometimes gives slightly better results.
I hope we get official prompting guides once v50 is out.
I prefer to believe this is a flaw in T5: separating subjects can be done easily with natural language on models that use an instruction-tuned Gemma as the text encoder, such as Sana.
These are prompt commands the full T5 model is trained to process.
Note: the Chroma model can NOT execute these commands, as it only uses the T5 encoder.
But the T5 encoder should still respond positively to such input, I reckon.
You can verify this using Google Colab: https://huggingface.co/datasets/codeShare/lora-training-data/blob/main/T5_encoding_test.ipynb
🔥 T5 Model Commands 🔥
Translation
translate English to French: Hello, world!
→ Bonjour le monde!
Summarization
summarize: Solar energy is sustainable but costly...
→ Solar energy is sustainable, cost-effective, but expensive to install.
Question Answering
question: What's France's capital? context: France is in Europe.
→ Paris
Sentiment Analysis
sentiment: This movie is awesome!
→ positive
Text Classification
classify: Free vacation, click here!
→ spam
Paraphrasing
paraphrase: The weather is nice today.
→ It's pleasant today.
Text Generation
generate: Once upon a time...
→ Once upon a time, a hero emerged...
Natural Language Inference
nli premise: The dog runs. hypothesis: The dog is active.
→ entailment
Coreference Resolution
coreference: John went to the store. He bought apples.
→ John and he refer to the same person.
Word Sense Disambiguation
wsd: bank The bank by the river flooded.
→ riverbank
Text Completion
complete: The capital of France is <mask>.
→ Paris
Grammatical Error Correction
correct: He go to school everyday.
→ He goes to school every day.
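Since the prefixes above are just plain text prepended to the input, assembling such a command is simple string work. A trivial sketch (the dict keys and helper name are my own shorthand for the prefixes listed above):

```python
# Task prefixes taken from the command list above (a small subset).
T5_PREFIXES = {
    "translate_en_fr": "translate English to French: ",
    "summarize": "summarize: ",
    "sentiment": "sentiment: ",
    "paraphrase": "paraphrase: ",
}

def t5_command(task, text):
    """Prepend the plain-text task prefix that T5 expects to the input."""
    return T5_PREFIXES[task] + text

cmd = t5_command("translate_en_fr", "Hello, world!")
# cmd == "translate English to French: Hello, world!"
```

The resulting string is what you would feed to the full T5 model (or, in Chroma's case, what the encoder alone would see).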
//----//