DanbotNL 2408 260M

DanbotNL is a translator that converts natural language into Danbooru tags. It accepts Japanese, English, and Danbooru tags as input.

Parameter size: 260M
Knowledge cutoff: 2024/08/31

See our tech blog (Japanese) for more details.

Model Details

Model Description

Developed by: Plat
Languages: Japanese and English
License: Apache-2.0
Finetuned from models: sbintuitions/modernbert-ja-130m and Dart v3 SFT
Demo: p1atdev/danbooru-tags-translator-preview

Usage

ComfyUI

See https://github.com/p1atdev/danbot-comfy-node.

Transformers

pip install transformers sentencepiece protobuf

import torch
from transformers import AutoModelForPreTraining, AutoProcessor

REPO = "dartags/DanbotNL-2408-260m"

processor = AutoProcessor.from_pretrained(
    REPO,
    trust_remote_code=True,
    revision="827103c", # optional 
)
model = AutoModelForPreTraining.from_pretrained(
    REPO,
    trust_remote_code=True,
    revision="827103c", # optional
    torch_dtype=torch.bfloat16
)

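# Step 1: translate the sentence into Danbooru tags ("exact" mode)
inputs = processor(
    encoder_text="一人の猫耳の少女が座ってこっちを見ている。",  # "A single cat-eared girl is sitting and looking this way."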
    decoder_text=processor.decoder_tokenizer.apply_chat_template(
        {
            "aspect_ratio": "tall",
            "rating": "general",
            "length": "very_short",
            "translate_mode": "exact",
        },
        tokenize=False,
    ),
    return_tensors="pt",
)

with torch.inference_mode():
    outputs = model.generate(
        **inputs.to(model.device),
        do_sample=False,
        eos_token_id=processor.decoder_tokenizer.convert_tokens_to_ids(
            "</translation>"
        ),
    )
# Decode only the newly generated tokens (one piece per token) and join the non-empty pieces
translation = ", ".join(
    tag
    for tag in processor.batch_decode(
        outputs[0, len(inputs.input_ids[0]) :],
        skip_special_tokens=True,
    )
    if tag.strip() != ""
)
print("translation:", translation)
# translation: 1girl, solo, looking at viewer, sitting, cat girl

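# Step 2: extend the translated tags with related tags ("approx" mode, "long" length)
inputs = processor(
    encoder_text="一人の猫耳の少女が座ってこっちを見ている。",  # "A single cat-eared girl is sitting and looking this way."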
    decoder_text=processor.decoder_tokenizer.apply_chat_template(
        {
            "aspect_ratio": "tall",
            "rating": "general",
            "length": "long",
            "translate_mode": "approx",
            "copyright": "",
            "character": "",
            "translation": translation,
        },
        tokenize=False,
    ),
    return_tensors="pt",
)
with torch.inference_mode():
    outputs = model.generate(
        **inputs.to(model.device),
        do_sample=False,
        eos_token_id=processor.decoder_tokenizer.convert_tokens_to_ids("</extension>"),
    )
extension = ", ".join(
    tag
    for tag in processor.batch_decode(
        outputs[0, len(inputs.input_ids[0]) :],
        skip_special_tokens=True,
    )
    if tag.strip() != ""
)
print("extension:", extension)
# extension: simple background, white background, shirt, skirt, long sleeves, animal ears, closed mouth, ribbon, jacket, pantyhose, open clothes, blue eyes, brown hair, long hair, shoes, white shirt, full body, cat ears, black skirt, loafers
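Both steps post-process the generated tokens in the same way. A minimal sketch of that step as a reusable function follows; join_tags is a hypothetical name, and it only reproduces the generator expression from the example above (drop empty or whitespace-only pieces, join the rest with ", ").

```python
from typing import Iterable


def join_tags(pieces: Iterable[str]) -> str:
    """Join decoded tag pieces into one comma-separated string (hypothetical helper).

    Pieces that are empty or contain only whitespace are dropped;
    every other piece is kept exactly as decoded.
    """
    return ", ".join(piece for piece in pieces if piece.strip() != "")
```

With it, each step reduces to something like translation = join_tags(processor.batch_decode(outputs[0, len(inputs.input_ids[0]) :], skip_special_tokens=True)).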

Prompt template

from typing import Literal

ASPECT_RATIO = Literal[
    "too_tall",
    "tall_wallpaper",
    "tall",
    "square",
    "wide",
    "wide_wallpaper",
    "too_wide",
]
RATING = Literal[
    "general",
    "sensitive",
    "questionable",
    "explicit",
]
LENGTH = Literal[
    "very_short",
    "short",
    "long",
    "very_long",
]

aspect_ratio: ASPECT_RATIO
rating: RATING
length: LENGTH

TRANSLATE_MODE = Literal[
    "exact",
    "approx",
]
translate_mode: TRANSLATE_MODE

copyright: str    # empty string or comma-separated valid Danbooru tags (copyright)
character: str    # empty string or comma-separated valid Danbooru tags (character)
translation: str  # empty string or comma-separated valid Danbooru tags (general and meta)

prompt = (
    "<|bos|>"
    f"{aspect_ratio}{rating}{length}"
    "<text><|text|></text>"
    f"<|translate:{translate_mode}|><|input_end|>"
    # ↑ up to here if translate_mode == "exact"

    f"<copyright>{copyright}</copyright>"
    f"<character>{character}</character>"
    "<general>"
    f"<translation>{translation}</translation>"
    "<extension>"
    # ↑ up to here if translate_mode == "approx"

    # {extension}</extension></general><|eos|>  # the model generates this
)
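The template can be wrapped in a small function that validates the condition values and stops at the right point for each mode. This is a sketch that follows the template literally; build_prompt is a hypothetical name, and in the usage example the processor's apply_chat_template does this work, so the exact token spelling it emits for the aspect ratio, rating, and length fields may differ from this plain string concatenation.

```python
from typing import Literal, get_args

# Condition values copied from the prompt template above.
ASPECT_RATIO = Literal["too_tall", "tall_wallpaper", "tall", "square", "wide", "wide_wallpaper", "too_wide"]
RATING = Literal["general", "sensitive", "questionable", "explicit"]
LENGTH = Literal["very_short", "short", "long", "very_long"]
TRANSLATE_MODE = Literal["exact", "approx"]


def build_prompt(
    aspect_ratio: ASPECT_RATIO,
    rating: RATING,
    length: LENGTH,
    translate_mode: TRANSLATE_MODE,
    copyright: str = "",
    character: str = "",
    translation: str = "",
) -> str:
    """Assemble the decoder prompt by literal string concatenation (hypothetical helper)."""
    # Reject values outside the Literal sets defined by the template.
    for name, value, allowed in (
        ("aspect_ratio", aspect_ratio, ASPECT_RATIO),
        ("rating", rating, RATING),
        ("length", length, LENGTH),
        ("translate_mode", translate_mode, TRANSLATE_MODE),
    ):
        if value not in get_args(allowed):
            raise ValueError(f"invalid {name}: {value!r}")

    prompt = (
        "<|bos|>"
        f"{aspect_ratio}{rating}{length}"
        "<text><|text|></text>"
        f"<|translate:{translate_mode}|><|input_end|>"
    )
    if translate_mode == "exact":
        # Exact mode stops at <|input_end|>; the model writes the rest.
        return prompt

    # Approx mode also supplies the known tags and opens the <extension> section.
    return prompt + (
        f"<copyright>{copyright}</copyright>"
        f"<character>{character}</character>"
        "<general>"
        f"<translation>{translation}</translation>"
        "<extension>"
    )
```

Validating against get_args of each Literal keeps the allowed values in one place, so a typo such as "portrait" fails early instead of silently producing an unfamiliar prompt.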

Training dataset

Synthetic natural language dataset: dartags/danbooru-2408-blind-captions
- The captions were generated by Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 and nejumi/phi-4-GPTQ-Int4-calib-ja-1k from Danbooru tags and tag wiki data alone, without access to any images.
- Danbooru tag data source: isek-ai/danbooru-tags-2024
- Danbooru tag wiki data source: isek-ai/danbooru-wiki-2024

License

Apache-2.0
