fancyfeast/llama-joycaption-beta-one-hf-llava

fancyfeast committed on May 12

Commit 3101114 (verified) · 1 Parent(s): 537fbaf

Update README.md

Files changed (1)

README.md +4 -4

@@ -9,10 +9,10 @@ tags:
 [Github](https://github.com/fpgaminer/joycaption)
-JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
 Key Features:
-- **Free and Open**: It will be released for free, open weights, no restrictions, and just like [bigASP](https://www.reddit.com/r/StableDiffusion/comments/1dbasvx/the_gory_details_of_finetuning_sdxl_for_30m/), will come with training scripts and lots of juicy details on how it gets built.
 - **Uncensored**: Equal coverage of SFW and NSFW concepts. No "cylindrical shaped object with a white substance coming out on it" here.
 - **Diversity**: All are welcome here. Do you like digital art? Photoreal? Anime? Furry? JoyCaption is for everyone. Pains are being taken to ensure broad coverage of image styles, content, ethnicity, gender, orientation, etc.
 - **Minimal Filtering**: JoyCaption is trained on large swathes of images so that it can understand almost all aspects of our world. almost. Illegal content will never be tolerated in JoyCaption's training.
@@ -39,7 +39,7 @@ from transformers import AutoProcessor, LlavaForConditionalGeneration
 IMAGE_PATH = "image.jpg"
 PROMPT = "Write a long descriptive caption for this image in a formal tone."
-MODEL_NAME = "fancyfeast/llama-joycaption-alpha-two-hf-llava"
 # Load JoyCaption
@@ -79,7 +79,7 @@ with torch.no_grad():
 	# Generate the captions
 	generate_ids = llava_model.generate(
 		**inputs,
-		max_new_tokens=300,
 		do_sample=True,
 		suppress_tokens=None,
 		use_cache=True,

 [Github](https://github.com/fpgaminer/joycaption)
+JoyCaption is an image captioning Visual Language Model (VLM) built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.
 Key Features:
+- **Free and Open**: Always released for free, open weights, no restrictions, and just like [bigASP](https://www.reddit.com/r/StableDiffusion/comments/1dbasvx/the_gory_details_of_finetuning_sdxl_for_30m/), will come with training scripts and lots of juicy details on how it gets built.
 - **Uncensored**: Equal coverage of SFW and NSFW concepts. No "cylindrical shaped object with a white substance coming out on it" here.
 - **Diversity**: All are welcome here. Do you like digital art? Photoreal? Anime? Furry? JoyCaption is for everyone. Pains are being taken to ensure broad coverage of image styles, content, ethnicity, gender, orientation, etc.
 - **Minimal Filtering**: JoyCaption is trained on large swathes of images so that it can understand almost all aspects of our world. almost. Illegal content will never be tolerated in JoyCaption's training.
 IMAGE_PATH = "image.jpg"
 PROMPT = "Write a long descriptive caption for this image in a formal tone."
+MODEL_NAME = "fancyfeast/llama-joycaption-beta-one-hf-llava"
 # Load JoyCaption
 	# Generate the captions
 	generate_ids = llava_model.generate(
 		**inputs,
+		max_new_tokens=512,
 		do_sample=True,
 		suppress_tokens=None,
 		use_cache=True,
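
As a rough illustration of what this commit changes in the README example, the sketch below gathers the two updated values (the `beta-one` model name and `max_new_tokens=512`) together with the other `generate()` arguments visible in the hunk. The helper name `generation_kwargs` is hypothetical and not part of the README. The diff shows only part of the `generate()` call, so the real example may pass further arguments that are not reproduced here.

```python
# Values taken from the "+" lines of this commit.
MODEL_NAME = "fancyfeast/llama-joycaption-beta-one-hf-llava"


def generation_kwargs(max_new_tokens: int = 512) -> dict:
    """Hypothetical helper: collect the generate() arguments visible in the diff.

    Only the arguments shown in the hunk are included; the README's full
    call may contain more.
    """
    return {
        "max_new_tokens": max_new_tokens,  # raised from 300 to 512 in this commit
        "do_sample": True,
        "suppress_tokens": None,
        "use_cache": True,
    }


if __name__ == "__main__":
    print(MODEL_NAME)
    print(generation_kwargs())
```

Assuming `llava_model` and `inputs` have been prepared as in the README, the call would then look like `llava_model.generate(**inputs, **generation_kwargs())`.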