# Model Card for knitGPT_v1.0
Takes an image as input (of a sweater or another item you want to knit) and generates instructions for knitting or crocheting it.
## Model Details

### Model Description
This model takes an image input — typically of a sweater, garment, or knitted/crocheted item — and generates step-by-step knitting or crochet instructions to help recreate the item. It is designed to assist knitting enthusiasts, hobbyists, and designers by translating visual patterns into written guidance, making it easier to reproduce complex or visually striking designs. The model leverages the google/paligemma2-3b-pt-224 architecture, fine-tuned on the arkrajkundu/knitting_patterns dataset, and integrates with the PEFT (Parameter-Efficient Fine-Tuning) library to deliver efficient and accessible knitting pattern generation.
- Developed by: arkrajkundu
- Model type: Vision-to-text model (image-to-instructions translation)
- Language(s) (NLP): English (en)
- Finetuned from model: google/paligemma2-3b-pt-224
### Model Sources

- Repository: https://huggingface.co/arkrajkundu/knitGPT_v1.0
## How to Get Started with the Model

To get started with the KnitGPT model, follow these steps to run inference.

### Install Required Libraries

Before using the model, install the necessary libraries:

`pip install torch transformers peft Pillow`

### Inference Code
```python
import torch
from PIL import Image
from transformers import PaliGemmaProcessor, PaliGemmaForConditionalGeneration
from peft import PeftModel

# Base model and fine-tuned adapter
base_model_id = "google/paligemma2-3b-pt-224"
adapter_repo_id = "arkrajkundu/knitGPT_v1.0"
image_resize = (128, 128)
max_new_tokens = 256

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the processor and base model, then attach the PEFT adapter
processor = PaliGemmaProcessor.from_pretrained(base_model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(model, adapter_repo_id)
model.to(device)
model.eval()

# Path to the image you want instructions for
image_path = "/"
try:
    image = Image.open(image_path).convert("RGB").resize(image_resize)
except Exception as e:
    print(f"Failed to load image {image_path}: {e}")
    exit()

prompt = "How to knit this pattern?"
inputs = processor(
    text=prompt,
    images=image,
    return_tensors="pt",
)

# Move all input tensors to the target device
for k, v in inputs.items():
    inputs[k] = v.to(device)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print("\nInference complete!")
print(f"Model Output: {output}")
```
## Training Details

### Training Data

The training data for the KnitGPT model comes from the arkrajkundu/knitting_patterns dataset, which contains a variety of knitting patterns used to fine-tune the model so it can interpret visual patterns and generate the corresponding textual knitting instructions.

- Dataset source: https://huggingface.co/datasets/arkrajkundu/knitting_patterns
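For reference, the dataset can be inspected with the `datasets` library. A minimal sketch, assuming a standard `train` split (the exact column names should be checked against the dataset card):

```python
from datasets import load_dataset

# Load the fine-tuning dataset from the Hugging Face Hub
ds = load_dataset("arkrajkundu/knitting_patterns", split="train")

print(ds)            # number of rows and column names
print(ds[0].keys())  # e.g. an image column and an instructions column (names may differ)
```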
### Preprocessing

The data was preprocessed by converting knitting images and their corresponding instructions into a format suitable for the model. Each input image was resized and normalized so that model inputs stay consistent.
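The exact training-time pipeline is not published here, but a minimal sketch of this kind of preprocessing, assuming PIL images and letting the PaliGemma processor handle resizing and normalization to the model's 224x224 input resolution, could look like:

```python
from PIL import Image
from transformers import PaliGemmaProcessor

processor = PaliGemmaProcessor.from_pretrained("google/paligemma2-3b-pt-224")

def preprocess_example(image_path: str, instruction: str):
    # Convert to RGB for consistency; the processor resizes and
    # normalizes the image to the model's expected input format.
    image = Image.open(image_path).convert("RGB")
    return processor(text=instruction, images=image, return_tensors="pt")
```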