---
# Auto-generated fields, verify and update as needed
license: apache-2.0
tags:
- generated-by-script
- peft # Assume PEFT adapter unless explicitly a full model repo
- image-captioning # Add more specific task tags if applicable
base_model: [] # <-- FIXED: Provide empty list as default to satisfy validator
# - nlpconnect/vit-gpt2-image-captioning # Heuristic guess for processor, VERIFY MANUALLY
# - nlpconnect/vit-gpt2-image-captioning # Heuristic guess for decoder, VERIFY MANUALLY
---
# Model: ashimdahal/nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning
This repository contains model artifacts for a run named `nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning`, likely a PEFT adapter.
## Training Source
This model was trained as part of the project/codebase available at:
https://github.com/ashimdahal/captioning_image/blob/main
## Base Model Information (Heuristic)
* **Processor/Vision Encoder (Guessed):** `nlpconnect/vit-gpt2-image-captioning`
* **Decoder/Language Model (Guessed):** `nlpconnect/vit-gpt2-image-captioning`
**⚠️ Important:** The `base_model` tag in the metadata above is initially empty. The models listed here are *heuristic guesses* based on the training directory name (`nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning`). Please verify these against your training configuration and update the `base_model:` list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers.
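For example, if the heuristic guess above turns out to be correct, the metadata block could be updated as follows (illustrative only; substitute the identifier(s) actually used in your training configuration):

```yaml
base_model:
- nlpconnect/vit-gpt2-image-captioning
```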
## How to Use (Example with PEFT)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq, Blip2ForConditionalGeneration # Or other relevant classes
from peft import PeftModel, PeftConfig
import torch
# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "nlpconnect/vit-gpt2-image-captioning" # <-- Replace with correct HF ID
base_model_id = "nlpconnect/vit-gpt2-image-captioning" # <-- Replace with correct HF ID (e.g., Salesforce/blip2-opt-2.7b)
# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning"
# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)
# Load the base model (use the class that matches the architecture used during training)
# Example for BLIP-2 OPT:
# base_model = Blip2ForConditionalGeneration.from_pretrained(
#     base_model_id,
#     torch_dtype=torch.float16,  # or torch.bfloat16 / torch.float32, match training/inference needs
# )
# Generic loading for vision-to-text models:
base_model = AutoModelForVision2Seq.from_pretrained(base_model_id, torch_dtype=torch.float16)
# For a decoder-only language model, use AutoModelForCausalLM.from_pretrained(base_model_id, ...) instead
# --- Load PEFT Adapter ---
# Load the adapter config and merge the adapter weights into the base model
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload() # Merge weights for inference (optional but often recommended)
model.eval() # Set model to evaluation mode
# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
image = ... # Load your image (e.g., using PIL)
text = "a photo of" # Optional prompt start
inputs = processor(images=image, text=text, return_tensors="pt").to(device, torch.float16) # Match model dtype
generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {{generated_text}}")
```
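Note that the heuristically guessed base model `nlpconnect/vit-gpt2-image-captioning` is a `VisionEncoderDecoderModel` (ViT encoder + GPT-2 decoder), so the generic template above may not match the classes used during training. The sketch below shows the equivalent flow with the encoder-decoder classes; it assumes the adapter was trained on that exact base model and uses a hypothetical image path:

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from peft import PeftModel
from PIL import Image
import torch

base_model_id = "nlpconnect/vit-gpt2-image-captioning"  # heuristic guess, verify against your training config
adapter_repo_id = "ashimdahal/nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning"

# ViT-GPT2 ships an image processor and a GPT-2 tokenizer rather than a combined processor
image_processor = ViTImageProcessor.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = VisionEncoderDecoderModel.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_repo_id)  # assumes the adapter targets this architecture
model = model.merge_and_unload()
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("example.jpg").convert("RGB")  # hypothetical path, replace with your image
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values, max_new_tokens=50)
caption = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {caption}")
```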
*More model-specific documentation, evaluation results, and usage examples should be added here.*