---
# Auto-generated fields, verify and update as needed
license: apache-2.0
tags:
- generated-by-script
- peft # Assume PEFT adapter unless explicitly a full model repo
- image-captioning # Add more specific task tags if applicable
base_model: [] # <-- FIXED: Provide empty list as default to satisfy validator
# - nlpconnect/vit-gpt2-image-captioning # Heuristic guess for processor, VERIFY MANUALLY
# - nlpconnect/vit-gpt2-image-captioning # Heuristic guess for decoder, VERIFY MANUALLY
---
# Model: ashimdahal/nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning
This repository contains model artifacts for a run named `nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning`, likely a PEFT adapter.
## Training Source
This model was trained as part of the project/codebase available at:
https://github.com/ashimdahal/captioning_image/blob/main
## Base Model Information (Heuristic)
* **Processor/Vision Encoder (Guessed):** `nlpconnect/vit-gpt2-image-captioning`
* **Decoder/Language Model (Guessed):** `nlpconnect/vit-gpt2-image-captioning`
**⚠️ Important:** The `base_model` tag in the metadata above is initially empty. The models listed here are *heuristic guesses* based on the training directory name (`nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning`). Please verify these against your training configuration and update the `base_model:` list in the YAML metadata block at the top of this README with the correct Hugging Face model identifiers.
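For example, if the heuristic guess above turns out to be correct, the metadata block could be updated as follows (illustrative only; substitute the identifier(s) actually used in your training configuration):

```yaml
base_model:
- nlpconnect/vit-gpt2-image-captioning
```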
## How to Use (Example with PEFT)
```python
from transformers import AutoProcessor, AutoModelForVision2Seq, Blip2ForConditionalGeneration # Or other relevant classes
from peft import PeftModel, PeftConfig
import torch
# --- Configuration ---
# 1. Specify the EXACT base model identifiers used during training
base_processor_id = "nlpconnect/vit-gpt2-image-captioning" # <-- Replace with correct HF ID
base_model_id = "nlpconnect/vit-gpt2-image-captioning" # <-- Replace with correct HF ID (e.g., Salesforce/blip2-opt-2.7b)
# 2. Specify the PEFT adapter repository ID (this repo)
adapter_repo_id = "ashimdahal/nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning"
# --- Load Base Model and Processor ---
processor = AutoProcessor.from_pretrained(base_processor_id)
# Load the base model (use the class that matches the architecture used during training)
# Example for BLIP-2 OPT:
# base_model = Blip2ForConditionalGeneration.from_pretrained(
#     base_model_id,
#     torch_dtype=torch.float16,  # or torch.bfloat16 / torch.float32, match training/inference needs
# )
# Generic loading for vision-to-text models:
base_model = AutoModelForVision2Seq.from_pretrained(base_model_id, torch_dtype=torch.float16)
# For a decoder-only language model, use AutoModelForCausalLM.from_pretrained(base_model_id, ...) instead
# --- Load PEFT Adapter ---
# Load the adapter config and merge the adapter weights into the base model
model = PeftModel.from_pretrained(base_model, adapter_repo_id)
model = model.merge_and_unload() # Merge weights for inference (optional but often recommended)
model.eval() # Set model to evaluation mode
# --- Inference Example ---
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
image = ... # Load your image (e.g., using PIL)
text = "a photo of" # Optional prompt start
inputs = processor(images=image, text=text, return_tensors="pt").to(device, torch.float16) # Match model dtype
generated_ids = model.generate(**inputs, max_new_tokens=50)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {{generated_text}}")
```
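Note that the heuristically guessed base model `nlpconnect/vit-gpt2-image-captioning` is a `VisionEncoderDecoderModel` (ViT encoder + GPT-2 decoder), so the generic template above may not match the classes used during training. The sketch below shows the equivalent flow with the encoder-decoder classes; it assumes the adapter was trained on that exact base model and uses a hypothetical image path:

```python
from transformers import VisionEncoderDecoderModel, ViTImageProcessor, AutoTokenizer
from peft import PeftModel
from PIL import Image
import torch

base_model_id = "nlpconnect/vit-gpt2-image-captioning"  # heuristic guess, verify against your training config
adapter_repo_id = "ashimdahal/nlpconnect-vit-gpt2-image-captioning_nlpconnect-vit-gpt2-image-captioning"

# ViT-GPT2 ships an image processor and a GPT-2 tokenizer rather than a combined processor
image_processor = ViTImageProcessor.from_pretrained(base_model_id)
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

base_model = VisionEncoderDecoderModel.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_repo_id)  # assumes the adapter targets this architecture
model = model.merge_and_unload()
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

image = Image.open("example.jpg").convert("RGB")  # hypothetical path, replace with your image
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values, max_new_tokens=50)
caption = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(f"Generated Caption: {caption}")
```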
*More model-specific documentation, evaluation results, and usage examples should be added here.*