Casktalk-VLM (CaskTalk Vision Language Model)
Model Details
- Developed by: ToriLab (CasTalk)
- Model type: Vision language model (based on LLaVA + Mistral-7B)
Usage
Prerequisites
pip install --upgrade pip
pip install "transformers>=4.39.0" torch accelerate pillow requests
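To confirm that the installed version meets the requirement before loading the model, a check along the following lines may help. This is only a sketch: `version_tuple`, `meets_minimum`, and `transformers_is_recent_enough` are hypothetical helper names, not part of transformers, and the parser looks only at the leading numeric components, so pre-release builds are treated like their release.

```python
import re
from importlib import metadata

MIN_TRANSFORMERS = "4.39.0"  # minimum version stated in the prerequisites


def version_tuple(version: str) -> tuple:
    """Turn '4.39.0' or '4.40.0.dev0' into a 3-tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric suffixes such as 'dev0'
        parts.append(int(match.group()))
    # Pad so that '4.39' compares equal to '4.39.0'
    return tuple((parts + [0, 0, 0])[:3])


def meets_minimum(installed: str, required: str = MIN_TRANSFORMERS) -> bool:
    """Return True when `installed` is at least `required`."""
    return version_tuple(installed) >= version_tuple(required)


def transformers_is_recent_enough() -> bool:
    """Check the installed transformers package; False if it is missing."""
    try:
        return meets_minimum(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return False
```

Calling `transformers_is_recent_enough()` at the top of a script gives an early, readable failure point instead of an import error later on.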
Inference
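from transformers import LlavaNextProcessor, LlavaNextForConditionalGeneration
import torch
# Use the GPU when available; on CPU fall back to float32, where float16 is poorly supported
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
dtype = torch.float16 if device.type == 'cuda' else torch.float32
# Load the processor (tokenizer + image processor) and the model weights
processor = LlavaNextProcessor.from_pretrained("torilab/casktalk-vlm-v1.0")
model = LlavaNextForConditionalGeneration.from_pretrained(
    "torilab/casktalk-vlm-v1.0",
    torch_dtype=dtype,
    low_cpu_mem_usage=True
)
model.to(device)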
Next, pass the image and the text prompt to the processor, then pass the processed inputs to model.generate.
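from PIL import Image
import requests
# Replace the placeholder with the URL of your own image
url = "<your_user_image>"
image = Image.open(requests.get(url, stream=True).raw)
# <image> marks where the image is inserted into the prompt
prompt = "[INST] <image>\nWhat is shown in this image? [/INST]"
# Keyword arguments avoid depending on the processor's positional argument order
inputs = processor(text=prompt, images=image, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=100)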
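When asking several different questions, it can be convenient to build the prompt from a template rather than writing it by hand each time. The sketch below reuses the `[INST] <image>\n... [/INST]` format from the example; `build_prompt` is a hypothetical helper name, and the single-image, single-turn layout is an assumption taken from that example.

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the [INST] ... [/INST] template with one image slot."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # <image> is the placeholder for the image, followed by the question text
    return f"[INST] <image>\n{question} [/INST]"


# Reproduces the prompt string used in the example
prompt = build_prompt("What is shown in this image?")
```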
Call processor.decode to convert the output tokens back into text.
print(processor.decode(output[0], skip_special_tokens=True))
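The decoded string typically still contains the prompt, because generate returns the input tokens followed by the new ones. One way to keep only the model's reply is to take the text after the last `[/INST]` marker. This is a minimal sketch: `extract_answer` is a hypothetical helper, and `sample` is a made-up string for illustration, not real model output.

```python
def extract_answer(decoded: str, end_tag: str = "[/INST]") -> str:
    """Return the text after the last end tag, or the whole string if absent."""
    _, found, answer = decoded.rpartition(end_tag)
    return (answer if found else decoded).strip()


# Illustrative string only; real output depends on the image and the model
sample = "[INST]  \nWhat is shown in this image? [/INST] A cat sitting on a sofa."
print(extract_answer(sample))
```

In practice the helper would be applied to the result of `processor.decode(output[0], skip_special_tokens=True)`.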
About ToriLab
ToriLab builds reliable, practical, and scalable AI solutions for the CasTalk app.
- phuongdv-VN
- khanhvu-VN
- hieptran-VN
- tanaka-JP