|
--- |
|
base_model: openbmb/MiniCPM-Llama3-V-2_5 |
|
library_name: peft |
|
license: mit |
|
datasets: |
|
- magistermilitum/Tridis |
|
- CATMuS/medieval |
|
language: |
|
- la |
|
- fr |
|
- es |
|
- de |
|
pipeline_tag: image-text-to-text |
|
--- |
|
|
|
# Model Card for Model ID |
|
|
|
This is a first VLLM model able to switch PEFT adapters between two transcription styles for Western ancient manuscripts: |
|
|
|
- **ABBreviated style**: Keeping the original abbreviations from the manuscripts using MUFI characters. |
|
|
|
- **NOT_ABBreviated style** : Developping the abbreviations and symbols used in the manuscript to produce a normalized text. |
|
|
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
|
|
|
|
- **Developed by:** [Sergio Torres Aguilar] |
|
- **Model type:** [Multimodal] |
|
- **Language(s) (NLP):** [Latin, French, Spanish, German] |
|
- **License:** [MIT] |
|
|
|
|
|
## Uses |
|
|
|
The model use two light PEFT adapter added to the MiniCPM-Llama3-V-2_5 (2024) |
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
The following code is intended to produce both transcription styles based on a folder containing graphical manuscripts lines: |
|
|
|
```python |
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
from peft import PeftModel |
|
import torch |
|
from PIL import Image |
|
import os |
|
from tqdm import tqdm |
|
import json |
|
|
|
# Configuration |
|
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu") |
|
model_name = "openbmb/MiniCPM-Llama3-V-2_5" |
|
abbr_adapters = "magistermilitum/Tridis_HTR_MiniCPM_ABBR" |
|
not_abbr_adapters = "magistermilitum/Tridis_HTR_MiniCPM" |
|
|
|
image_folder = "/your/images/folder/path" |
|
|
|
class TranscriptionModel: |
|
"""Handles model loading, adapter switching, and transcription generation.""" |
|
def __init__(self, model_name, abbr_adapters, not_abbr_adapters, device): |
|
self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) |
|
self.base_model = AutoModelForCausalLM.from_pretrained( |
|
model_name, trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16, token=True |
|
) |
|
self.base_model = PeftModel.from_pretrained(self.base_model, abbr_adapters, adapter_name="ABBR") |
|
self.base_model.load_adapter(not_abbr_adapters, adapter_name="NOT_ABBR") |
|
self.base_model.set_adapter("ABBR") # Set default adapter |
|
self.base_model.to(device).eval() |
|
|
|
def generate(self, adapter, image): |
|
"""Generate transcription for the given adapter and image.""" |
|
if hasattr(self.base_model, "past_key_values"): |
|
self.base_model.past_key_values = None |
|
self.base_model.set_adapter(adapter) |
|
msgs = [{"role": "user", "content": [f"Transcribe this manuscript line in mode <{adapter}>:", image]}] |
|
with torch.no_grad(): |
|
res = self.base_model.chat(image=image, msgs=msgs, tokenizer=self.tokenizer, max_new_tokens=128) |
|
# Remove <ABBR> and <NOT_ABBR> tokens from the output |
|
res = res.replace(f"<{adapter}>", "").replace(f"</{adapter}>", "") |
|
return res |
|
|
|
|
|
class TranscriptionPipeline: |
|
"""Handles image processing, transcription, and result saving.""" |
|
def __init__(self, model, image_folder): |
|
self.model = model |
|
self.image_folder = image_folder |
|
|
|
def run_inference(self): |
|
"""Process all images in the folder and generate transcriptions.""" |
|
results = [] |
|
for image_file in tqdm([f for f in os.listdir(self.image_folder)[:20] if f.endswith(('.png', '.jpg', '.jpeg'))]): |
|
image = Image.open(os.path.join(self.image_folder, image_file)).convert("RGB") |
|
print(f"\nProcessing image: {image_file}") |
|
|
|
# Generate transcriptions for both adapters |
|
transcriptions = { |
|
adapter: self.model.generate(adapter, image) |
|
for adapter in ["ABBR", "NOT_ABBR"] |
|
} |
|
for adapter, res in transcriptions.items(): |
|
print(f"Mode ({adapter}): {res}") |
|
results.append({"image": image_file, "transcriptions": transcriptions}) |
|
|
|
#image.show() #Optional |
|
|
|
# Save results to a JSON file |
|
with open("transcriptions_results.json", "w", encoding="utf-8") as f: |
|
json.dump(results, f, ensure_ascii=False, indent=4) |
|
|
|
|
|
# Initialize and run the pipeline |
|
model = TranscriptionModel(model_name, abbr_adapters, not_abbr_adapters, device) |
|
TranscriptionPipeline(model, image_folder).run_inference() |
|
``` |
|
|
|
|
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{torres_aguilar:hal-04983305, |
|
title={Dual-Style Transcription of Historical Manuscripts based on Multimodal Small Language Models with Switchable Adapters}, |
|
author={Torres Aguilar, Sergio}, |
|
url={https://hal.science/hal-04983305}, |
|
year={2025}, |
|
note = {working paper or preprint} |
|
} |
|
``` |
|
|
|
|
|
- PEFT 0.14.1.dev0 |