Image-Text-to-Text
PEFT
Safetensors
conversational

Model Card for Tridis_HTR_MiniCPM_ABBR

This is a vision-language model (VLM) able to switch between PEFT adapters to produce two transcription styles for Western ancient manuscripts:

  • ABBR (abbreviated) style: keeps the original abbreviations from the manuscript, rendered with MUFI characters.

  • NOT_ABBR (not abbreviated) style: expands the abbreviations and symbols used in the manuscript to produce a normalized text.

Model Description

  • Developed by: Sergio Torres Aguilar
  • Model type: Multimodal
  • Language(s) (NLP): Latin, French, Spanish, German
  • License: MIT

Uses

The model uses two lightweight PEFT adapters added on top of MiniCPM-Llama3-V-2_5 (2024), as sketched below.
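
A minimal sketch of the adapter mechanics (the same calls the full pipeline in the next section uses, with the repository IDs given there):

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

# Load the base model once, attach the ABBR adapter, then add NOT_ABBR as a second adapter
base = AutoModelForCausalLM.from_pretrained(
    "openbmb/MiniCPM-Llama3-V-2_5", trust_remote_code=True, torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "magistermilitum/Tridis_HTR_MiniCPM_ABBR", adapter_name="ABBR")
model.load_adapter("magistermilitum/Tridis_HTR_MiniCPM", adapter_name="NOT_ABBR")

# set_adapter() switches the active transcription style
model.set_adapter("ABBR")      # abbreviated style
model.set_adapter("NOT_ABBR")  # expanded (normalized) style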

How to Get Started with the Model

The following code produces both transcription styles for a folder of manuscript line images:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from PIL import Image
import os
from tqdm import tqdm
import json

# Configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
abbr_adapters = "magistermilitum/Tridis_HTR_MiniCPM_ABBR"
not_abbr_adapters = "magistermilitum/Tridis_HTR_MiniCPM"

image_folder = "/your/images/folder/path"

class TranscriptionModel:
    """Handles model loading, adapter switching, and transcription generation."""
    def __init__(self, model_name, abbr_adapters, not_abbr_adapters, device):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        self.base_model = AutoModelForCausalLM.from_pretrained(
            model_name, trust_remote_code=True, attn_implementation='sdpa', torch_dtype=torch.bfloat16, token=True
        )
        self.base_model = PeftModel.from_pretrained(self.base_model, abbr_adapters, adapter_name="ABBR")
        self.base_model.load_adapter(not_abbr_adapters, adapter_name="NOT_ABBR")
        self.base_model.set_adapter("ABBR")  # Set default adapter
        self.base_model.to(device).eval()

    def generate(self, adapter, image):
        """Generate transcription for the given adapter and image."""
        if hasattr(self.base_model, "past_key_values"):
            self.base_model.past_key_values = None
        self.base_model.set_adapter(adapter)
        msgs = [{"role": "user", "content": [f"Transcribe this manuscript line in mode <{adapter}>:", image]}]
        with torch.no_grad():
            res = self.base_model.chat(image=image, msgs=msgs, tokenizer=self.tokenizer, max_new_tokens=128)
        # Remove the current adapter's mode tags (e.g. <ABBR> and </ABBR>) from the output
        res = res.replace(f"<{adapter}>", "").replace(f"</{adapter}>", "")
        return res


class TranscriptionPipeline:
    """Handles image processing, transcription, and result saving."""
    def __init__(self, model, image_folder):
        self.model = model
        self.image_folder = image_folder

    def run_inference(self):
        """Process all images in the folder and generate transcriptions."""
        results = []
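        # Demo limit: only the first 20 directory entries are scanned, then filtered by extension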
        for image_file in tqdm([f for f in os.listdir(self.image_folder)[:20] if f.endswith(('.png', '.jpg', '.jpeg'))]):
            image = Image.open(os.path.join(self.image_folder, image_file)).convert("RGB")
            print(f"\nProcessing image: {image_file}")
            
            # Generate transcriptions for both adapters
            transcriptions = {
                adapter: self.model.generate(adapter, image)
                for adapter in ["ABBR", "NOT_ABBR"]
            }
            for adapter, res in transcriptions.items():
                print(f"Mode ({adapter}): {res}")
            results.append({"image": image_file, "transcriptions": transcriptions})

            #image.show() #Optional

        # Save results to a JSON file
        with open("transcriptions_results.json", "w", encoding="utf-8") as f:
            json.dump(results, f, ensure_ascii=False, indent=4)


# Initialize and run the pipeline
model = TranscriptionModel(model_name, abbr_adapters, not_abbr_adapters, device)
TranscriptionPipeline(model, image_folder).run_inference()
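
After the run, the saved results can be read back from transcriptions_results.json, for example:

import json

# Load the results written by the pipeline above
with open("transcriptions_results.json", encoding="utf-8") as f:
    results = json.load(f)

for entry in results:
    print(entry["image"])
    print("  ABBR:    ", entry["transcriptions"]["ABBR"])
    print("  NOT_ABBR:", entry["transcriptions"]["NOT_ABBR"])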

Citation

@misc{torres_aguilar:hal-04983305,
    title={Dual-Style Transcription of Historical Manuscripts based on Multimodal Small Language Models with Switchable Adapters}, 
    author={Torres Aguilar, Sergio},
    url={https://hal.science/hal-04983305},
    year={2025},
    note = {working paper or preprint}
}

Framework versions

  • PEFT 0.14.1.dev0