---
base_model: openbmb/MiniCPM-Llama3-V-2_5
library_name: peft
license: mit
datasets:
- magistermilitum/Tridis
- CATMuS/medieval
language:
- la
- fr
- es
- de
pipeline_tag: image-text-to-text
---

# Model Card for Tridis_HTR_MiniCPM (ABBR and NOT_ABBR adapters)

This is a first vision-language model (VLM) that can switch between two transcription styles for Western historical manuscripts by swapping PEFT adapters:

- ABBR (abbreviated) style: keeps the original abbreviations of the manuscript, encoded with MUFI (Medieval Unicode Font Initiative) characters.

- NOT_ABBR (not abbreviated) style: expands the abbreviations and symbols used in the manuscript to produce a normalized text.
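
To make the difference between the two styles concrete, here is a toy sketch that expands a few common abbreviation signs. It is only an illustration of what the styles mean and not how the NOT_ABBR adapter works: the model learns expansions from the image and the surrounding context. The mapping is a hypothetical, hand-picked subset, and real expansions are often context-dependent (for instance, ꝯ can stand for *con* or *com*).

```python
# Toy illustration only: a hypothetical, hand-picked subset of abbreviation
# signs with one common expansion each. Real expansions depend on context,
# which is why the NOT_ABBR style is produced by a learned model, not a lookup.
TOY_EXPANSIONS = {
    "\u204a": "et",   # ⁊ TIRONIAN SIGN ET
    "\ua751": "per",  # ꝑ LATIN SMALL LETTER P WITH STROKE THROUGH DESCENDER
    "\ua753": "pro",  # ꝓ LATIN SMALL LETTER P WITH FLOURISH
    "\ua76f": "con",  # ꝯ LATIN SMALL LETTER CON (may also stand for "com")
}


def expand_toy(abbr_text: str) -> str:
    """Replace each known sign with its expansion; keep all other characters."""
    return "".join(TOY_EXPANSIONS.get(ch, ch) for ch in abbr_text)


abbr_line = "\ua76ftra \u204a \ua751 annum"  # ABBR-style text: "ꝯtra ⁊ ꝑ annum"
print(expand_toy(abbr_line))  # contra et per annum
```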


## Model Description

- Developed by: Sergio Torres Aguilar
- Model type: Multimodal (image-text-to-text)
- Language(s) (NLP): Latin, French, Spanish, German
- License: MIT


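## Uses

The model consists of two lightweight PEFT adapters, one per transcription style, applied on top of the MiniCPM-Llama3-V-2_5 (2024) base model. Both adapters are loaded onto the same base model, and the active adapter determines the output style (ABBR or NOT_ABBR).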
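
The card does not state which PEFT method the adapters use. Assuming LoRA-style low-rank adapters, switching styles on a shared base model can be sketched with NumPy: the frozen weight `W` is shared, each adapter contributes its own low-rank update `B @ A`, and selecting an adapter only changes which update is added. All names and sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2  # hypothetical sizes; real layers are far larger

# Frozen base weight, shared by both transcription styles.
W = rng.normal(size=(d_out, d_in))

# One low-rank update per adapter (LoRA-style: delta = B @ A).
adapters = {
    name: (rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in)))
    for name in ("ABBR", "NOT_ABBR")
}


def forward(x, active, scale=1.0):
    """Apply the shared base weight plus the active adapter's low-rank update."""
    B, A = adapters[active]
    return W @ x + scale * (B @ (A @ x))


x = rng.normal(size=d_in)
y_abbr = forward(x, "ABBR")
y_not_abbr = forward(x, "NOT_ABBR")

# Same base weight, different outputs: only the small B/A matrices differ.
print(np.allclose(y_abbr, y_not_abbr))  # False
```

In the getting-started code, `PeftModel.set_adapter` plays the role of the `active` argument: both adapters stay loaded and only the selected one is applied.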


## How to Get Started with the Model

The following code transcribes the manuscript line images (PNG or JPEG) in a folder in both styles and saves the results to `transcriptions_results.json`:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
from PIL import Image
import os
from tqdm import tqdm
import json

# Configuration
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model_name = "openbmb/MiniCPM-Llama3-V-2_5"
abbr_adapters = "magistermilitum/Tridis_HTR_MiniCPM_ABBR"
not_abbr_adapters = "magistermilitum/Tridis_HTR_MiniCPM"

image_folder = "/your/images/folder/path"
max_images = 20  # Only the first 20 images are processed; set to None for the whole folder


class TranscriptionModel:
    """Handles model loading, adapter switching, and transcription generation."""

    def __init__(self, model_name, abbr_adapters, not_abbr_adapters, device):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
        # token=True uses the Hugging Face token cached by `huggingface-cli login`
        self.base_model = AutoModelForCausalLM.from_pretrained(
            model_name,
            trust_remote_code=True,
            attn_implementation="sdpa",
            torch_dtype=torch.bfloat16,
            token=True,
        )
        # Wrap the base model with the ABBR adapter, then load NOT_ABBR alongside it
        self.base_model = PeftModel.from_pretrained(self.base_model, abbr_adapters, adapter_name="ABBR")
        self.base_model.load_adapter(not_abbr_adapters, adapter_name="NOT_ABBR")
        self.base_model.set_adapter("ABBR")  # Set default adapter
        self.base_model.to(device).eval()

    def generate(self, adapter, image):
        """Generate a transcription of `image` in the style of `adapter` ("ABBR" or "NOT_ABBR")."""
        # Clear any cached state left over from a previous generation
        if hasattr(self.base_model, "past_key_values"):
            self.base_model.past_key_values = None
        self.base_model.set_adapter(adapter)
        msgs = [{"role": "user", "content": [f"Transcribe this manuscript line in mode <{adapter}>:", image]}]
        with torch.no_grad():
            res = self.base_model.chat(image=image, msgs=msgs, tokenizer=self.tokenizer, max_new_tokens=128)
        # Remove the <ABBR>...</ABBR> or <NOT_ABBR>...</NOT_ABBR> mode tags from the output
        res = res.replace(f"<{adapter}>", "").replace(f"</{adapter}>", "")
        return res


class TranscriptionPipeline:
    """Handles image processing, transcription, and result saving."""

    def __init__(self, model, image_folder, max_images=None):
        self.model = model
        self.image_folder = image_folder
        self.max_images = max_images

    def run_inference(self):
        """Transcribe the folder's images in both styles and save the results as JSON."""
        # Filter image files first, then sort and limit, so the limit counts images only
        image_files = sorted(
            f for f in os.listdir(self.image_folder) if f.lower().endswith((".png", ".jpg", ".jpeg"))
        )
        if self.max_images is not None:
            image_files = image_files[: self.max_images]

        results = []
        for image_file in tqdm(image_files):
            image = Image.open(os.path.join(self.image_folder, image_file)).convert("RGB")
            print(f"\nProcessing image: {image_file}")

            # Generate transcriptions for both adapters
            transcriptions = {
                adapter: self.model.generate(adapter, image)
                for adapter in ["ABBR", "NOT_ABBR"]
            }
            for adapter, res in transcriptions.items():
                print(f"Mode ({adapter}): {res}")
            results.append({"image": image_file, "transcriptions": transcriptions})

            # image.show()  # Optional

        # Save results to a JSON file
        with open("transcriptions_results.json", "w", encoding="utf-8") as f:
            json.dump(results, f, ensure_ascii=False, indent=4)


# Initialize and run the pipeline
model = TranscriptionModel(model_name, abbr_adapters, not_abbr_adapters, device)
TranscriptionPipeline(model, image_folder, max_images=max_images).run_inference()
```
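
`run_inference` writes its results to `transcriptions_results.json` as a list of `{"image": ..., "transcriptions": {"ABBR": ..., "NOT_ABBR": ...}}` records. Below is a minimal sketch for reading that file back and exporting one tab-separated file per style. The function name `export_by_style` and the output file naming are hypothetical choices, not part of the model.

```python
import csv
import json
from pathlib import Path


def export_by_style(results_path, out_dir, styles=("ABBR", "NOT_ABBR")):
    """Write one TSV per transcription style with (image, transcription) rows."""
    records = json.loads(Path(results_path).read_text(encoding="utf-8"))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for style in styles:
        out_path = out_dir / f"transcriptions_{style}.tsv"  # hypothetical naming
        with out_path.open("w", encoding="utf-8", newline="") as f:
            writer = csv.writer(f, delimiter="\t")
            writer.writerow(["image", "transcription"])
            # Sort by file name so both TSV files list the lines in the same order
            for rec in sorted(records, key=lambda r: r["image"]):
                writer.writerow([rec["image"], rec["transcriptions"].get(style, "")])
        written.append(out_path)
    return written
```

For example, `export_by_style("transcriptions_results.json", "exports")` produces `exports/transcriptions_ABBR.tsv` and `exports/transcriptions_NOT_ABBR.tsv`. The `csv` module quotes any transcription that happens to contain a tab or newline.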



## Citation

```bibtex
@misc{torres_aguilar:hal-04983305,
  title  = {Dual-Style Transcription of Historical Manuscripts based on Multimodal Small Language Models with Switchable Adapters},
  author = {Torres Aguilar, Sergio},
  url    = {https://hal.science/hal-04983305},
  year   = {2025},
  note   = {working paper or preprint}
}
```


## Framework versions

- PEFT 0.14.1.dev0