Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR Paper • 2509.18174 • Published Sep 17 • 124
Misraj Open Data Collection This collection contains open-source data collected and processed by the Misraj team • 3 items • Updated Jul 7 • 6
Mutarjim: Advancing Bidirectional Arabic-English Translation with a Small Language Model Paper • 2505.17894 • Published May 23 • 219
KITAB-Bench Collection A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding • 24 items • Updated Feb 24 • 16
LlamaV-o1 Collection Rethinking Step-by-step Visual Reasoning in LLMs • 5 items • Updated Feb 2 • 1
VLM-R1 Collection Multimodal Reasoning Dataset for Large-Scale Training in the DeepSeek-R1 thought style • 18 items • Updated Apr 14 • 2
Sadeed: Advancing Arabic Diacritization Through Small Language Model Paper • 2504.21635 • Published Apr 30 • 58
Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22, 2024 • 65