Safetensors
idefics3
ocr
code
math
formula
MatteoOmenetti committed on
Commit 15b5caf · verified · 1 Parent(s): 0284bc9

Update README.md

Files changed (1)
  1. README.md +51 -3
README.md CHANGED
@@ -1,3 +1,51 @@
- ---
- license: mit
- ---
+ ---
+ license: cdla-permissive-2.0
+ datasets:
+ - ds4sd/SynthFormulaNet
+ - ds4sd/SynthCodeNet
+ tags:
+ - ocr
+ - code
+ - math
+ - formula
+ ---
+
+ # Code Formula Model
+
+ The **Code Formula Model** processes an image of a code snippet or formula at 120 DPI and outputs its content.
+
+ - **Code Snippets**:
+ The model identifies the programming language and outputs the code, respecting the indentation shown in the given image. The output format is:<br>
+ "<\_\<programming language\>\_> \<content of the image\>"<br>
+ Example:<br>
+ "<_Java_> System.out.println("Hello World.");"
+
+ - **Formulas**:
+ The model generates the corresponding LaTeX code.
+
+
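The code-snippet output format described above can be split back into a language and a body with a few lines of post-processing. The following is a minimal sketch, not part of the model's own tooling: the names `parse_model_output` and `ParsedOutput` are hypothetical, and it assumes that any output without a leading `<_..._>` tag (for example, LaTeX produced for a formula) should be returned unchanged with no language.

```python
import re
from typing import NamedTuple, Optional

# Matches the "<_Language_> content" prefix from the README's format description.
# DOTALL lets the body span several lines so indentation and newlines survive.
_TAG = re.compile(r"^<_(?P<lang>[^<>]+?)_>\s?(?P<body>.*)$", re.DOTALL)


class ParsedOutput(NamedTuple):
    """Hypothetical container for one decoded model output."""
    language: Optional[str]  # None when no language tag is present
    content: str


def parse_model_output(text: str) -> ParsedOutput:
    """Split a decoded output string into (language, content).

    Assumption: untagged strings (e.g. LaTeX for a formula) are passed
    through as-is with language set to None.
    """
    match = _TAG.match(text)
    if match is None:
        return ParsedOutput(None, text)
    return ParsedOutput(match.group("lang"), match.group("body"))
```

Only a single whitespace character after the tag is dropped, so leading indentation inside the snippet body is kept intact.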
+ This model was trained using the following two datasets:
+ 1. https://huggingface.co/datasets/ds4sd/SynthFormulaNet
+ 2. https://huggingface.co/datasets/ds4sd/SynthCodeNet
+
+ # References
+ ```bibtex
+ @techreport{Docling,
+ author = {Deep Search Team},
+ month = {8},
+ title = {{Docling Technical Report}},
+ url={https://arxiv.org/abs/2408.09869},
+ eprint={2408.09869},
+ doi = "10.48550/arXiv.2408.09869",
+ version = {1.0.0},
+ year = {2024}
+ }
+
+ @article{nassar2025smoldocling,
+ title={SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion},
+ author={Nassar, Ahmed and Marafioti, Andres and Omenetti, Matteo and Lysak, Maksym and Livathinos, Nikolaos and Auer, Christoph and Morin, Lucas and de Lima, Rafael Teixeira and Kim, Yusik and Gurbuz, A Said and others},
+ journal={arXiv preprint arXiv:2503.11576},
+ year={2025}
+ }
+
+ ```