Update README.md
README.md
CHANGED
---
license: cdla-permissive-2.0
datasets:
- ds4sd/SynthFormulaNet
- ds4sd/SynthCodeNet
tags:
- ocr
- code
- math
- formula
---

# Code Formula Model

The **Code Formula Model** processes an image of a code snippet or formula at 120 DPI and outputs its content.
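
The README only states that input images are at 120 DPI; it does not describe any preprocessing. As a rough sketch, assuming a crop rendered at some other known resolution has to be resampled to 120 DPI first, the target pixel size could be computed as follows. The helper name `scaled_size` and its parameters are hypothetical and not part of the model's code.

```python
from typing import Tuple

TARGET_DPI = 120  # resolution stated in the README


def scaled_size(width: int, height: int, source_dpi: float,
                target_dpi: float = TARGET_DPI) -> Tuple[int, int]:
    """Return the pixel size an image should have after resampling.

    Hypothetical helper: assumes the image was rendered at `source_dpi`
    and that the physical size must stay the same at `target_dpi`.
    """
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    scale = target_dpi / source_dpi
    # Never return a zero-sized dimension for very small crops.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The resulting size can then be passed to any image library's resize routine; which resampling filter works best for this model is not stated in the README.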

- **Code Snippets**:
  The model identifies the programming language and outputs the code, respecting the indentation shown in the given image. The output format is:<br>
  "<\_\<programming language\>\_> \<content of the image\>"<br>
  Example:<br>
  "<_Java_> System.out.println("Hello World.");"

- **Formulas**:
  The model generates the corresponding LaTeX code.
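
The output format described above can be split back into a language tag and the code itself. The following is a minimal sketch, assuming the raw model output is available as a plain string. The helper name `parse_code_output` and the regular expression are illustrative assumptions, not part of the model's API.

```python
import re
from typing import Optional, Tuple

# Matches a leading "<_Language_>" tag followed by the snippet content.
# Assumption: the language name itself never contains the character ">".
_TAG = re.compile(r"^<_(?P<lang>[^>]+?)_>\s?(?P<content>.*)$", re.DOTALL)


def parse_code_output(text: str) -> Tuple[Optional[str], str]:
    """Split model output into (language, content).

    Hypothetical helper: returns (None, text) when no language tag is
    present, which is treated here as formula (LaTeX) output.
    """
    match = _TAG.match(text)
    if match is None:
        return None, text
    return match.group("lang"), match.group("content")


# Usage with the example from the README:
language, code = parse_code_output('<_Java_> System.out.println("Hello World.");')
```

Only a single whitespace character is dropped after the tag, so that any indentation the model reproduces in the snippet is left intact.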

This model was trained using the following two datasets:
1. https://huggingface.co/datasets/ds4sd/SynthFormulaNet
2. https://huggingface.co/datasets/ds4sd/SynthCodeNet

# References
```bibtex
@techreport{Docling,
  author = {Deep Search Team},
  month = {8},
  title = {{Docling Technical Report}},
  url = {https://arxiv.org/abs/2408.09869},
  eprint = {2408.09869},
  doi = {10.48550/arXiv.2408.09869},
  version = {1.0.0},
  year = {2024}
}

@article{nassar2025smoldocling,
  title = {SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion},
  author = {Nassar, Ahmed and Marafioti, Andres and Omenetti, Matteo and Lysak, Maksym and Livathinos, Nikolaos and Auer, Christoph and Morin, Lucas and de Lima, Rafael Teixeira and Kim, Yusik and Gurbuz, A Said and others},
  journal = {arXiv preprint arXiv:2503.11576},
  year = {2025}
}
```