What advantage does this have over normal algorithmic ways of turning HTML into Markdown?

#5
by MohamedRashad - opened

I don't understand why I would use this instead of going directly to a simple tool that converts my HTML to Markdown. What advantages would I see here?

Jina AI org

I hope this post will answer your question: https://jina.ai/news/readerlm-v2-frontier-small-language-model-for-html-to-markdown-and-json

TL;DR: the structure of the HTML is preserved well, and the model excels at generating complex elements like code fences, nested lists, tables, and LaTeX equations.
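For context on what a "simple tool" can look like, here is a minimal sketch of a rule-based converter built on Python's standard `html.parser`. It is not taken from any tool mentioned in this thread; the class name `SimpleMarkdownConverter` and the function `html_to_markdown` are hypothetical, and the tag coverage is deliberately tiny. It is only meant to show where hand-written rules start to strain.

```python
from html.parser import HTMLParser


class SimpleMarkdownConverter(HTMLParser):
    """Hypothetical rule-based converter covering only h1-h3, p, bold, italic, links, and flat list items."""

    def __init__(self):
        super().__init__()
        self.parts = []   # output fragments, joined at the end
        self.href = None  # target of the link currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.parts.append("#" * int(tag[1]) + " ")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "li":
            # No notion of nesting depth: every item gets the same marker.
            self.parts.append("- ")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.parts.append("\n\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "li":
            self.parts.append("\n")
        elif tag == "a":
            self.parts.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html: str) -> str:
    """Convert a small HTML fragment to Markdown using the fixed rules above."""
    converter = SimpleMarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()
```

Headings, paragraphs, and links come out fine with this sketch, but a nested list such as `<ul><li>a<ul><li>b</li></ul></li></ul>` loses its structure because the rules carry no indentation state. Real rule-based libraries handle far more than this, yet each complex element (tables, code fences, equations) needs its own hand-written rules.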

I think it's a great model to use in the future. I understand that, for now, the algorithmic way of extracting HTML wins, but I think they are demonstrating what an LLM can do without the algorithm.

I liked the model. Do you plan to release the HTML-to-Markdown and JSON dataset?

Thank you very much.
