Gabriele Parisini

GabPar

GabPar5

AI & ML interests

None yet

Organizations

None yet

liked a model 2 months ago

DeepMount00/Sibilia-TTS

Text-to-Speech • 2B • Updated Aug 23 • 32 • 10

liked a model 4 months ago

DeepMount00/Ita-Search

liked a model 5 months ago

DeepMount00/Italian_NER_XXL_v2

Token Classification • 0.1B • Updated Jun 11 • 1.56k • 27

reacted to as-cle-bert's post with 🔥 6 months ago

Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folders on your PC?🗂️
What if I told you that you can do it within three to six lines of code?🤯
Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!🚀
How? It's pretty simple!
📁 The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
📑 The PDF text is extracted using LlamaIndex readers
🦛 The text is chunked with Chonkie
🧮 The chunks are embedded thanks to Sentence Transformers models
🗄️ The embeddings are loaded into a Qdrant vector database

And you're done!✅
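
To make the chunk, embed and load steps concrete, here is a minimal sketch that uses only the Python standard library. It is not ingest-anything's API: `chunk_text`, `embed`, `InMemoryVectorStore` and `ingest` are hypothetical names, the hashed bag-of-words vector is a toy stand-in for a Sentence Transformers model, and the in-memory list is a stand-in for a Qdrant collection. The PDF conversion and text extraction steps are skipped, so the input is assumed to be plain text already.

```python
import hashlib
import math


def chunk_text(text, chunk_size=8, overlap=2):
    """Split text into overlapping word windows (stand-in for a chunking library)."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, stop, step)]


def embed(text, dim=64):
    """Toy embedding: hash each token into a bucket, then L2-normalize.

    Assumption: this only illustrates the 'text -> fixed-size vector' step;
    a real pipeline would call an embedding model here.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database collection."""

    def __init__(self):
        self.points = []  # list of (vector, payload) pairs

    def upsert(self, vector, payload):
        self.points.append((vector, payload))

    def search(self, query_vector, limit=3):
        # Vectors are normalized, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(query_vector, vec)), payload)
            for vec, payload in self.points
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


def ingest(documents, store, chunk_size=8, overlap=2):
    """Chunk, embed and load every document; return the number of stored points."""
    for name, text in documents.items():
        for i, chunk in enumerate(chunk_text(text, chunk_size, overlap)):
            store.upsert(embed(chunk), {"source": name, "chunk_id": i, "text": chunk})
    return len(store.points)


if __name__ == "__main__":
    store = InMemoryVectorStore()
    docs = {
        "a.txt": "qdrant stores vectors for similarity search",
        "b.txt": "chonkie splits long text into chunks",
    }
    ingest(docs, store)
    for score, payload in store.search(embed("similarity search over vectors"), limit=1):
        print(round(score, 3), payload["source"], payload["text"])
```

Keeping the source file name and chunk index in each payload is a design choice of this sketch: it lets a search hit be traced back to the document it came from.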
Curious to try it? Install it by running:

pip install ingest-anything

And you can start using it in your Python scripts!🐍
Don't forget to star it on GitHub and let me know if you have any feedback! ➡️ https://github.com/AstraBert/ingest-anything