Ever dreamt of ingesting into a vector DB that pile of CSVs, Word documents and presentations lying in some remote folders on your PC?
What if I told you that you can do it within three to six lines of code?
Well, with my latest open-source project, ingest-anything (https://github.com/AstraBert/ingest-anything), you can take all your non-PDF files, convert them to PDF, extract their text, chunk, embed and load them into a vector database, all in one go!
How? It's pretty simple!
- The input files are converted into PDF by PdfItDown (https://github.com/AstraBert/PdfItDown)
- The PDF text is extracted using LlamaIndex readers
- The text is chunked with Chonkie
- The chunks are embedded with Sentence Transformers models
- The embeddings are loaded into a Qdrant vector database
And you're done!
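To make the chunk, embed and load stages a bit more concrete, here is a toy sketch in plain Python. It is not the ingest-anything API and it does not call Chonkie, Sentence Transformers or Qdrant: `chunk_text`, `embed`, `InMemoryStore` and `ingest` are hypothetical stand-ins written only for illustration, and the hashed bag-of-words vector is an assumption that replaces a real embedding model.

```python
import hashlib
import math
from dataclasses import dataclass, field


def chunk_text(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words that overlap by `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a vector database collection."""
    points: list = field(default_factory=list)

    def upsert(self, vector: list[float], payload: str) -> None:
        self.points.append((vector, payload))

    def search(self, query_vector: list[float], top_k: int = 1) -> list:
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(query_vector, vec)), payload)
            for vec, payload in self.points
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def ingest(text: str, store: InMemoryStore, chunk_size: int = 8, overlap: int = 2) -> int:
    """Chunk the text, embed every chunk, load it into the store; return the chunk count."""
    chunks = chunk_text(text, chunk_size, overlap)
    for chunk in chunks:
        store.upsert(embed(chunk), chunk)
    return len(chunks)


if __name__ == "__main__":
    store = InMemoryStore()
    ingest("text extracted from a converted PDF would go here as one long string", store)
    print(store.search(embed("converted PDF"), top_k=1))
```

In the real pipeline each stand-in would be swapped for the corresponding library, but the shape of the flow (text in, chunks out, one vector per chunk, vectors plus payloads into a collection) stays the same.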
Curious to try it? Install it by running:
pip install ingest-anything
And you can start using it in your Python scripts!
Don't forget to star it on GitHub and let me know if you have any feedback! https://github.com/AstraBert/ingest-anything