Let's pipe some data from the web into our vector database, shall we?
With ingest-anything (https://github.com/AstraBert/ingest-anything) you can now scrape content starting simply from URLs, extract the text, chunk it, and load it into your favorite LlamaIndex-compatible vector database!
This is powered by crawlee by Apify, an open-source crawling library for Python and JavaScript that handles the data flow from the web: ingest-anything then combines it with BeautifulSoup, PdfItDown and PyMuPDF to scrape HTML pages, convert them to PDF and extract the text - hassle-free!
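To give a feel for the scrape-extract-chunk flow described above, here is a minimal, dependency-free sketch. It approximates the pipeline with Python's stdlib `html.parser` standing in for BeautifulSoup and a naive fixed-size chunker standing in for the real chunking step - the function names are illustrative, not ingest-anything's actual API:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside script/style tags

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Scraped HTML -> plain text (stand-in for the BeautifulSoup step)."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def chunk(text: str, size: int = 200, overlap: int = 20) -> list[str]:
    """Naive fixed-size character chunks with overlap (illustrative only)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


html = "<html><body><h1>Title</h1><script>var x=1;</script><p>Some content.</p></body></html>"
text = extract_text(html)          # "Title Some content."
chunks = chunk(text, size=10, overlap=2)
```

In the real library, each chunk would then be embedded and written to the LlamaIndex-compatible vector store of your choice.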
Check the attached code snippet if you're curious about how to get started!
PS: Don't tell anybody, but this release also has another gem... It supports OpenAI models for agentic chunking, following the new releases of Chonkie!
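For context on why agentic chunking is a gem: a classic non-agentic chunker just groups sentences up to a size budget, while an agentic chunker asks an LLM where the semantic boundaries are. Below is a sketch of the classic baseline only (plain stdlib, not Chonkie's API); the agentic variant would replace the size check with a model call deciding the split points:

```python
import re


def sentence_chunks(text: str, max_chars: int = 80) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters.

    A non-agentic baseline: splits on sentence-ending punctuation and packs
    greedily. Agentic chunking would instead ask an LLM where to split.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_chars:
            chunks.append(current)  # budget exceeded: start a new chunk
            current = s
        else:
            current = f"{current} {s}".strip() if current else s
    if current:
        chunks.append(current)
    return chunks


doc = ("Chunking matters. Good chunks keep ideas together. "
       "Bad chunks split mid-thought. Size alone is a blunt tool.")
parts = sentence_chunks(doc, max_chars=60)
```

The size budget keeps chunks embedding-friendly, but it can still group unrelated sentences - exactly the weakness agentic chunking addresses.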