Alex Renoki
rennokki
AI & ML interests
Real Life & Have Fun (RLHF)
🚀 Exploring Topic Modeling with BERTopic 🤖
When you come across an interesting dataset, you often wonder:
Which topics frequently appear in these documents? 🤔
What is this data really about? 📊
Topic modeling helps answer these questions by identifying recurring themes within a collection of documents. This process enables quick and efficient exploratory data analysis.
I’ve been working on an app built on BERTopic, a flexible framework for topic modeling. BERTopic is modular, so you can swap any component for your preferred algorithm. It also handles large datasets: you can fit separate models on parts of the data and combine them with BERTopic.merge_models. 🔗
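As a rough illustration of the merging idea, here is a minimal sketch that fits one model per chunk and then merges them. The helper names (chunked, fit_and_merge), the chunk size, and the similarity threshold are assumptions made for this example, not details taken from the app.

```python
from typing import Iterator, List, Sequence


def chunked(docs: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `docs` with at most `size` items each."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(docs), size):
        yield list(docs[start:start + size])


def fit_and_merge(docs: Sequence[str], chunk_size: int = 10_000,
                  min_similarity: float = 0.7):
    """Fit one BERTopic model per chunk, then merge them into one model.

    `chunk_size` and `min_similarity` are illustrative values, not settings
    from the original app.
    """
    # Imported lazily so the chunking helper works without bertopic installed.
    from bertopic import BERTopic

    models = [BERTopic().fit(chunk) for chunk in chunked(docs, chunk_size)]
    if len(models) == 1:
        # Nothing to merge when all documents fit in a single chunk.
        return models[0]
    # Topics from later models are added only if they are dissimilar enough
    # from the topics already present in the first model.
    return BERTopic.merge_models(models, min_similarity=min_similarity)
```

Calling fit_and_merge(docs) on a list of strings would return a single merged model. Each chunk still needs enough documents for clustering to find topics, so very small chunk sizes are unlikely to work well.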
🔍 How do we make this work?
Here’s the stack we’re using:
📂 Data Source ➡️ Hugging Face datasets with DuckDB for retrieval
🧠 Text Embeddings ➡️ Sentence Transformers (all-MiniLM-L6-v2)
⚡ Dimensionality Reduction ➡️ RAPIDS cuML UMAP for GPU-accelerated performance
🔍 Clustering ➡️ RAPIDS cuML HDBSCAN for fast clustering
✂️ Tokenization ➡️ CountVectorizer
🔧 Representation Tuning ➡️ KeyBERTInspired + Hugging Face Inference Client with Meta-Llama-3-8B-Instruct
🌍 Visualization ➡️ Datamapplot library
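To show how these pieces could plug into BERTopic, here is a minimal sketch covering the embedding, dimensionality reduction, clustering, tokenization, and KeyBERTInspired steps. The function names, the hyperparameter values, and the CPU fallback (umap-learn and hdbscan) are assumptions for illustration. The DuckDB data loading, the Llama-based representation step, and the Datamapplot visualization are left out.

```python
import importlib.util


def select_backend() -> str:
    """Return "cuml" when RAPIDS cuML is importable, otherwise "cpu"."""
    return "cuml" if importlib.util.find_spec("cuml") is not None else "cpu"


def build_topic_model(backend: str):
    """Assemble a BERTopic model from the components listed above."""
    # Heavy dependencies are imported lazily, only when a model is built.
    from bertopic import BERTopic
    from bertopic.representation import KeyBERTInspired
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer

    if backend == "cuml":
        # GPU-accelerated implementations from RAPIDS cuML.
        from cuml.cluster import HDBSCAN
        from cuml.manifold import UMAP
    else:
        # CPU fallback: an assumption, the post only mentions cuML.
        from hdbscan import HDBSCAN
        from umap import UMAP

    # All hyperparameters below are illustrative, not the app's settings.
    return BERTopic(
        embedding_model=SentenceTransformer("all-MiniLM-L6-v2"),
        umap_model=UMAP(n_components=5, n_neighbors=15, min_dist=0.0),
        hdbscan_model=HDBSCAN(min_cluster_size=15, prediction_data=True),
        vectorizer_model=CountVectorizer(stop_words="english"),
        representation_model=KeyBERTInspired(),
    )
```

With the dependencies installed, build_topic_model(select_backend()).fit_transform(docs) would return topic assignments and probabilities for a list of documents.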
Check out the space and see how you can quickly generate topics from your dataset: datasets-topics/topics-generator
Powered by @MaartenGr - BERTopic