Afonso Marques

marquesafonso

AI & ML interests

None yet

Recent Activity

Organizations

None yet

marquesafonso's activity

upvoted an article 2 days ago
Article

Introducing EuroBERT: A High-Performance Multilingual Encoder Model

By EuroBERT and 3 others •
• 112
reacted to davidberenstein1957's post with 👍 8 days ago
Post
2969
🫸 New release to push vector search to the Hub with vicinity and work with any serialisable objects.

🧑‍🏫 KNN, HNSW, USEARCH, ANNOY, PYNNDESCENT, FAISS, and VOYAGER.

🔗 Example Repo: minishlab/my-vicinity-repo
reacted to singhsidhukuldeep's post with 👍 8 days ago
Post
6731
Exciting New Tool for Knowledge Graph Extraction from Plain Text!

I just came across a groundbreaking new tool called KGGen that's solving a major challenge in the AI world - the scarcity of high-quality knowledge graph data.

KGGen is an open-source Python package that leverages language models to extract knowledge graphs (KGs) from plain text. What makes it special is its innovative approach to clustering related entities, which significantly reduces sparsity in the extracted KGs.

The technical approach is fascinating:

1. KGGen uses a multi-stage process involving an LLM (GPT-4o in their implementation) to extract entities and relations from source text
2. It aggregates graphs across sources to reduce redundancy
3. Most importantly, it applies iterative LM-based clustering to refine the raw graph

The clustering stage is particularly innovative - it identifies which nodes and edges refer to the same underlying entities or concepts. This normalizes variations in tense, plurality, stemming, and capitalization (e.g., "labors" clustered with "labor").
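To make the clustering idea concrete, here is a minimal sketch of merging surface variants of a node label. It is not KGGen's implementation and does not use the kg-gen API: the post describes iterative LM-based clustering, while this sketch substitutes a crude rule (lowercasing and stripping a trailing "s") purely for illustration. The function names `normalize` and `cluster_triples` and the sample triples are hypothetical.

```python
from collections import defaultdict


def normalize(label: str) -> str:
    """Rule-based stand-in for the LM judgment (assumption, not KGGen's method):
    lowercase the label and strip a single trailing 's'."""
    key = label.strip().lower()
    if key.endswith("s") and len(key) > 3:
        key = key[:-1]
    return key


def cluster_triples(triples):
    """Group node labels that share a normalized form and deduplicate
    the edges that become identical once their endpoints are merged."""
    clusters = defaultdict(set)
    for subj, _, obj in triples:
        for node in (subj, obj):
            clusters[normalize(node)].add(node)
    merged = sorted({(normalize(s), rel, normalize(o)) for s, rel, o in triples})
    return merged, dict(clusters)


# Hypothetical raw graph with tense/plurality/capitalization variants.
triples = [
    ("Labor", "drives", "wages"),
    ("labors", "drives", "wage"),
    ("labor", "affects", "Prices"),
]
merged, clusters = cluster_triples(triples)
```

Three raw edges collapse into two once "Labor", "labors", and "labor" are treated as one node, which is the sparsity reduction the post refers to. A real system would replace `normalize` with a model call that can also merge synonyms, something a string rule cannot do.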

The researchers from Stanford and University of Toronto also introduced MINE (Measure of Information in Nodes and Edges), the first benchmark for evaluating KG extractors. When tested against existing methods like OpenIE and GraphRAG, KGGen outperformed them by up to 18%.

For anyone working with knowledge graphs, RAG systems, or KG embeddings, this tool addresses the fundamental challenge of data scarcity that's been holding back progress in graph-based foundation models.

The package is available via pip install kg-gen, making it accessible to everyone. This could be a game-changer for knowledge graph applications!
upvoted an article 15 days ago
Article

Refreshing zero-shot classification with ModernBERT

By Ihor •
• 1
upvoted 3 articles about 1 month ago
Article

Agentic RAG Stack (1/5) - Index and retrieve documents for vector search using Sentence Transformers and DuckDB

• 18
Article

Agentic RAG Stack (2/5) - Augment retrieval results by reranking using Sentence Transformers

• 9
upvoted an article about 1 month ago
Article

Replicating DeepSeek R1 for Information Extraction

By Ihor •
• 38
reacted to crodri's post with 👀 about 1 month ago
Post
1471
At the Language Technologies Unit of the Barcelona Supercomputing Center, we are developing state-of-the-art large language and voice models through various national and international projects. It is an exciting time to be working in generative AI!
We are looking for bright and motivated individuals to help us achieve ambitious goals. Our latest opening for the Innovation group that develops powerful and socially useful applications for AI technology might be for you. Check it out here:
https://www.bsc.es/join-us/job-opportunities/3025lsltre2