Article Ulysses Sequence Parallelism: Training with Million-Token Contexts • 5 days ago • 17
Article FlashHead: Accelerating Language Model Inference ~ *Efficient drop-in replacement for the classification head* • 2 days ago • 1
Nemotron-Pre-Training-Datasets Collection Large-scale pre-training datasets used in the Nemotron family of models. • 12 items • Updated 2 days ago • 117
Lost in Backpropagation: The LM Head is a Gradient Bottleneck Paper • 2603.10145 • Published 3 days ago • 5
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 12 items • Updated 2 days ago • 194
MixtureVitae study models and datasets Collection Collection of models and datasets related to MixtureVitae, an open and fully reproducible pretraining dataset built from permissive sources • 16 items • Updated 29 days ago • 1
Article Scaling Pedagogical Pre-training: From Optimal Mixing to 10 Billion Tokens • 8 days ago • 4
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 11 days ago • 12
Recovered in Translation: Efficient Pipeline for Automated Translation of Benchmarks and Datasets Paper • 2602.22207 • Published 16 days ago • 42
Article Do Bubbles Form When Tens of Thousands of AIs Simulate Capitalism? • 18 days ago • 17
The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder Paper • 2602.18487 • Published about 1 month ago • 5
Avey B1 experimental Collection Experimental pre-trained checkpoints for Avey-B1 • 3 items • Updated 19 days ago • 3
jina-embeddings-v5-text: Task-Targeted Embedding Distillation Paper • 2602.15547 • Published 25 days ago • 26
Aya Datasets Collection The Aya Collection is a massive multilingual collection spanning over 100 languages, consisting of 513 million instances of prompts and completions. • 5 items • Updated Jul 31, 2025 • 27
LoRA-Squeeze: Simple and Effective Post-Tuning and In-Tuning Compression of LoRA Modules Paper • 2602.10993 • Published about 1 month ago • 1
Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning Paper • 2602.11149 • Published about 1 month ago • 15
SteuerLLM: Local specialized large language model for German tax law analysis Paper • 2602.11081 • Published about 1 month ago • 1