- Effective Distillation to Hybrid xLSTM Architectures (arXiv:2603.15590, published 2 days ago)
- TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation (arXiv:2603.08182, published 10 days ago)
- Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning (arXiv:2602.11149, published Feb 11)
- FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale (arXiv:2601.22146, published Jan 29)
- Bolmo: Byteifying the Next Generation of Language Models (arXiv:2512.15586, published Dec 17, 2025)
- Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining (arXiv:2511.21613, published Nov 26, 2025)
- Gaperon: A Peppered English-French Generative Language Model Suite (arXiv:2510.25771, published Oct 29, 2025)
- Mask and You Shall Receive: Optimizing Masked Language Modeling for Pretraining BabyLMs (arXiv:2510.20475, published Oct 23, 2025)
- The Art of Asking: Multilingual Prompt Optimization for Synthetic Data (arXiv:2510.19806, published Oct 22, 2025)
- Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models (arXiv:2504.14366, published Apr 19, 2025)
- The German Commons: 154 Billion Tokens of Openly Licensed Text for German Language Models (arXiv:2510.13996, published Oct 15, 2025)