aaa

qwertyuiopasdfg

AI & ML interests

None yet

Recent Activity

updated a model 4 days ago

qwertyuiopasdfg/model002_16bit

published a model 4 days ago

qwertyuiopasdfg/model002_16bit

updated a dataset 6 days ago

qwertyuiopasdfg/temp0001


Organizations

None yet

updated a model 4 days ago

qwertyuiopasdfg/model002_16bit

2B • Updated 4 days ago • 2

published a model 4 days ago

qwertyuiopasdfg/model002_16bit

2B • Updated 4 days ago • 2

updated a dataset 6 days ago

qwertyuiopasdfg/temp0001

Viewer • Updated 6 days ago • 11.4k

published a dataset 6 days ago

qwertyuiopasdfg/temp0001

Viewer • Updated 6 days ago • 11.4k

updated a model 6 days ago

qwertyuiopasdfg/model001

2B • Updated 6 days ago • 7

published a model 6 days ago

qwertyuiopasdfg/model001

2B • Updated 6 days ago • 7

liked a model 11 days ago

Skywork/Skywork-SWE-32B

Text Generation • 33B • Updated 6 days ago • 1.13k • 67

liked a model 14 days ago

openbmb/BitCPM4-1B

Text Generation • 1B • Updated 23 days ago • 472 • 17

liked a model 17 days ago

microsoft/phi-4

Text Generation • 15B • Updated Feb 24 • 793k • 2.1k

liked a model 18 days ago

stelterlab/DeepSeek-R1-0528-Qwen3-8B-AWQ

Text Generation • 2B • Updated 29 days ago • 707 • 1

liked 2 models 21 days ago

openbmb/MiniCPM4-0.5B

Text Generation • 0.4B • Updated 23 days ago • 9.33k • 52

Qwen/Qwen3-Reranker-4B

Text Ranking • 4B • Updated 24 days ago • 57.4k • 72

liked a Space 21 days ago

102

NSFW-3B

👑

NSFW-3B: A Dark, Unrestricted AI Model

liked a model 23 days ago

rednote-hilab/dots.llm1.base

Text Generation • 143B • Updated 7 days ago • 1.28k • 54

liked a model 24 days ago

NSFW-API/NSFW_Wan_1.3b

Updated 8 days ago • 316

liked a model 25 days ago

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated 28 days ago • 14.5k • 170

reacted to Ruurd's post with 👀 25 days ago


Over the past year, I have been trying to get diffusion models to work for language generation without having to retrain an LLM from scratch. Recently, we finally succeeded:

We introduce "LAD: LoRA-Adapted Denoiser", a method to convert a LLaMA model into a text diffusion model using LoRA finetuning and structured input corruption.
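LoRA finetuning keeps the pretrained weights frozen and learns only a small low-rank update. The plain-Python sketch below shows one LoRA-adapted linear layer. It assumes the row-vector convention y = xW + (alpha/r)·xAB and the common alpha/r scaling from the general LoRA formulation. The function names are hypothetical and are not taken from the LAD code.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Compute y = x W + (alpha / r) * x A B, where W is frozen and only A, B are trained.

    Shapes: x is (n, d_in), w is (d_in, d_out), a is (d_in, r), b is (r, d_out).
    """
    r = len(a[0])                    # LoRA rank
    base = matmul(x, w)              # frozen pretrained projection
    delta = matmul(matmul(x, a), b)  # trainable low-rank update
    scale = alpha / r
    return [
        [base[i][j] + scale * delta[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


x = [[1.0, 2.0]]
w = [[1.0, 0.0], [0.0, 1.0]]  # identity as a stand-in "pretrained" weight
a = [[0.5], [0.5]]            # d_in = 2, rank r = 1
print(lora_forward(x, w, a, [[0.0, 0.0]], alpha=1.0))  # B = 0: output equals the frozen layer
print(lora_forward(x, w, a, [[1.0, 1.0]], alpha=2.0))
```

LoRA commonly initializes B to zero, so the adapted model starts out identical to the frozen backbone. Only A and B, which are far smaller than W, receive gradients during finetuning.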

🎯 Try the demo and read the write-up here!
https://ruurdkuiper.github.io/tini-lad/

Unlike autoregressive (token-by-token) models such as ChatGPT, diffusion models iteratively refine a noised sequence. However, most current diffusion approaches rely on full-parameter retraining and repeated token remasking, which makes both training and inference costly and slow!
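Autoregressive models decode left to right under a causal attention mask. Refining a whole noised sequence instead needs every token to see tokens on both sides, and the post lists modified attention masks for bidirectional decoding among LAD's ingredients. The sketch below contrasts the two masks on a single attention row. It is a generic illustration with hypothetical helper names, not LAD's actual masking code.

```python
import math
from typing import List


def causal_mask(n: int) -> List[List[bool]]:
    """Autoregressive mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[bool]]:
    """Full mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def masked_softmax(scores: List[float], allowed: List[bool]) -> List[float]:
    """Softmax over one attention row, giving disallowed positions zero weight."""
    m = max(s for s, ok in zip(scores, allowed) if ok)  # subtract max for numerical stability
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.0, 0.0, 0.0]  # equal raw scores for a 3-token sequence
print(masked_softmax(scores, causal_mask(3)[0]))         # first token sees only itself
print(masked_softmax(scores, bidirectional_mask(3)[0]))  # first token sees all three
```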

🧠 With LAD:
- We can finetune an autoregressive model for diffusive generation in just 10 hours on a single GPU.
- Test-time compute is fully adjustable: fewer steps give faster outputs, while more steps improve output quality.
- Thanks to our unique noising schedule, remasking is not always needed during inference: all tokens are attended to in each iteration!
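The adjustable test-time compute described above reduces to a refinement loop whose step count the caller chooses. The sketch below assumes a hypothetical `denoiser` callable that maps a full token sequence to a refined one. `toy_denoiser` and `REFERENCE` are deliberately trivial stand-ins for illustration only, not a trained model or LAD's inference procedure.

```python
from typing import Callable, List

Tokens = List[str]


def refine(noisy: Tokens, denoiser: Callable[[Tokens], Tokens], steps: int) -> Tokens:
    """Apply a full-sequence denoiser up to `steps` times.

    The whole sequence is passed to the denoiser at every step (no remasking).
    The loop stops early once an iteration no longer changes the sequence.
    """
    seq = list(noisy)
    for _ in range(steps):
        new_seq = denoiser(seq)
        if new_seq == seq:  # converged: further steps would change nothing
            break
        seq = new_seq
    return seq


REFERENCE = ["the", "cat", "sat", "down"]  # hypothetical target, only for the toy denoiser


def toy_denoiser(seq: Tokens) -> Tokens:
    """Hypothetical stand-in for a model: repairs the first token that differs from REFERENCE."""
    out = list(seq)
    for i, (tok, ref) in enumerate(zip(out, REFERENCE)):
        if tok != ref:
            out[i] = ref
            break
    return out


noisy = ["cat", "the", "sat", "sat"]
print(refine(noisy, toy_denoiser, steps=1))   # fast, only partly repaired
print(refine(noisy, toy_denoiser, steps=10))  # slower, fully repaired
```

The step budget is the only knob. A real denoiser would revise many positions per step, but the trade-off between speed and quality takes the same shape.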

🔍 LAD is built using:
- A frozen LLaMA-8B backbone
- Structured noising: token swaps, duplications, replacements, span shifts
- Modified attention masks for bidirectional decoding
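The structured noising listed above can be sketched as a corruption function. This is a minimal illustration only. The function name, the uniform choice among operations, the use of adjacent-only swaps, and the fixed sequence length (a duplication drops the final token) are all assumptions, not details from the LAD write-up.

```python
import random
from typing import List


def structured_noise(tokens: List[str], vocab: List[str], n_ops: int, seed: int = 0) -> List[str]:
    """Corrupt `tokens` with `n_ops` random structured edits while keeping the length fixed.

    Operations, following the post's list: swap two adjacent tokens, duplicate a token,
    replace a token with a random vocabulary item, or shift a span to a new position.
    """
    rng = random.Random(seed)
    seq = list(tokens)
    n = len(seq)
    if n < 2:
        return seq
    for _ in range(n_ops):
        op = rng.choice(["swap", "duplicate", "replace", "shift"])
        if op == "swap":
            i = rng.randrange(n - 1)
            seq[i], seq[i + 1] = seq[i + 1], seq[i]
        elif op == "duplicate":
            i = rng.randrange(n)
            seq.insert(i + 1, seq[i])
            seq.pop()  # drop the last token so the length stays n
        elif op == "replace":
            seq[rng.randrange(n)] = rng.choice(vocab)
        else:  # shift: cut a span out and reinsert it elsewhere
            start = rng.randrange(n)
            length = rng.randint(1, n - start)
            span = seq[start:start + length]
            rest = seq[:start] + seq[start + length:]
            dest = rng.randrange(len(rest) + 1)
            seq = rest[:dest] + span + rest[dest:]
    return seq


clean = ["the", "cat", "sat", "on", "the", "mat"]
print(structured_noise(clean, vocab=["dog", "ran", "a"], n_ops=3, seed=42))
```

During training, the model would see such corrupted sequences as input and learn to predict the clean sequence. How heavily to corrupt each example (the noising schedule) is left open here.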

💡 We show that even small, quickly trained models can perform diffusive generation, with competitive benchmark performance and perplexity, and more flexible test-time behavior than traditional transformers.