lm-human-preference-details - a vwxyzjn Collection

lm-human-preference-details

Updated Oct 4, 2023

vwxyzjn/train_policy_accelerate__sentiment_offline_5k.json__seed1__1696447674

Text Generation • 0.1B • Updated Oct 4, 2023 • 5

lm-human-preference-details/train_policy_accelerate__sentiment_offline_5k.json__seed1

Text Generation • 0.1B • Updated Oct 4, 2023 • 4