arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize • Reinforcement Learning • Updated 29 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-penalize • Reinforcement Learning • Updated about 1 month ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-hindsight • Reinforcement Learning • Updated 30 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e4-temp10-hindsight • Reinforcement Learning • Updated 30 days ago
arianaazarbal/Qwen3-8B-train-away-lying-lr1e5-temp10-penalize-neutral-neutral • Reinforcement Learning • Updated 29 days ago
stewy33/Qwen3-8B-0524_original_augmented_original_cat_comment_and_cake-ae9eb5b5 • Updated 28 days ago • 3
stewy33/Qwen3-8B-0524_original_augmented_original_cat_abortion_and_fda-df061258 • Updated 28 days ago • 2