eoe

AI & ML interests

None yet

Recent Activity

reacted to anakin87's post with ❤️ 1 day ago
How does LLM training with RL environments work?

It all starts with **Reinforcement Learning with Verifiable Rewards**:
- a question is asked
- the model generates reasoning + an answer
- the answer is checked against the ground truth
- the reward drives RL training

In this setup, the environment is simple: fixed questions and answers, rollout logic, and reward(s).

Now consider a more complex tic-tac-toe environment ❌⭕. It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions (environments can also include tools)

What happens at training time? We use **Group Relative Policy Optimization** (GRPO) with the tic-tac-toe environment. No critic model is needed: the group itself serves as the baseline, which makes GRPO simpler than PPO.

1️⃣ Rollout generation: from the same board, the model plays N games via sampling
2️⃣ Each game is scored with deterministic rewards (win, format, ...)
3️⃣ The mean score is computed across the group
4️⃣ Each rollout's advantage = its score minus the group mean
5️⃣ The model is updated to favor trajectories above the baseline
🔁 Repeat

For a deep dive, check out 🌱 https://github.com/anakin87/llm-rl-environments-lil-course, a free hands-on course on RL environments for LLMs.
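The group-relative advantage step above can be sketched in a few lines. This is an illustrative toy, not the course's implementation: `group_advantages` and the example reward values are made up here, and real GRPO implementations often also normalize by the group's standard deviation, which is omitted.

```python
def group_advantages(rewards):
    """GRPO-style advantages: each rollout's reward minus the group mean.

    The group mean acts as the baseline, so no critic model is needed.
    (Many implementations also divide by the group's std; omitted here.)
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

# Example: 4 rollouts from the same board, scored with a deterministic
# reward (1.0 = win, 0.0 = draw, -1.0 = loss).
rewards = [1.0, -1.0, 1.0, 0.0]
print(group_advantages(rewards))  # → [0.75, -1.25, 0.75, -0.25]
```

Rollouts scoring above the group mean get a positive advantage and are reinforced; those below the mean are pushed down.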

Organizations

None yet