Chiron Bang

chironbang

chironbang's activity

commented on Illustrating Reinforcement Learning from Human Feedback (RLHF) about 2 months ago

Is the last figure caption correct?
"The above diagram makes it look like both models generate different responses for the same prompt, but what really happens is that the RL policy generates text, and that text is fed into the initial model to produce its relative probabilities for the KL penalty."

To me, it makes more sense to feed the prompt to both the reference model and the policy model, and then compare the reference model's probability distribution with the policy's.
Any insight would be appreciated.
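
To make the two readings concrete, here is a minimal sketch with toy distributions standing in for the models. Everything in it is an assumption for illustration and is not taken from the article or from any library: the three-token vocabulary, the functions `policy_probs` and `reference_probs`, and the choice to make each distribution depend only on the prefix length. `sampled_log_ratio` follows the caption (the policy's own text is scored by both models), while `exact_step_kl` follows my reading (compare the two full next-token distributions).

```python
import math
import random

VOCAB = ["a", "b", "c"]  # hypothetical toy vocabulary; tokens are indices 0..2


def policy_probs(prefix):
    # Hypothetical RL policy: next-token distribution depends only on prefix length.
    return [0.6, 0.3, 0.1] if len(prefix) % 2 == 0 else [0.2, 0.5, 0.3]


def reference_probs(prefix):
    # Hypothetical initial (reference) model, same interface as the policy.
    return [0.4, 0.4, 0.2] if len(prefix) % 2 == 0 else [0.3, 0.3, 0.4]


def sample_from_policy(prompt, num_tokens, rng):
    """Generate a response token by token from the policy only."""
    tokens = list(prompt)
    for _ in range(num_tokens):
        probs = policy_probs(tokens)
        tokens.append(rng.choices(range(len(VOCAB)), weights=probs)[0])
    return tokens[len(prompt):]


def sampled_log_ratio(prompt, response):
    """Caption's reading: score the policy's response under both models
    and sum log pi(token) - log ref(token) over the response."""
    prefix = list(prompt)
    total = 0.0
    for tok in response:
        p = policy_probs(prefix)[tok]
        q = reference_probs(prefix)[tok]
        total += math.log(p) - math.log(q)
        prefix.append(tok)
    return total


def exact_step_kl(prefix):
    """My reading: KL(policy || reference) over the full next-token
    distribution at a single step."""
    p, q = policy_probs(prefix), reference_probs(prefix)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    rng = random.Random(0)
    prompt = []
    exact = exact_step_kl([]) + exact_step_kl([0])  # two generation steps
    n = 20000
    estimate = sum(
        sampled_log_ratio(prompt, sample_from_policy(prompt, 2, rng))
        for _ in range(n)
    ) / n
    print("sum of exact per-step KL:", exact)
    print("average sampled log-ratio:", estimate)
```

If I understand correctly, the expected value of the sampled log-ratio under the policy is the KL divergence itself, so in this toy setting the average of `sampled_log_ratio` over many policy samples should approach the sum of `exact_step_kl` over the generation steps. That would suggest the two readings agree in expectation, with the caption describing a sampled estimate, though I may be missing something about how the article's setup actually works.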

Best,
Chiron

updated a model about 1 year ago

chironbang/segformer-b0-finetuned-segments-sidewalk-2

Updated Mar 28, 2024

updated a model almost 2 years ago

chironbang/whisper-large-v2-english-100steps

Updated May 21, 2023

updated a model about 2 years ago

chironbang/dsn_afrispeech

Automatic Speech Recognition • Updated Apr 30, 2023