Wei Xiong

weqweasdas

https://weixiongust.github.io/WeiXiongUST/index.html

AI & ML interests

Machine learning, RLHF

Recent Activity

upvoted a paper 20 days ago

Stream-R1: Reliability-Perplexity Aware Reward Distillation for Streaming Video Generation

upvoted a paper 7 months ago

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning

updated a dataset 7 months ago

weqweasdas/qwen15b_train_simple_subset5k_for_difficulty_transition

liked a model over 1 year ago

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • 8B • Updated May 10, 2025 • 4.54k • 39

liked a dataset over 1 year ago

RLHFlow/RLHFlow-SFT-Dataset-ver2

Viewer • Updated Nov 2, 2024 • 2.32M • 83 • 5

liked a model over 1 year ago

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

Text Generation • 8B • Updated Nov 9, 2024 • 276 • 10

liked 3 models almost 2 years ago

NCSOFT/Llama-3-OffsetBias-RM-8B

Text Classification • 8B • Updated Sep 6, 2024 • 159 • 25

RLHFlow/LLaMA3-SFT

Text Generation • 8B • Updated Nov 3, 2024 • 54 • 10

RLHFlow/LLaMA3-iterative-DPO-final

Text Generation • 8B • Updated Oct 14, 2024 • 44 • 41

liked 8 models about 2 years ago

RLHFlow/ArmoRM-Llama3-8B-v0.1

Text Classification • 8B • Updated Sep 23, 2024 • 16.2k • 184

RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • 8B • Updated Oct 14, 2024 • 52 • 38

Salesforce/LLaMA-3-8B-SFR-RM-R

Text Classification • 8B • Updated Jan 21, 2025 • 23 • 11

Salesforce/LLaMA-3-8B-SFR-SFT-R

Text Generation • 8B • Updated Jan 21, 2025 • 44 • 8

Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R

Text Generation • 8B • Updated Jan 21, 2025 • 76 • 78

sfairXC/FsfairX-LLaMA3-RM-v0.1

Text Classification • 8B • Updated Oct 14, 2024 • 2.54k • 60

sfairXC/FsfairX-Zephyr-Chat-v0.1

Text Generation • 7B • Updated Apr 24, 2024 • 12 • 8

weqweasdas/RM-Mistral-7B

Text Classification • 7B • Updated Mar 31, 2024 • 2.79k • 25

liked a Space about 2 years ago

Reward Bench Leaderboard

Explore and compare model scores on RewardBench benchmarks

liked a model about 2 years ago

weqweasdas/RM-Gemma-7B

Text Classification • 9B • Updated Mar 22, 2024 • 17.4k • 8

liked a model over 2 years ago

weqweasdas/RM-Gemma-2B

Text Classification • 3B • Updated Mar 22, 2024 • 17.6k • 25

liked a model almost 3 years ago

weqweasdas/hh_rlhf_rm_open_llama_3b

Text Classification • Updated Feb 25, 2024 • 36 • 17

liked a Space about 3 years ago

Robin 7b