CoffeeBliss
1 follower · 1 following
AI & ML interests
None yet
Recent Activity
replied to lewtun's post 17 days ago
This paper (https://huggingface.co/papers/2412.18925) has a really interesting recipe for inducing o1-like behaviour in Llama models:

* Iteratively sample CoTs from the model, using a mix of different search strategies. This gives you something like Stream of Search via prompting.
* Verify correctness of each CoT using GPT-4o (needed because exact match doesn't work well in medicine where there are lots of aliases)
* Use GPT-4o to reformat the concatenated CoTs into a single stream that includes smooth transitions like "hmm, wait" etc that one sees in o1
* Use the resulting data for SFT & RL
* Use sparse rewards from GPT-4o to guide RL training.

They find RL gives an average ~3 point boost across medical benchmarks and SFT on this data already gives a strong improvement. Applying this strategy to other domains could be quite promising, provided the training data can be formulated with verifiable problems!
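To make the data-building loop in this recipe more concrete, here is a minimal sketch of how the sample, verify, and reformat steps might fit together. It is not the paper's implementation. The strategy labels, the `Trajectory` class, the binary reward, and the toy `toy_generate` / `toy_verify` functions are all assumptions made for illustration. In the recipe described above both the verifier and the reformatter are GPT-4o calls, which are replaced here by an alias lookup and a plain string join.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical strategy labels; the post only mentions "a mix of different search strategies".
STRATEGIES = ["backtrack", "explore_new_path", "verify_steps", "correct_error"]

Generate = Callable[[str, List[str], Optional[str]], str]
Verify = Callable[[str, str], bool]


@dataclass
class Trajectory:
    question: str
    attempts: List[str] = field(default_factory=list)
    solved: bool = False


def build_trajectory(question: str, reference: str, generate: Generate,
                     verify: Verify, max_attempts: int = 4, seed: int = 0) -> Trajectory:
    """Sample CoTs until the verifier accepts one or the budget runs out."""
    rng = random.Random(seed)
    traj = Trajectory(question)
    strategy: Optional[str] = None  # the first attempt is unguided
    for _ in range(max_attempts):
        cot = generate(question, traj.attempts, strategy)
        traj.attempts.append(cot)
        if verify(cot, reference):
            traj.solved = True
            break
        strategy = rng.choice(STRATEGIES)  # pick how to revise the failed attempt
    return traj


def to_single_stream(traj: Trajectory) -> str:
    """Stand-in for the LLM reformatting step: join attempts with a transition phrase."""
    return " Hmm, wait. ".join(traj.attempts)


def sparse_reward(cot: str, reference: str, verify: Verify) -> float:
    """Assumed binary reward: 1.0 if the verifier accepts the answer, else 0.0."""
    return 1.0 if verify(cot, reference) else 0.0


# Toy stand-ins so the sketch runs without any model or API access.
ALIASES = {"myocardial infarction": {"myocardial infarction", "heart attack"}}


def toy_verify(cot: str, reference: str) -> bool:
    # Accepts known aliases of the reference, which plain exact match would reject.
    return any(alias in cot.lower() for alias in ALIASES.get(reference, {reference}))


def toy_generate(question: str, previous: List[str], strategy: Optional[str]) -> str:
    # Wrong on the first try, right (via an alias) once a strategy is supplied.
    if strategy is None:
        return "The symptoms suggest angina."
    return f"[{strategy}] Troponin is elevated, so this is a heart attack."


traj = build_trajectory("Chest pain with elevated troponin?", "myocardial infarction",
                        toy_generate, toy_verify)
print(traj.solved, len(traj.attempts))
print(to_single_stream(traj))
```

The alias table stands in for the reason the post gives for using an LLM verifier: "heart attack" is a correct answer that exact match against "myocardial infarction" would reject. Only trajectories with `solved` set would be kept as SFT data in this sketch, and `sparse_reward` shows one plausible shape for an outcome-only RL signal.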
reacted to lewtun's post with 🔥 17 days ago
liked a model 18 days ago
bartowski/HuatuoGPT-o1-8B-GGUF
Organizations
None yet
CoffeeBliss's activity
liked a model 18 days ago
bartowski/HuatuoGPT-o1-8B-GGUF • Text Generation • Updated 18 days ago • 2.74k • 5
liked a model 19 days ago
FreedomIntelligence/HuatuoGPT-o1-8B • Text Generation • Updated 20 days ago • 1.57k • 27
liked a dataset 23 days ago
yulan-team/YuLan-Mini-Datasets • Updated 20 days ago • 472 • 8
liked a model 23 days ago
yulan-team/YuLan-Mini • Text Generation • Updated 15 days ago • 859 • 33
liked 2 models 4 months ago
meta-llama/Llama-3.2-11B-Vision-Instruct • Image-Text-to-Text • Updated Dec 4, 2024 • 2.39M • 1.24k
meta-llama/Llama-3.2-1B-Instruct • Text Generation • Updated Oct 24, 2024 • 1.03M • 706
liked 2 models 5 months ago
openbmb/MiniCPM-V-2_6 • Image-Text-to-Text • Updated 3 days ago • 65.3k • 911
openbmb/MiniCPM-V-2_6-gguf • Updated Aug 13, 2024 • 6.44k • 150