CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published 12 days ago • 39
Configurable Preference Tuning Collection CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17 • 1
Configurable Preference Tuning with Rubric-Guided Synthetic Data Paper • 2506.11702 • Published Jun 13 • 2
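A minimal sketch of the inference-time control idea the CPT collection describes, assuming a CPT-tuned chat model is available; the checkpoint name and the rubric-style system prompt below are placeholders for illustration, not the collection's actual artifacts.

```python
# Minimal sketch (not the CPT reference implementation): steering generation
# at inference time with a rubric-style system prompt. The checkpoint name
# below is a placeholder, and the exact prompt wording is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/cpt-tuned-model")  # placeholder checkpoint

messages = [
    # The system prompt carries the behavioral specification the model was tuned to follow.
    {"role": "system", "content": "Respond in a terse, formal register; avoid first-person voice."},
    {"role": "user", "content": "Explain what preference tuning does."},
]

output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])  # assistant reply in the requested style
```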
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Paper • 2506.06607 • Published Jun 7 • 2
Atropos Artifacts Collection A collection of experimental artifacts created with Atropos, Nous' RL Environments framework - https://github.com/NousResearch/Atropos • 9 items • Updated May 26 • 10
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 97
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17 • 34
ReZero: Enhancing LLM search ability by trying one-more-time Paper • 2504.11001 • Published Apr 15 • 15
Custom Vibe Coding Quest Part 1: The Quest Begins Article • By burtenshaw • Mar 26 • 9
Custom Vibe Coding Quest Part 2: Fine-Tuning Gemma 3 for Code Reasoning Article • By burtenshaw • Apr 1 • 25
DeepHermes Collection Preview models of the hybrid-reasoner Hermes series • 6 items • Updated Mar 13 • 39
DPO Collection Various useful datasets for preference optimization • 18 items • Updated Jun 9 • 5
MetaSC: Test-Time Safety Specification Optimization for Language Models Paper • 2502.07985 • Published Feb 11 • 3
Toxic Commons Collection Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors. • 3 items • Updated Oct 31, 2024 • 6