2xQwen2.5-Coder-3B-Cyclops-Main
This model is a fine-tuned version of Qwen/Qwen2.5-Coder-3B, trained with Multi-LLM Group Relative Policy Optimization (MLGRPO) on the OpenAI HumanEval dataset. It is the main-function half of a two-model ("2x") system; its counterpart is 2xQwen2.5-Coder-3B-Cyclops-Aux.
"Cyclops" relies heavily on its aux()
function for core implementation, while the main function adds edge case handling and refinements โ just like a cyclops wielding power through its single eye.
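To make the pattern concrete, here is a hypothetical solution in the cyclops style for HumanEval/0 (has_close_elements); the function bodies below are illustrative assumptions, not actual model output:
```python
def aux(numbers):
    # Core implementation: absolute differences between all distinct pairs.
    return [abs(numbers[i] - numbers[j])
            for i in range(len(numbers))
            for j in range(i + 1, len(numbers))]

def has_close_elements(numbers, threshold):
    # Main function: handle the trivial edge case, then lean on aux().
    if len(numbers) < 2:
        return False
    return any(diff < threshold for diff in aux(numbers))
```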
Model Details
- Base Model: Qwen/Qwen2.5-Coder-3B
- Training Method: MLGRPO (Multi-LLM Group Relative Policy Optimization)
- Dataset: HumanEval
- Task: Code generation with auxiliary function collaboration
Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("LovelyBuggies/2xQwen2.5-Coder-3B-Cyclops-Main")
model = AutoModelForCausalLM.from_pretrained("LovelyBuggies/2xQwen2.5-Coder-3B-Cyclops-Main")

# main_prompt: a HumanEval-style function signature and docstring to complete;
# example["entry_point"]: the name of the function to extract from the output.
inputs = tokenizer(main_prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
main_completion = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

# Post-process: strip fences and isolate the target function (helpers sketched below).
cleaned_main_completion = extract_specific_function(cleanup_code(main_completion), example["entry_point"])
print(cleaned_main_completion)
```
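cleanup_code and extract_specific_function come from the project's evaluation utilities and are not bundled with the model; a minimal sketch of what they might look like (an assumption, not the actual implementations):
```python
import re

def cleanup_code(completion: str) -> str:
    # Hypothetical helper: strip markdown code fences from a completion.
    completion = re.sub(r"^```(?:python)?\s*", "", completion.strip())
    return re.sub(r"\s*```\s*$", "", completion)

def extract_specific_function(code: str, entry_point: str) -> str:
    # Hypothetical helper: keep only the target function and its body.
    kept, inside = [], False
    for line in code.splitlines():
        if line.startswith(f"def {entry_point}"):
            inside = True
        elif inside and line and not line[0].isspace():
            break  # next top-level statement ends the function
        if inside:
            kept.append(line)
    return "\n".join(kept)
```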
Training Details
This model was trained as part of a multi-LLM system on the full HumanEval dataset (a sketch of the generation flow follows this list):
- Agent 0 generates auxiliary functions to help solve coding problems
- Agent 1 generates main functions that utilize the auxiliary functions
- Both agents are trained collaboratively using MLGRPO
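A minimal sketch of how the two agents could be chained at inference time; the prompt-concatenation scheme and variable names here are assumptions, not the exact training setup:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

aux_id = "LovelyBuggies/2xQwen2.5-Coder-3B-Cyclops-Aux"
main_id = "LovelyBuggies/2xQwen2.5-Coder-3B-Cyclops-Main"
aux_tok, aux_model = AutoTokenizer.from_pretrained(aux_id), AutoModelForCausalLM.from_pretrained(aux_id)
main_tok, main_model = AutoTokenizer.from_pretrained(main_id), AutoModelForCausalLM.from_pretrained(main_id)

def generate(model, tok, prompt, max_new_tokens=256):
    # Decode a completion for either agent.
    inputs = tok(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tok.decode(outputs[0], skip_special_tokens=True).strip()

problem_prompt = 'def has_close_elements(numbers, threshold):\n    """Check if any two numbers are closer than threshold."""\n'

# Agent 0 writes helper functions; Agent 1 sees them and writes the main function.
aux_completion = generate(aux_model, aux_tok, problem_prompt)
main_completion = generate(main_model, main_tok, problem_prompt + "\n" + aux_completion)
candidate_solution = aux_completion + "\n\n" + main_completion
```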
Agent Role
- This is the Main Function Generator agent, which produces the primary solution functions.
- It first identifies edge cases and then calls the auxiliary functions generated by 2xQwen2.5-Coder-3B-Cyclops-Aux, allowing it to write code more effectively.
Citation
If you use this model, please cite:
Coming soon.