Update README.md
README.md
CHANGED
@@ -1,6 +1,7 @@
 This is my reproduction of the Microsoft team's work, WarriorCoder: Learning from Expert Battles to Augment Code Large Language Models. It is fully based on open-source models to construct training data and adopt supervised fine-tuning (SFT) to train the model. The results on code generation benchmarks like Humaneval (Humaneval+) and MBPP (MBPP+) are as follows: 79.9 (75.4), 75.8 (64.5). These results are excellent, confirming that the idea of 'learning from expert battles' proposed in the paper has great potential. I have also published the training data constructed during my reproduction of the paper in another repository, and everyone is welcome to use it.
 Original paper link: https://arxiv.org/pdf/2412.17395
-I have also published the training data constructed during my reproduction of the paper in another repository: https://huggingface.co/datasets/HuggingMicah/warrior_reproduce
+I have also published the training data constructed during my reproduction of the paper in another repository: https://huggingface.co/datasets/HuggingMicah/warrior_reproduce .
+
 Also, I reproduced the experimental results in the paper. There are some differences from the original results, and I have marked the distinctions with underlines.
 | Models | Matplotlib (155) | NumPy (220) | Pandas (291) | PyTorch (68) | SciPy (106) | Sklearn (115) | TensorFlow (45) | Overall (1000) |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
@@ -14,4 +15,12 @@ Also, I reproduced the experimental results in the paper. There are some differe
 | Magicoder-CL (6.7B) | 54.6 | 34.8 | 19.0 | 24.7 | 25.0 | 22.6 | 28.9 | 29.9 |
 | MagicoderS-CL (6.7B) | 55.9 | 40.6 | 28.4 | 40.4 | 28.8 | 35.8 | 37.6 | 37.5 |
 | WarriorCoder_published_in_paper (6.7B) | 55.5 | 41.8 | 26.1 | 41.2 | 33.0 | 39.1 | 42.2 | 38.1 |
-| WarriorCoder_my_reproduce (6.7B) | 56.1 | 45.0 | 32.0 | 38.2 | 36.8 | 44.3 | 48.9 | 41.7 |
+| WarriorCoder_my_reproduce (6.7B) | 56.1 | 45.0 | 32.0 | 38.2 | 36.8 | 44.3 | 48.9 | 41.7 |
+
+| Models | HumanEval | HumanEval+ | MBPP | MBPP+ |
+| --- | --- | --- | --- | --- |
+| WizardCoder-CL (6.7B) | 48.7 | 40.5 | 56.4 | 47.0 |
+| WizardCoder-SC (15B) | 51.4 | 45.3 | 61.6 | 50.7 |
+| Magicoder-CL (6.7B) | 60.4 | 55.7 | 64.2 | 52.5 |
+| MagicoderS-CL (6.7B) | 70.7 | 66.4 | 68.3 | 56.4 |
+| WarriorCoder (6.7B) | 80.5 | 75.6 | 76.2 | 64.8 |
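
A note on reading the first table in this change: the Overall (1000) column is consistent with a mean of the per-library scores weighted by the problem counts shown in the header (155 + 220 + ... + 45 = 1000). The README does not state this, so treat the weighting as an assumption inferred from the numbers. The sketch below checks it for two rows; the names `COUNTS`, `weighted_overall`, `REPRODUCED`, and `PAPER` are hypothetical and exist only for this illustration.

```python
# Per-library problem counts, copied from the table header in the README.
COUNTS = {
    "Matplotlib": 155, "NumPy": 220, "Pandas": 291, "PyTorch": 68,
    "SciPy": 106, "Sklearn": 115, "TensorFlow": 45,
}


def weighted_overall(scores):
    """Mean of per-library scores, weighted by problem count.

    Assumption: this is how the Overall column is derived; the README
    does not say so explicitly.
    """
    total = sum(COUNTS.values())
    return sum(scores[lib] * n for lib, n in COUNTS.items()) / total


# Row "WarriorCoder_my_reproduce (6.7B)" from the table.
REPRODUCED = {
    "Matplotlib": 56.1, "NumPy": 45.0, "Pandas": 32.0, "PyTorch": 38.2,
    "SciPy": 36.8, "Sklearn": 44.3, "TensorFlow": 48.9,
}

# Row "WarriorCoder_published_in_paper (6.7B)" from the table.
PAPER = {
    "Matplotlib": 55.5, "NumPy": 41.8, "Pandas": 26.1, "PyTorch": 41.2,
    "SciPy": 33.0, "Sklearn": 39.1, "TensorFlow": 42.2,
}

if __name__ == "__main__":
    print(round(weighted_overall(REPRODUCED), 1))  # table reports 41.7
    print(round(weighted_overall(PAPER), 1))       # table reports 38.1
```

Both rows round to the Overall values listed in the table, which is what one would expect if the benchmark score is a pass rate over all 1000 problems rather than an unweighted average of the seven library scores.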