Update README.md
Browse files
README.md
CHANGED
|
@@ -41,7 +41,7 @@ Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBra
|
|
| 41 |
### Training Procedure
|
| 42 |
|
| 43 |
* **Objective:** InfoNCE
|
| 44 |
-
* **Batch size:**
|
| 45 |
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
|
| 46 |
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
|
| 47 |
|
|
@@ -52,17 +52,6 @@ Go to [https://github.com/JetBrains-Research/big-rocq](https://github.com/JetBra
|
|
| 52 |
* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
|
| 53 |
* **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
|
| 54 |
|
| 55 |
-
### Results
|
| 56 |
-
|
| 57 |
-
| Model (back‑end) | Bucket | Baseline Jaccard | **RocqStar** |
|
| 58 |
-
| ---------------- | ------ | ---------------- | ------------ |
|
| 59 |
-
| GPT‑4o | ≤ 4 | 48 ± 5 % | **51 ± 5 %** |
|
| 60 |
-
| GPT‑4o | 5–8 | 18 ± 4 % | **25 ± 3 %** |
|
| 61 |
-
| GPT‑4o | 9–20 | 11 ± 4 % | 11 ± 5 % |
|
| 62 |
-
| Claude 3.5 | ≤ 4 | 58 ± 5 % | **61 ± 4 %** |
|
| 63 |
-
| Claude 3.5 | 5–8 | 28 ± 5 % | **36 ± 5 %** |
|
| 64 |
-
| Claude 3.5 | 9–20 | 16 ± 5 % | **21 ± 5 %** |
|
| 65 |
-
|
| 66 |
#### Summary
|
| 67 |
|
| 68 |
RocqStar delivers consistent gains, up to 28% relative improvement over Jaccard-index based retrieval, especially for medium‑length theorems where proof similarity diverges most from statement similarity.
|
|
|
|
| 41 |
### Training Procedure
|
| 42 |
|
| 43 |
* **Objective:** InfoNCE
|
| 44 |
+
* **Batch size:** 16
|
| 45 |
* **Optimizer / LR:** AdamW, lr = 4e‑6, linear warm‑up 10 %, 22k steps
|
| 46 |
* **Hardware:** 1× NVIDIA H100 GPU, 160 GB RAM, 14 h wall‑clock
|
| 47 |
|
|
|
|
| 52 |
* **Dataset:** IMM‑300 (300 Rocq theorems) from the IMM project
|
| 53 |
* **Metrics:** Downstream proof success rate of CoqPilot when given top‑7 retrieved premises; averaged over 12 generations.
|
| 54 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 55 |
#### Summary
|
| 56 |
|
| 57 |
RocqStar delivers consistent gains, up to 28% relative improvement over Jaccard-index based retrieval, especially for medium‑length theorems where proof similarity diverges most from statement similarity.
|