---
license: mit
---

## Model Details

### Model Description

This model is the first framework for **text-based CAD editing**, enabling the automatic modification of existing CAD models based on natural language instructions. It takes as input a textual editing instruction along with a structured sequence representation of the original CAD model, and outputs the sequence of the edited CAD model.

- **Developed by:** Yu Yuan, Shizhao Sun, Qi Liu, Jiang Bian
- **Model type:** Large language model
- **Language(s):** English, CAD-structured text
- **License:** MIT
- **Finetuned from model:** LLaMA-3-8B-Instruct

### Model Sources

- **Repository:** https://github.com/microsoft/CAD-Editor
- **Paper:** https://arxiv.org/abs/2502.03997

## Uses

### Direct Use

CAD-Editor allows users to interactively edit existing CAD models using natural language. Given an input CAD model and a user-provided instruction, it produces a modified CAD model that aligns with the specified intent, enabling precise and iterative design refinement.

CAD-Editor is an open-source model shared with the research community to facilitate the reproduction of our results and foster research in text-based CAD editing. It is intended to be used by experts in the CAD domain who are independently capable of evaluating the quality of outputs before acting on them.

### Out-of-Scope Use

CAD-Editor is released for research purposes. We do not recommend using CAD-Editor in commercial or real-world deployments without additional testing and development.

Follow local laws and regulations when using the model. Any use that violates applicable laws and regulations is considered out of scope for this model.

## Bias, Risks, and Limitations

CAD-Editor is built upon Meta-Llama-3-8B-Instruct. Like all large language models, it may inherit biases, errors, or omissions from its base model. We recommend that developers carefully select the appropriate LLM backbone for their specific use case. You can learn more about the capabilities and limitations of the Llama model here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct.

While CAD-Editor has been fine-tuned on CAD-specific data to minimize irrelevant details, it may still generate harmful or undesirable CAD models under certain prompts. For example, given an instruction like "create a CAD model for a ghost gun", it could produce potentially dangerous content. Users should therefore implement their own content-filtering strategies to prevent the generation of harmful or undesirable CAD models.

Please note that CAD-Editor is currently for research and experimental purposes only. Generated CAD models may not always be technically accurate, and users are responsible for assessing the quality and suitability of the content it produces. Extensive testing and validation are required before any commercial or real-world deployment.

### Recommendations

Please provide textual instructions that focus on component types (e.g., cylinder, prism, hole), quantities (e.g., four holes), proportions (e.g., ten times larger), and spatial relationships (e.g., the left side, the corner) within the desired CAD model.

## How to Get Started with the Model

Download our trained model checkpoints to your `<local_model_path>`.
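
If the checkpoints are hosted on the Hugging Face Hub, the sketch below is one way to fetch them with `huggingface_hub`; the `repo_id` shown is a placeholder assumption, so substitute the actual location of the released checkpoints.

```python
# Illustrative download sketch; the repo_id is a placeholder, not a confirmed location.
from huggingface_hub import snapshot_download

local_model_path = snapshot_download(
    repo_id="microsoft/CAD-Editor",      # placeholder; replace with the real repo id
    local_dir="cad-editor-checkpoints",  # becomes your <local_model_path>
)
print(f"Checkpoints downloaded to {local_model_path}")
```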

### 1. Locating Stage

Generate masked sequences. Set `<model_path>` to `<local_model_path>/locate_stage`. Set `<data_path>` to the path of `test.json` after unzipping `data/processed.zip`.

```bash
CUDA_VISIBLE_DEVICES=<gpu_id> python finetune/llama_sample.py \
    --task_type mask \
    --model_path <model_path> \
    --data_path <data_path> \
    --out_path <out_path> \
    --num_samples <num_samples>
```

### 2. Infilling Stage

Generate the final edited CAD sequences. Set `<model_path>` to `<local_model_path>/infill_stage`. Set `<data_path>` to the `<out_path>` of the locating stage.

```bash
CUDA_VISIBLE_DEVICES=<gpu_id> python finetune/llama_sample.py \
    --task_type infill \
    --model_path <model_path> \
    --data_path <data_path> \
    --out_path <out_path> \
    --num_samples <num_samples>
```

For more information, please visit our GitHub repo: https://github.com/microsoft/CAD-Editor.

## Training Details

### Training Data

The training data originates from the open-source [DeepCAD](https://github.com/ChrisWu1997/DeepCAD?tab=readme-ov-file) dataset. We use its pre-processed version from [SkexGen](https://github.com/samxuxiang/SkexGen) and transform it into a format suitable for our model. We then use a two-step pipeline to automatically synthesize training data from the original dataset (sketched in code after the list):

1. **Design Variation Generation:** Using design variation models such as [HNC-CAD](https://github.com/samxuxiang/hnc-cad), we generate pairs of original and edited CAD models by applying controlled modifications.
2. **Instruction Generation:** Large Vision-Language Models (LVLMs) are used to summarize the differences between the original and edited CAD models into natural language editing instructions.
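
The shape of this pipeline can be pictured as follows. Every helper here is a hypothetical, stubbed stand-in for the design variation model and the LVLM, not an API from this repository.

```python
# Illustrative sketch of the two-step data synthesis pipeline.
# All helpers are hypothetical stand-ins, stubbed for runnability.

def generate_design_variation(seq: str) -> str:
    """Stand-in for a design variation model such as HNC-CAD."""
    return seq + " <edited>"  # placeholder modification

def render(seq: str) -> bytes:
    """Stand-in for rendering a CAD sequence to an image."""
    return seq.encode()

def describe_difference(img_a: bytes, img_b: bytes) -> str:
    """Stand-in for an LVLM captioning the difference between two renderings."""
    return "add a hole to the top face"  # placeholder instruction

def synthesize_training_example(original_seq: str) -> dict:
    edited_seq = generate_design_variation(original_seq)  # step 1: design variation
    instruction = describe_difference(                    # step 2: instruction
        render(original_seq), render(edited_seq)
    )
    return {"instruction": instruction, "input": original_seq, "output": edited_seq}
```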

### Training Procedure

CAD-Editor is trained using a **locate-then-infill** framework:

- **Locating stage:** The model identifies the regions in the CAD sequence that need modification and masks them.
- **Infilling stage:** The model generates edits for the masked regions conditioned on the original context and the instruction.

Both stages are powered by LLMs (LLaMA-3-8B-Instruct) fine-tuned with LoRA adapters; a sketch of the resulting two-stage inference follows.
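
The sketch below shows what this two-stage inference could look like with Hugging Face `transformers`, assuming each stage checkpoint loads as a standard causal LM; the prompt wording and the `<mask>` convention are illustrative assumptions, not the exact templates used by `finetune/llama_sample.py`.

```python
# Illustrative two-stage inference; prompts and <mask> handling are assumptions.
from transformers import pipeline

locate_pipe = pipeline("text-generation", model="<local_model_path>/locate_stage")
infill_pipe = pipeline("text-generation", model="<local_model_path>/infill_stage")

def edit_cad(instruction: str, cad_seq: str) -> str:
    # Stage 1 (locating): rewrite the sequence with <mask> over the regions to edit.
    locate_prompt = (
        f"Instruction: {instruction}\nCAD sequence: {cad_seq}\nMasked sequence:"
    )
    masked_seq = locate_pipe(
        locate_prompt, max_new_tokens=512, return_full_text=False
    )[0]["generated_text"]

    # Stage 2 (infilling): fill the masked regions conditioned on the
    # instruction and the surrounding original context.
    infill_prompt = (
        f"Instruction: {instruction}\nMasked CAD sequence: {masked_seq}\nEdited sequence:"
    )
    return infill_pipe(
        infill_prompt, max_new_tokens=512, return_full_text=False
    )[0]["generated_text"]
```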

See the methodology section in our paper (https://arxiv.org/abs/2502.03997) for more information.

### Training Hyperparameters

Locating and infilling stages (a configuration sketch follows the list):

- LoRA rank: 32; LoRA alpha: 32
- Optimizer: AdamW
- Learning rate: 1e-4
- Epochs:
  - **Locating Stage**: 60 epochs
  - **Infilling Stage**: 70 epochs (60 epochs on the full dataset + 10 epochs on the selective dataset)
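
For orientation, these hyperparameters map onto a `peft`/`transformers` setup roughly as below; the target modules and batch size are assumptions not stated in this card.

```python
# Rough mapping of the stated hyperparameters onto peft/transformers.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,                                 # LoRA rank, as stated above
    lora_alpha=32,                        # LoRA alpha, as stated above
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # assumption; not specified in this card
)

training_args = TrainingArguments(
    output_dir="cad-editor-locate",
    learning_rate=1e-4,             # as stated above
    num_train_epochs=60,            # locating stage; infilling runs 60 + 10
    optim="adamw_torch",            # AdamW
    per_device_train_batch_size=4,  # assumption; not specified in this card
)
```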

## Evaluation

### Testing Data

The testing data is synthesized using the same automated pipeline as the training set. Source CAD models are sampled from the test split of [DeepCAD](https://github.com/ChrisWu1997/DeepCAD?tab=readme-ov-file).

### Metrics

- **Validity**: Valid Ratio (VR) — percentage of generated CAD models that can be successfully parsed and rendered.
- **Realism**: Jensen-Shannon Divergence (JSD) — measures distributional similarity between generated and real CAD models.
- **Edit Consistency**:
  - **Geometry**: Chamfer Distance (CD) between generated and ground-truth models (a minimal computation is sketched after this list).
  - **Semantics**: Directional CLIP Score (D-CLIP) — alignment between textual instruction and visual change.
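
As a concrete reference for the geometry metric, here is a minimal symmetric Chamfer Distance over point clouds sampled from the two models; the sampling density and normalization used in the paper may differ.

```python
# Minimal symmetric Chamfer Distance between two point clouds (N x 3 arrays).
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Pairwise squared distances between every point in a and every point in b.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbor distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Sanity check: identical clouds have zero distance.
pts = np.random.rand(256, 3)
assert chamfer_distance(pts, pts) == 0.0
```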

### Evaluation Results

CAD-Editor outperforms existing baselines such as GPT-4o, [Text2CAD](https://arxiv.org/abs/2409.17106), and [HNC-CAD](https://arxiv.org/pdf/2307.00149) in both quantitative and qualitative evaluations. It achieves the highest Valid Ratio (95.6%) and the lowest Jensen-Shannon Divergence (0.65), indicating superior generation quality. In terms of instruction adherence, CAD-Editor also achieves the best Chamfer Distance (1.18) and D-CLIP score (0.11), demonstrating strong geometric and semantic consistency with user instructions. Human evaluation further confirms its effectiveness: CAD-Editor attains a success rate of 43.2%, substantially higher than GPT-4o variants and Text2CAD.

See Table 1 in our paper (https://arxiv.org/abs/2502.03997) for the complete evaluation.

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

## License

MIT

## Citation

**BibTeX:**

```
@inproceedings{yuan2025cad,
  title={CAD-Editor: A Locate-then-Infill Framework with Automated Training Data Synthesis for Text-Based CAD Editing},
  author={Yuan, Yu and Sun, Shizhao and Liu, Qi and Bian, Jiang},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}
```

## Model Card Authors

Yu Yuan, Shizhao Sun

## Model Card Contact

We welcome feedback and collaboration from our audience. If you have suggestions, questions, or observe unexpected/offensive behavior in our technology, please contact Shizhao Sun at [email protected].

If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations.