Update model config and README
README.md +7 -6
config.json +1 -1
README.md
CHANGED
@@ -1,5 +1,6 @@
 ---
 tags:
+- image-feature-extraction
 - timm
 - transformers
 pipeline_tag: image-feature-extraction
@@ -10,7 +11,7 @@ license_link: https://ai.meta.com/resources/models-and-libraries/dinov3-license
 datasets:
 - lvd-1689m
 ---
-# Model card for vit_small_plus_patch16_dinov3_qkvb.
+# Model card for vit_small_plus_patch16_dinov3_qkvb.lvd_1689m

 A DINOv3 ViT model image feature encoder. Distilled on LVD-1689M from the DINOv3 ViT-7B model.

@@ -19,7 +20,7 @@ A DINOv3 ViT model image feature encoder. Distilled on LVD-1689M from the DINOv3
 * The original models keep RoPE periods as a persistent `bfloat16` buffer. `timm` generates `float32` periods at init. This results in some numerical differences; however, the `timm` approach should be less problematic on devices without bfloat16 support, and appears to work as well, if not slightly better, for fine-tuning. `model.rope.periods = model.rope.periods.to(torch.bfloat16).to(torch.float32)` will truncate the periods to bfloat16 and result in matching outputs.

 ## Model Details
-- **Model Type:** Image
+- **Model Type:** Image Feature Encoder
 - **Model Stats:**
   - Params (M): 28.7
   - GMACs: 8.1
@@ -44,7 +45,7 @@ img = Image.open(urlopen(
     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
 ))

-model = timm.create_model('vit_small_plus_patch16_dinov3_qkvb.
+model = timm.create_model('vit_small_plus_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)
 model = model.eval()

 # get model specific transforms (normalization, resize)
@@ -67,7 +68,7 @@ img = Image.open(urlopen(
 ))

 model = timm.create_model(
-    'vit_small_plus_patch16_dinov3_qkvb.
+    'vit_small_plus_patch16_dinov3_qkvb.lvd_1689m',
     pretrained=True,
     features_only=True,
 )
@@ -100,7 +101,7 @@ img = Image.open(urlopen(
 ))

 model = timm.create_model(
-    'vit_small_plus_patch16_dinov3_qkvb.
+    'vit_small_plus_patch16_dinov3_qkvb.lvd_1689m',
     pretrained=True,
     num_classes=0,  # remove classifier nn.Linear
 )
@@ -190,4 +191,4 @@ See the associated paper for details on the evaluation protocols
   doi = {10.5281/zenodo.4414861},
   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
 }
-```
+```
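The RoPE note kept as context above includes a one-line `bfloat16` truncation for matching the original release's outputs. Below is a minimal sketch of applying it after loading this checkpoint; the `model.rope.periods` assignment is taken verbatim from the card, while the data-config lookup and forward pass are illustrative additions.

```python
import torch
import timm

# Load the checkpoint under its new tag and switch to eval mode.
model = timm.create_model('vit_small_plus_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)
model = model.eval()

# From the card: truncate timm's float32 RoPE periods to bfloat16 precision
# so outputs match the original models' persistent bfloat16 buffer.
model.rope.periods = model.rope.periods.to(torch.bfloat16).to(torch.float32)

# Illustrative check: run a forward pass at the model's native input size.
cfg = timm.data.resolve_model_data_config(model)
x = torch.randn(1, *cfg['input_size'])
with torch.no_grad():
    feats = model.forward_features(x)
print(feats.shape)
```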
config.json
CHANGED
@@ -4,7 +4,7 @@
   "num_features": 384,
   "global_pool": "avg",
   "pretrained_cfg": {
-    "tag": "
+    "tag": "lvd_1689m",
     "custom_load": false,
     "input_size": [
       3,
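A quick way to confirm the retagged config resolves as intended is to inspect the created model; a small sketch, assuming `timm`'s usual `pretrained_cfg` dict and `num_features` attributes on built models:

```python
import timm

# Creating the model makes timm resolve pretrained_cfg, including the new tag.
model = timm.create_model('vit_small_plus_patch16_dinov3_qkvb.lvd_1689m', pretrained=True)

print(model.pretrained_cfg['tag'])  # expected: 'lvd_1689m'
print(model.num_features)           # expected: 384, matching config.json
```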
|