---
license: mit
base_model:
- timm/maxvit_tiny_tf_224.in1k
pipeline_tag: zero-shot-classification
datasets:
- AbstractPhil/geometric-vocab
---

# The models uploaded are no longer based on max-vit, so this repo is to be archived.

The major achievement here is the 300 KB pentachora ViT that reaches 25% top-1 and 80% top-5 accuracy on CIFAR-100. This is a legitimate showcase and proof of concept: it demonstrates without a doubt not only that the geometry and structural integrity withstand large amounts of information, but that the feature and CLS structure is not just semantic; it is deterministic and repeatable.
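For reference, here's a minimal sketch of how the top-1/top-5 numbers above are typically measured on CIFAR-100; the `model`, `loader`, and `device` names are placeholders, not the actual evaluation script:

```PYTHON
import torch

@torch.no_grad()
def topk_accuracy(model, loader, device, ks=(1, 5)):
    """Fraction of samples whose true label lands in the top-k logits."""
    hits = {k: 0 for k in ks}
    total = 0
    model.eval()
    for images, labels in loader:
        logits = model(images.to(device))            # [batch, 100] on CIFAR-100
        top = logits.topk(max(ks), dim=-1).indices   # best-first class indices
        match = top.eq(labels.to(device).unsqueeze(-1))
        for k in ks:
            hits[k] += match[:, :k].any(dim=-1).sum().item()
        total += labels.size(0)
    return {k: hits[k] / total for k in ks}
```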

The internal structure no longer reflects maxvit even slightly. It has diverged far from the original and no longer houses any of the conceptualizations that max-vit-goliath entailed.

If you have been following the journey, know that I will not slow down. The next repo will contain the full manifest of the "penta-vit" and the vision of how the patches will function in an entirely new systemic capacity.

Thank you for your time. *bows head*

# Spark V2 - Non-random pentas.

The early prototype below was trained on purely random pentas; checking the saved vocabulary outputs shows it wasn't actually using the vocabulary.

The vocabulary should now match uniformly across all of the variants.
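A quick way to sanity-check that uniformity is to diff the saved vocabulary tensors directly; the `*_vocab.pt` filenames below are hypothetical stand-ins for whatever the variants actually save:

```PYTHON
import torch

def vocabs_match(paths, atol=0.0):
    """Load each saved vocabulary tensor and confirm element-wise agreement."""
    reference = torch.load(paths[0], map_location="cpu")
    return all(
        torch.allclose(reference, torch.load(p, map_location="cpu"), atol=atol)
        for p in paths[1:]
    )

# Hypothetical checkpoint names for the Spark variants
print(vocabs_match(["spark_micro_vocab.pt", "spark_small_vocab.pt", "spark_base_vocab.pt"]))
```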


# Updated again - Spark has variants.

It works boys n grills. We have a micro-sized geometric ViT model that works.

Now let's provide the lightning that makes the Nikola architecture truly unique, baked clean into our geometric structure with our geometric attention relay.

The current model.py contains the weights I'm training, which makes this direct proof that geometric structural integrity can solidify smaller structures into a much more potent shape.

Nikola's resonant formulas will assist with this one, as it took well to the geometric attention built specifically for the coil architecture. Let's see how she behaves in the coming days.
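The relay's internals aren't spelled out here, so take the following only as a rough sketch of what pentachoron-anchored attention could look like, assuming each class owns a 5-vertex crystal of shape `[classes, 5, dim]`; none of this is the actual coil implementation:

```PYTHON
import torch
import torch.nn.functional as F

def pentachoron_attention(x, crystals):
    """
    Sketch: score tokens against the 5 vertices of each class crystal,
    then mix tokens toward the centroids of the crystals they align with.
      x:        [batch, tokens, dim]
      crystals: [classes, 5, dim]   (one pentachoron per class)
    """
    x_n = F.normalize(x, dim=-1)                     # unit-norm tokens
    v_n = F.normalize(crystals, dim=-1)              # unit-norm vertices
    align = torch.einsum("btd,cvd->btcv", x_n, v_n)  # token-vertex cosines
    scores = align.max(dim=-1).values                # best vertex per class
    weights = scores.softmax(dim=-1)                 # attention over classes
    centroids = crystals.mean(dim=1)                 # [classes, dim]
    return torch.einsum("btc,cd->btd", weights, centroids)
```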

Currently I'm going to run about 50 of these to see how she behaves with CIFAR-100 and various settings.
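Concretely, the sweep will look something like the loop below; `train_and_eval` is a placeholder for the real training entry point and the swept ranges are illustrative, not the settings I'll actually use:

```PYTHON
import itertools
import random

# Hypothetical hyperparameter grid; train_and_eval is a stand-in
# for the actual training/eval entry point.
dims   = [64, 100, 128]
depths = [4, 5, 6]
lrs    = [1e-3, 3e-4]
seeds  = [0, 1, 2]

grid = list(itertools.product(dims, depths, lrs, seeds))
random.shuffle(grid)

for dim, depth, lr, seed in grid[:50]:   # roughly 50 runs
    acc = train_and_eval(dataset="cifar100", dim=dim, depth=depth, lr=lr, seed=seed)
    print(f"dim={dim} depth={depth} lr={lr:g} seed={seed} -> acc={acc:.2%}")
```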


```TEXT
Model Configuration:
  Internal dim: 100
  Vocab dim: 100
  Num classes: 100
  Crystal shape: torch.Size([100, 5, 100])
Evaluating: 100%|██████████| 100/100 [00:02<00:00, 37.96it/s]

================================================================================
EVALUATION RESULTS
================================================================================

Overall Accuracy: 53.50%
Auxiliary Head Accuracy: 52.97%

Top 10 Classes:
Class                Acc%     Conf     GeoAlign   CrystalNorm 
----------------------------------------------------------------------
wardrobe               87.0   0.703     0.829       0.308
orange                 84.0   0.708     0.839       0.298
road                   84.0   0.772     0.626       0.327
sunflower              84.0   0.749     0.756       0.260
plain                  80.0   0.692     0.763       0.306
skyscraper             80.0   0.669     0.631       0.255
apple                  78.0   0.681     0.821       0.275
cloud                  77.0   0.725     0.758       0.267
aquarium_fish          75.0   0.606     0.473       0.266
chair                  73.0   0.709     0.696       0.279

Bottom 10 Classes:
Class                Acc%     Conf     GeoAlign   CrystalNorm 
----------------------------------------------------------------------
kangaroo               33.0   0.434     0.601       0.316
man                    33.0   0.461     0.554       0.321
squirrel               33.0   0.479     0.538       0.274
woman                  33.0   0.399     0.576       0.289
boy                    31.0   0.465     0.573       0.299
bus                    31.0   0.526     0.694       0.298
possum                 31.0   0.486     0.619       0.284
lizard                 28.0   0.432     0.452       0.274
crocodile              25.0   0.408     0.481       0.310
seal                   25.0   0.441     0.475       0.325

Correlations with Accuracy:
  Geometric Alignment: 0.493
  Crystal Norm: -0.210
  Vertex Variance: -0.194
```
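For context on the GeoAlign and CrystalNorm columns: I read them as per-class geometry diagnostics. A hedged guess at how such numbers could be derived from pooled features and the `[100, 5, 100]` crystals follows; this is not the actual evaluation code:

```PYTHON
import torch
import torch.nn.functional as F

def class_geometry_stats(features, crystals, labels):
    """
    Per-class diagnostics, assuming:
      features: [N, dim]          pooled CLS features
      crystals: [classes, 5, dim] one pentachoron per class
      labels:   [N]               ground-truth class ids
    geo_align    ~ mean cosine between a class's features and its crystal centroid
    crystal_norm ~ mean L2 norm of that class's five vertices
    """
    centroids = crystals.mean(dim=1)          # [classes, dim]
    stats = {}
    for c in range(crystals.size(0)):
        feats = features[labels == c]
        if feats.numel() == 0:
            continue
        align = F.cosine_similarity(feats, centroids[c].expand_as(feats), dim=-1)
        stats[c] = {
            "geo_align": align.mean().item(),
            "crystal_norm": crystals[c].norm(dim=-1).mean().item(),
        }
    return stats
```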
![image/png](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/G3cnwQ93wGEjgKdtrnPWe.png)



# Updated - Spark works.

max-vit-goliath-spark is essentially a 300k-parameter ViT that reaches nearly identical accuracy to the larger model, with shockingly robust utility of the features.

```PYTHON
'pentachora_spark': PentachoraConfig(
    dim=64,                            # vocabulary / embedding dimension
    depth=5,                           # number of transformer blocks
    heads=4,                           # attention heads per block
    mlp_ratio=4.0,                     # MLP hidden-dim multiplier
    preserve_structure_until_layer=2,  # keep the pentachora structure intact in early layers
    dropout_rate=0.0,                  # no dropout (it warped the geometry in the original run)
    drop_path_rate=0.0,                # no stochastic depth
),
```
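As a usage sketch, the config plugs into the model roughly like this; the `PENTACHORA_CONFIGS` registry name and the `PentachoraViT` constructor signature are assumptions here, not the published API:

```PYTHON
import torch

# Hypothetical wiring; the real entry point lives in model.py.
config = PENTACHORA_CONFIGS['pentachora_spark']
model = PentachoraViT(config, num_classes=100)  # CIFAR-100 head
logits = model(torch.randn(1, 3, 32, 32))       # expected shape: [1, 100]
```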

A 64-dim vocabulary is effectively trying to carry the entire ViT, and it's using a particularly effective geometric attention.

The output produces effective image feature representations in geometric format.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/630cf55b15433862cfc9556f/DvJBf3cP6p2zj6P_wc7HH.png)


```TEXT
Final Results:
Best Validation Accuracy: 54.15%
Final Train Loss: 2.1262
Final Val Loss: 3.6396
```

# Original post
Currently it's only a pickled early version at roughly 50% accuracy.

This one is a 12-layer, 8-head variation of max-vit-goliath trained on geometric vocab with CIFAR-100 using a specialized 5D format. It's WORKING, somewhat, but it's definitely nothing to phone home about yet.

Dropout was used and I really don't like what it did to the internals. The math doesn't line up correctly and the shapes are all over the board. The next version will be cleaner.

I've included the weights in a file for posterity as this version may be abandoned, but I want to preserve the A100 80 gig time that google sliced off for me yesterday. If that was intentional thank you, if it was random then the universe wanted thsi to exist. Either way we're here now.