The models uploaded are no longer based on MaxViT, so this repo is to be archived.
The massive achievement here is the ~300 KB pentachora ViT, which reaches 25% top-1 and 80% top-5 accuracy on CIFAR-100. This is a legitimate showcase and proof of concept: it demonstrates not only that the geometry and structural integrity withstand large amounts of information, but that the feature and CLS structure is not merely semantic. It is deterministic and repeatable.
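For clarity on the metrics: top-1 counts a sample correct only when the highest-scoring class is the true label, while top-5 counts it correct if the true label appears anywhere among the five highest-scoring classes. A generic sketch of the computation (not code from this repo; the arrays are toy values):

```python
import numpy as np

def top_k_accuracy(logits: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    # Sort class scores descending and keep the first k indices per sample.
    topk = np.argsort(-logits, axis=1)[:, :k]
    hits = (topk == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 4 samples over 5 classes (distinct scores, so ranking is unambiguous).
logits = np.array([
    [0.10, 0.60, 0.05, 0.15, 0.08],  # true class 1 -> top-1 hit
    [0.50, 0.20, 0.12, 0.10, 0.08],  # true class 0 -> top-1 hit
    [0.08, 0.12, 0.20, 0.50, 0.10],  # true class 0 -> miss even at top-3
    [0.30, 0.25, 0.20, 0.15, 0.10],  # true class 2 -> top-1 miss, top-3 hit
])
labels = np.array([1, 0, 0, 2])
print(top_k_accuracy(logits, labels, 1))  # 0.5
print(top_k_accuracy(logits, labels, 3))  # 0.75
```

The same function with `k=5` over CIFAR-100 logits yields the top-5 number.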
The internal structure no longer reflects MaxViT even slightly. It has diverged far and no longer houses any of the original conceptualizations that max-vit-goliath entailed.
If you were keeping up on the journey, know that I will not slow down. The next repo will contain the full manifest of the "penta-vit" and the vision of how the patches will function in an entirely new systemic capacity.
Thank you for your time. bows head
Spark V2 - Non-random pentas.
The early prototype below was built from purely random pentas, meaning it wasn't using the vocabulary validated against the saved vocabulary outputs.
The vocabulary should match uniformly across all of the variants.
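One simple way to confirm that all variants share the same vocabulary is to fingerprint the saved vocabulary tensors and compare hashes. This is a hedged sketch, assuming vocabularies load as arrays of shape [classes, vertices, dim]; `vocab_fingerprint` and the shapes are illustrative, not the repo's actual API:

```python
import hashlib
import numpy as np

def vocab_fingerprint(vocab: np.ndarray) -> str:
    """Stable short hash of a vocabulary tensor, for comparing variants."""
    v = np.ascontiguousarray(vocab, dtype=np.float32)
    return hashlib.sha256(v.tobytes()).hexdigest()[:16]

# Toy check: two variants loading the same saved vocabulary must agree byte-for-byte.
rng = np.random.default_rng(0)
shared_vocab = rng.standard_normal((100, 5, 64))  # [classes, vertices, dim]
fingerprints = {name: vocab_fingerprint(shared_vocab) for name in ("spark", "spark_v2")}
print(len(set(fingerprints.values())))  # 1 -> all variants share one vocabulary
```

Any variant whose fingerprint differs was trained against a different (e.g. random) vocabulary.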
Updated again - Spark has variants.
It works boys n grills. We have a micro-sized geometric ViT model that works.
Now let's provide that lightning that makes the Nikola architecture truly unique - baked clean into our geometric structure with our geometric attention relay.
The current model.py contains the weights I'm training, which makes this direct proof of geometric structural integrity: solidifying smaller structures into a much more potent shape.
Nikola's resonant formulas will assist with this one, as it took to the geometric attention built specifically for the coil architecture. Let's see how she behaves in the coming days.
Currently I'm going to run about 50 of these to see how she behaves with CIFAR-100 and various settings.
```
Model Configuration:
  Internal dim: 100
  Vocab dim: 100
  Num classes: 100
  Crystal shape: torch.Size([100, 5, 100])

Evaluating: 100%|██████████| 100/100 [00:02<00:00, 37.96it/s]

================================================================================
EVALUATION RESULTS
================================================================================
Overall Accuracy: 53.50%
Auxiliary Head Accuracy: 52.97%

Top 10 Classes:
Class            Acc%   Conf    GeoAlign   CrystalNorm
------------------------------------------------------
wardrobe         87.0   0.703   0.829      0.308
orange           84.0   0.708   0.839      0.298
road             84.0   0.772   0.626      0.327
sunflower        84.0   0.749   0.756      0.260
plain            80.0   0.692   0.763      0.306
skyscraper       80.0   0.669   0.631      0.255
apple            78.0   0.681   0.821      0.275
cloud            77.0   0.725   0.758      0.267
aquarium_fish    75.0   0.606   0.473      0.266
chair            73.0   0.709   0.696      0.279

Bottom 10 Classes:
Class            Acc%   Conf    GeoAlign   CrystalNorm
------------------------------------------------------
kangaroo         33.0   0.434   0.601      0.316
man              33.0   0.461   0.554      0.321
squirrel         33.0   0.479   0.538      0.274
woman            33.0   0.399   0.576      0.289
boy              31.0   0.465   0.573      0.299
bus              31.0   0.526   0.694      0.298
possum           31.0   0.486   0.619      0.284
lizard           28.0   0.432   0.452      0.274
crocodile        25.0   0.408   0.481      0.310
seal             25.0   0.441   0.475      0.325

Correlations with Accuracy:
  Geometric Alignment: 0.493
  Crystal Norm: -0.210
  Vertex Variance: -0.194
```
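The correlation figures above are plain Pearson correlations between per-class accuracy and each per-class diagnostic. A minimal sketch of how such numbers can be computed (the arrays below are toy placeholders, not the actual evaluation values):

```python
import numpy as np

def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Toy per-class statistics (illustrative only).
acc       = [87.0, 84.0, 80.0, 33.0, 25.0]   # per-class accuracy
geo_align = [0.83, 0.84, 0.76, 0.55, 0.48]   # per-class geometric alignment
print(f"Geometric Alignment vs Accuracy: {pearson(acc, geo_align):.3f}")
```

A positive value means classes whose features align better with their crystal tend to be classified more accurately; the negative Crystal Norm and Vertex Variance correlations would come out of the same function with those diagnostics as `y`.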
Updated - Spark works.
max-vit-goliath-spark is essentially a 300k-parameter ViT that achieves nearly identical accuracy to the larger model, with shockingly robust utility of the features.
```python
'pentachora_spark': PentachoraConfig(
    dim=64, depth=5, heads=4, mlp_ratio=4.0,
    preserve_structure_until_layer=2,
    dropout_rate=0.0, drop_path_rate=0.0
),
```
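As a rough sanity check on the ~300k-parameter figure: with dim=64, depth=5, and mlp_ratio=4.0, the attention and MLP blocks alone account for roughly 246k parameters, and embeddings, norms, and heads push the total toward 300k. A back-of-envelope sketch; the `PentachoraConfig` dataclass here is a stand-in mirroring the snippet's fields, not the repo's actual class:

```python
from dataclasses import dataclass

@dataclass
class PentachoraConfig:  # stand-in; field names mirror the config snippet above
    dim: int = 64
    depth: int = 5
    heads: int = 4
    mlp_ratio: float = 4.0
    preserve_structure_until_layer: int = 2
    dropout_rate: float = 0.0
    drop_path_rate: float = 0.0

def rough_param_count(cfg: PentachoraConfig) -> int:
    """Back-of-envelope transformer block count (ignores biases, norms, embeddings, head)."""
    d = cfg.dim
    attn = 4 * d * d                       # q, k, v, and output projections
    mlp = 2 * d * int(cfg.mlp_ratio * d)   # two linear layers in the MLP
    return cfg.depth * (attn + mlp)

print(rough_param_count(PentachoraConfig()))  # 245760
```

That 245,760 is blocks only; the remaining parameters (patch embedding, vocabulary, classifier heads) plausibly close the gap to ~300k.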
A 64-dim vocabulary is effectively carrying the entire ViT, using a particularly effective geometric attention. The output produces effective image feature representations in geometric format.
```
Final Results:
  Best Validation Accuracy: 54.15%
  Final Train Loss: 2.1262
  Final Val Loss: 3.6396
```
Original post
Currently it's only a pickled early version at roughly 50% accuracy.
This one is a 12-layer, 8-head variation of max-vit-goliath, trained on a geometric vocab with CIFAR-100 using a specialized 5D format. It's WORKING, somewhat, but it's definitely nothing to phone home about yet.
Dropout was used, and I really don't like what it did to the internals. The math doesn't line up correctly and the shapes are all over the board. The next version will be cleaner.
I've included the weights in a file for posterity, as this version may be abandoned, but I want to preserve the A100 80 GB time that Google sliced off for me yesterday. If that was intentional, thank you; if it was random, then the universe wanted this to exist. Either way, we're here now.
Model tree for AbstractPhil/max-vit-goliath
Base model: timm/maxvit_tiny_tf_224.in1k