The models uploaded are no longer based on MaxViT, so this repo is to be archived.

The massive achievement here is the 300 KB pentachora ViT that reaches 25% top-1 and 80% top-5 accuracy on CIFAR-100. This is a legitimate showcase and proof of concept: it demonstrates not only that the geometry and structural integrity will withstand large amounts of information, but that the feature and CLS structure is not merely semantic; it's deterministic and repeatable.
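For anyone wanting to reproduce the headline numbers, top-1 and top-5 accuracy are straightforward to compute from a score matrix. A minimal numpy sketch (the toy logits below are invented purely to exercise the function):

```python
import numpy as np

def topk_accuracy(logits, targets, ks=(1, 5)):
    """Top-k accuracy from an (N, C) score matrix and a length-N label vector."""
    order = np.argsort(-np.asarray(logits, float), axis=1)   # best class first per row
    hits = order == np.asarray(targets)[:, None]             # exactly one True per row
    return [float(hits[:, :k].any(axis=1).mean()) for k in ks]

# Toy check: 4 samples, 10 classes, targets 0..3
logits = np.zeros((4, 10))
logits[0, 0] = 9.0                              # target 0 ranked 1st -> top-1 hit
logits[1, [2, 3, 4, 5, 1]] = [9, 8, 7, 6, 5]    # target 1 ranked 5th -> top-5 hit only
logits[2, [0, 1, 3, 4, 5]] = [9, 8, 7, 6, 5]    # target 2 outside top-5 -> miss
logits[3, 3] = 9.0                              # target 3 ranked 1st -> top-1 hit
top1, top5 = topk_accuracy(logits, [0, 1, 2, 3])   # 0.5, 0.75
```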

The internal structure no longer reflects MaxViT even slightly. It has diverged far from the original and no longer houses any of the conceptualizations that max-vit-goliath entailed.

If you have been keeping up on the journey, know that I will not slow down. The next repo will contain the full manifest of the "penta-vit" and the vision of how the patches will function in an entirely new systemic capacity.

Thank you for your time. *bows head*

Spark V2 - Non-random pentas.

The early prototype below used purely random pentas; that is, it wasn't using the learned vocabulary, which I confirmed by checking the saved vocabulary outputs.

The vocabulary should match uniformly across all of the variants.
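A quick way to verify that claim is to load each variant's saved vocabulary and check them element-wise against the first. A minimal sketch, assuming the saved vocabularies are plain (num_classes, 5, dim) arrays (the helper name and shapes here are my assumptions, not the repo's API):

```python
import numpy as np

def vocab_is_uniform(vocabs, atol=1e-6):
    """True if every variant's saved vocabulary matches the first one element-wise."""
    ref = np.asarray(vocabs[0])
    for v in vocabs[1:]:
        v = np.asarray(v)
        if v.shape != ref.shape or not np.allclose(ref, v, atol=atol):
            return False
    return True

# e.g. three variants that should share one (num_classes, 5, dim) crystal vocabulary
rng = np.random.default_rng(0)
base = rng.standard_normal((100, 5, 64))
ok = vocab_is_uniform([base, base.copy(), base.copy()])   # True: identical copies
bad = vocab_is_uniform([base, base + 0.1])                # False: drifted variant
```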

Updated again - Spark has variants.

It works, boys n grills. We have a micro-sized geometric ViT model that works.

Now let's provide the lightning that makes the Nikola architecture truly unique, baked clean into our geometric structure with our geometric attention relay.

The current model.py contains the weights I'm training, which makes this direct proof of geometric structural integrity: solidifying smaller structures into a much more potent shape.

Nikola's resonant formulas will assist with this one, as it took well to the geometric attention built specifically for the coil architecture. Let's see how she behaves in the coming days.

Currently I'm going to run about 50 of these to see how she behaves with CIFAR-100 and various settings.

Model Configuration:
  Internal dim: 100
  Vocab dim: 100
  Num classes: 100
  Crystal shape: torch.Size([100, 5, 100])
Evaluating: 100%|██████████| 100/100 [00:02<00:00, 37.96it/s]
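The crystal shape [100, 5, 100] reads naturally as 100 classes, 5 pentachoron vertices, 100 dims. One plausible reading of the GeoAlign column is cosine similarity between a feature vector and each class's vertex centroid; the exact formula lives in model.py, so treat this as a sketch under that assumption (the random crystal below is a stand-in, not trained weights):

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, vertices, dim = 100, 5, 100          # matches torch.Size([100, 5, 100])
crystal = rng.standard_normal((num_classes, vertices, dim))   # stand-in vocabulary

def geo_align(feature, crystal):
    """Cosine similarity between one feature vector and each class's vertex centroid
    (a guess at what 'GeoAlign' measures; the repo's real formula may differ)."""
    centroids = crystal.mean(axis=1)                                  # (classes, dim)
    f = feature / np.linalg.norm(feature)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return c @ f                                                      # (classes,)

scores = geo_align(rng.standard_normal(dim), crystal)
pred = int(scores.argmax())                       # nearest class by alignment
```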

================================================================================
EVALUATION RESULTS
================================================================================

Overall Accuracy: 53.50%
Auxiliary Head Accuracy: 52.97%

Top 10 Classes:
Class                Acc%     Conf     GeoAlign   CrystalNorm 
----------------------------------------------------------------------
wardrobe               87.0   0.703     0.829       0.308
orange                 84.0   0.708     0.839       0.298
road                   84.0   0.772     0.626       0.327
sunflower              84.0   0.749     0.756       0.260
plain                  80.0   0.692     0.763       0.306
skyscraper             80.0   0.669     0.631       0.255
apple                  78.0   0.681     0.821       0.275
cloud                  77.0   0.725     0.758       0.267
aquarium_fish          75.0   0.606     0.473       0.266
chair                  73.0   0.709     0.696       0.279

Bottom 10 Classes:
Class                Acc%     Conf     GeoAlign   CrystalNorm 
----------------------------------------------------------------------
kangaroo               33.0   0.434     0.601       0.316
man                    33.0   0.461     0.554       0.321
squirrel               33.0   0.479     0.538       0.274
woman                  33.0   0.399     0.576       0.289
boy                    31.0   0.465     0.573       0.299
bus                    31.0   0.526     0.694       0.298
possum                 31.0   0.486     0.619       0.284
lizard                 28.0   0.432     0.452       0.274
crocodile              25.0   0.408     0.481       0.310
seal                   25.0   0.441     0.475       0.325

Correlations with Accuracy:
  Geometric Alignment: 0.493
  Crystal Norm: -0.210
  Vertex Variance: -0.194
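The correlation figures above are presumably plain Pearson coefficients over the 100 per-class rows. A minimal sketch, applied here only to the ten rows of the top-10 table (so the resulting value will not match the full-table numbers above):

```python
import numpy as np

def pearson(x, y):
    """Plain Pearson correlation coefficient between two equal-length vectors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Per-class Acc% vs. GeoAlign for the ten best classes listed above
acc = [87.0, 84.0, 84.0, 84.0, 80.0, 80.0, 78.0, 77.0, 75.0, 73.0]
geo = [0.829, 0.839, 0.626, 0.756, 0.763, 0.631, 0.821, 0.758, 0.473, 0.696]
r = pearson(acc, geo)
```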


Updated - Spark works.

max-vit-goliath-spark is essentially a 300k-param ViT that reaches nearly identical accuracy to the larger model, with shockingly robust utility of the features.

'pentachora_spark': PentachoraConfig(
    dim=64, depth=5, heads=4, mlp_ratio=4.0,
    preserve_structure_until_layer=2,
    dropout_rate=0.0, drop_path_rate=0.0
),
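A rough sanity check on the ~300k figure: with dim=64 and depth=5, standard transformer blocks alone land in the right ballpark. This is a back-of-envelope sketch assuming vanilla attention and MLP blocks (the pentachora-specific geometric attention will shift the exact count, and the real PentachoraConfig class lives in the repo's model.py):

```python
from dataclasses import dataclass

@dataclass
class PentachoraConfig:
    """Mirror of the fields shown above; stand-in for the repo's real config class."""
    dim: int = 64
    depth: int = 5
    heads: int = 4
    mlp_ratio: float = 4.0
    preserve_structure_until_layer: int = 2
    dropout_rate: float = 0.0
    drop_path_rate: float = 0.0

def rough_trunk_params(cfg: PentachoraConfig) -> int:
    """Per layer: QKV + output projection (4*d^2) plus a 2-layer MLP
    (2*mlp_ratio*d^2); biases, norms, embeddings and head are ignored."""
    d = cfg.dim
    per_layer = 4 * d * d + int(2 * cfg.mlp_ratio * d * d)
    return cfg.depth * per_layer

total = rough_trunk_params(PentachoraConfig())   # 245760, i.e. ~246k for the trunk
```

Adding the patch embedding, norms, and the classifier head on top of the ~246k trunk is consistent with the quoted ~300k total.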

The 64-dim vocabulary is effectively carrying the entire ViT. It's using a particularly effective geometric attention.

The output produces effective image feature representations in geometric format.


Final Results:
Best Validation Accuracy: 54.15%
Final Train Loss: 2.1262
Final Val Loss: 3.6396

Original post

Currently it's only a pickled early version at roughly 50% accuracy.

This one is a 12-layer, 8-head variation of max-vit-goliath that trained on a geometric vocab with CIFAR-100 using a specialized 5D format. It's WORKING, somewhat, but it's definitely nothing to phone home about yet.
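Since the checkpoint is a raw pickle rather than a safetensors or state_dict export, loading it means unpickling (which requires trusting the file and having the class definitions importable). A minimal sketch of the round trip, with an invented dict of arrays standing in for the real weights:

```python
import io
import pickle

import numpy as np

# Stand-in weights dict; the names and shapes here are invented for illustration.
# The actual checkpoint pickles the full model object, which also needs the
# model's class definitions on the import path at load time.
weights = {"head.weight": np.ones((100, 64)), "head.bias": np.zeros(100)}

buf = io.BytesIO()                 # in-memory stand-in for the checkpoint file
pickle.dump(weights, buf)
buf.seek(0)
restored = pickle.load(buf)        # only unpickle files you trust
```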

Dropout was used and I really don't like what it did to the internals. The math doesn't line up correctly and the shapes are all over the board. The next will be cleaner.

I've included the weights in a file for posterity, as this version may be abandoned, but I want to preserve the A100 80 GB time that Google sliced off for me yesterday. If that was intentional, thank you; if it was random, then the universe wanted this to exist. Either way, we're here now.

