⚠️ This model is subtly busted in 20 different ways compared to the original. It is mostly designed for further training (which will implicitly heal it from these subtle busts). You have been warned.

This is a conversion of GPT-J-6B by EleutherAI into a more modern architecture that it still closely maps to (in this case, the Phi 1/1.5/2 architecture). This primarily allows for RoPE scaling, as well as for creating GGUFs (the GGUF tooling does not currently support GPT-J's original arch). See convert.py for the script used to convert the weights.
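
As a rough sketch, loading the converted checkpoint with RoPE scaling through Transformers should look something like the following (the scaling type and factor are illustrative assumptions, not tested settings):

```python
# Minimal sketch, assuming the converted checkpoint loads as a standard Phi
# model through Transformers. The rope_scaling values are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allura-forge/phi-j-6b-edited"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # Linear RoPE scaling to stretch GPT-J's original 2048-token context;
    # factor=2.0 is an example value, not a tested recommendation.
    rope_scaling={"type": "linear", "factor": 2.0},
)
```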

Note that I was originally going to use the GPT-NeoX architecture because it felt more befitting, but there appears to be a bug in the most recent versions of Transformers, so Phi it is!

Also, partial_rotary_factor is set to 0.5 here despite the fact that this makes no logical sense: it should be 0.25 (rotary_dim / head_dim = 64 / 256 = 0.25), but 0.25 produces completely babblingly incoherent output, while 0.5 behaves basically the same as the original. Whatever.
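
To make the arithmetic concrete, here is a small sketch of the dimension math (sizes are GPT-J-6B's stock config; variable names just mirror the relevant config fields):

```python
# Illustrative arithmetic only: how partial_rotary_factor maps to the number
# of rotated dimensions per attention head.
hidden_size = 4096
num_attention_heads = 16
head_dim = hidden_size // num_attention_heads  # 256

# GPT-J applies rotary embeddings to the first 64 dims of each head:
gptj_rotary_dim = 64
print(gptj_rotary_dim / head_dim)  # 0.25 -- the "logically correct" factor

# With partial_rotary_factor = 0.5, the Phi arch instead rotates:
partial_rotary_factor = 0.5
print(int(head_dim * partial_rotary_factor))  # 128 dims, twice GPT-J's 64
```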
