This is a conversion of GPT-J-6B by EleutherAI into a more modern architecture that it still closely maps to (in this case, the Phi 1/1.5/2 architecture). This primarily allows for rope scaling, as well as for creating GGUFs (the GGUF tooling does not currently support GPT-J's original architecture). See convert.py for the script used to convert the weights.
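The core of such a conversion is renaming the GPT-J state-dict tensors into the layout the Phi implementation expects. Below is a minimal sketch of that kind of key remapping; the exact target names are assumptions based on the Transformers implementations of both architectures, not the contents of the actual convert.py:

```python
import re

# Assumed mapping from GPT-J state-dict keys to their Phi equivalents.
# Each entry is (GPT-J key pattern, Phi replacement template).
KEY_MAP = [
    (r"^transformer\.wte\.weight$", r"model.embed_tokens.weight"),
    (r"^transformer\.ln_f\.(weight|bias)$", r"model.final_layernorm.\1"),
    (r"^lm_head\.(weight|bias)$", r"lm_head.\1"),
    (r"^transformer\.h\.(\d+)\.ln_1\.(weight|bias)$",
     r"model.layers.\1.input_layernorm.\2"),
    (r"^transformer\.h\.(\d+)\.attn\.(q|k|v)_proj\.weight$",
     r"model.layers.\1.self_attn.\2_proj.weight"),
    (r"^transformer\.h\.(\d+)\.attn\.out_proj\.weight$",
     r"model.layers.\1.self_attn.dense.weight"),
    (r"^transformer\.h\.(\d+)\.mlp\.fc_in\.(weight|bias)$",
     r"model.layers.\1.mlp.fc1.\2"),
    (r"^transformer\.h\.(\d+)\.mlp\.fc_out\.(weight|bias)$",
     r"model.layers.\1.mlp.fc2.\2"),
]

def remap_key(gptj_key: str) -> str:
    """Translate one GPT-J state-dict key to its (assumed) Phi name."""
    for pattern, replacement in KEY_MAP:
        if re.match(pattern, gptj_key):
            return re.sub(pattern, replacement, gptj_key)
    raise KeyError(f"no mapping for {gptj_key}")
```

Applying `remap_key` over every key of the loaded checkpoint (and copying the tensors unchanged) would produce a state dict loadable by `PhiForCausalLM`, modulo any weights the two architectures do not share.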
Note that I was originally going to use the GPT-NeoX architecture because it felt more fitting, but there appears to be a bug in the most recent versions of Transformers, so Phi it is!
Also, the partial_rotary_factor here is set to 0.5, despite the fact that this makes no logical sense: it should be 0.25 (rotary_dim / head_dim = 64 / 256 = 0.25), but 0.25 produces completely, babblingly incoherent output, while 0.5 behaves basically the same as the original model. Whatever.
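For the record, the "correct" 0.25 falls straight out of GPT-J's config. A quick sketch of the arithmetic (the values below are taken from the public EleutherAI/gpt-j-6b config):

```python
# GPT-J-6B config values (from EleutherAI/gpt-j-6b).
n_embd = 4096      # hidden size
n_head = 16        # attention heads
rotary_dim = 64    # dims that actually get rotary embeddings

head_dim = n_embd // n_head              # 4096 / 16 = 256
partial_rotary_factor = rotary_dim / head_dim

print(head_dim)                 # 256
print(partial_rotary_factor)    # 0.25 -- yet 0.5 is what works in practice
```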