Extremely high logits

#9
by Thomas2419 - opened

Hello, I've found that this model produces extremely high logits, and because of that the loss on new tasks goes into the millions, compared to BERT-base, RoBERTa, DeBERTa, and the other models I tested identically to MobileBERT. Is this an intentional facet of MobileBERT? It seems to make fine-tuning new heads on top of the frozen model impossible due to instability.
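Why very large logits translate directly into enormous loss values can be seen with a minimal sketch in plain Python. This is not MobileBERT code, and the logit values are made up purely for illustration. With softmax cross-entropy, once a wrong class's logit exceeds the correct class's logit by a large margin, the loss is roughly equal to that margin. Logits on the order of millions therefore give losses on the order of millions.

```python
import math

def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single example."""
    m = max(logits)
    # log-sum-exp with the max subtracted so exp() never overflows
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# Same (hypothetical) logit pattern at two scales; the true class is index 0,
# but the untrained head favors index 1.
small = [1.0, 2.0, 0.5]
large = [z * 1e6 for z in small]

print(cross_entropy(small, 0))  # modest loss, about 1.46
print(cross_entropy(large, 0))  # loss equals the 1e6 margin between logits
```

Scaling the same logits by 1e6 scales the loss by roughly the same factor. Gradients through a freshly initialized head are then correspondingly large, which is consistent with the instability described above.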
