Isn't "in context learning rate" about training dynamics rather than the model architecture itself?
Post 3216: Just made this! https://github.com/takeraparterer/Charformer