====== Perplexity statistics ======
Mean PPL(Q)                   : 9.855009 ± 0.061412
Mean PPL(base)                : 7.237090 ± 0.045539
Cor(ln(PPL(Q)), ln(PPL(base))): 94.48%
Mean ln(PPL(Q)/PPL(base))     : 0.308761 ± 0.002082
Mean PPL(Q)/PPL(base)         : 1.361736 ± 0.002836
Mean PPL(Q)-PPL(base)         : 2.617919 ± 0.023684

====== KL divergence statistics ======
Mean    KLD: 0.292188 ± 0.000927
Maximum KLD: 8.990653
99.9%   KLD: 3.324761
99.0%   KLD: 1.855327
Median  KLD: 0.215862
10.0%   KLD: 0.020007
 5.0%   KLD: 0.005427
 1.0%   KLD: 0.000564
Minimum KLD: 0.000000

====== Token probability statistics ======
Mean    Δp: -8.052 ± 0.043 %
Maximum Δp: 83.209%
99.9%   Δp: 40.828%
99.0%   Δp: 21.532%
95.0%   Δp: 8.004%
90.0%   Δp: 2.972%
75.0%   Δp: 0.009%
Median  Δp: -2.217%
25.0%   Δp: -12.532%
10.0%   Δp: -27.827%
 5.0%   Δp: -40.790%
 1.0%   Δp: -72.837%
 0.1%   Δp: -88.971%
Minimum Δp: -99.209%
RMS Δp    : 18.125 ± 0.066 %
Same top p: 74.435 ± 0.115 %
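The per-token quantities summarized in this report can be illustrated with a small sketch. It assumes you already have, for each evaluated position, the base model's logits, the quantized model's logits, and the index of the actual next token. KLD is taken as KL(base || Q), Δp as p_Q(token) - p_base(token), and "same top p" as agreement of the two argmax tokens. These definitions are the conventional ones and are stated here as assumptions; the function names (`log_softmax`, `token_stats`, `summarize`) are hypothetical and this is not llama.cpp's implementation.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def token_stats(base_logits, q_logits, target: int):
    """Return (KLD(base||Q), delta p, same top-1?, NLL base, NLL Q) for one token."""
    lp_base = log_softmax(base_logits)
    lp_q = log_softmax(q_logits)
    # KL(P_base || P_Q) = sum_i p_base[i] * (log p_base[i] - log p_Q[i])
    kld = sum(math.exp(a) * (a - b) for a, b in zip(lp_base, lp_q))
    # Assumed convention: positive delta p means Q gives the true token more probability.
    delta_p = math.exp(lp_q[target]) - math.exp(lp_base[target])
    same_top = argmax(lp_base) == argmax(lp_q)
    return kld, delta_p, same_top, -lp_base[target], -lp_q[target]


def summarize(rows):
    """rows: list of (base_logits, q_logits, target) triples over the same tokens."""
    stats = [token_stats(b, q, t) for b, q, t in rows]
    n = len(stats)
    mean_nll_base = sum(s[3] for s in stats) / n
    mean_nll_q = sum(s[4] for s in stats) / n
    return {
        # Perplexity is exp of the mean negative log-likelihood.
        "PPL(base)": math.exp(mean_nll_base),
        "PPL(Q)": math.exp(mean_nll_q),
        "ln(PPL(Q)/PPL(base))": mean_nll_q - mean_nll_base,
        "Mean KLD": sum(s[0] for s in stats) / n,
        "Mean delta p %": 100 * sum(s[1] for s in stats) / n,
        "RMS delta p %": 100 * math.sqrt(sum(s[1] ** 2 for s in stats) / n),
        "Same top p %": 100 * sum(s[2] for s in stats) / n,
    }


if __name__ == "__main__":
    # Toy 3-token vocabulary, two positions (made-up logits, not real model output).
    rows = [
        ([1.0, 2.0, 0.5], [2.0, 1.0, 0.5], 1),
        ([0.0, 0.0, 3.0], [0.0, 0.5, 2.0], 2),
    ]
    for name, value in summarize(rows).items():
        print(f"{name}: {value:.6f}")
```

Because both perplexities are exponentials of mean NLLs over the same tokens, PPL(Q)/PPL(base) equals exp of the mean log ratio. The report above is consistent with this: exp(0.308761) ≈ 1.3617, and 9.855009 / 7.237090 ≈ 1.3617.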