====== Perplexity statistics ======
Mean PPL(Q)                   : 9.032577 ± 0.058532
Mean PPL(base)                : 7.237090 ± 0.045539
Cor(ln(PPL(Q)), ln(PPL(base))): 95.96%
Mean ln(PPL(Q)/PPL(base))     : 0.221619 ± 0.001825
Mean PPL(Q)/PPL(base)         : 1.248095 ± 0.002278
Mean PPL(Q)-PPL(base)         : 1.795488 ± 0.019603

====== KL divergence statistics ======
Mean    KLD: 0.204862 ± 0.000758
Maximum KLD: 9.152626
99.9%   KLD: 3.342306
99.0%   KLD: 1.358416
Median  KLD: 0.143007
10.0%   KLD: 0.013411
 5.0%   KLD: 0.004899
 1.0%   KLD: 0.000896
Minimum KLD: 0.000000

====== Token probability statistics ======
Mean    Δp: -4.675 ± 0.034 %
Maximum Δp: 83.272%
99.9%   Δp: 42.833%
99.0%   Δp: 23.669%
95.0%   Δp: 10.247%
90.0%   Δp: 4.840%
75.0%   Δp: 0.184%
Median  Δp: -1.059%
25.0%   Δp: -7.819%
10.0%   Δp: -19.392%
 5.0%   Δp: -28.470%
 1.0%   Δp: -53.110%
 0.1%   Δp: -86.125%
Minimum Δp: -99.905%
RMS Δp    : 13.678 ± 0.058 %
Same top p: 78.198 ± 0.109 %
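
The perplexity figures in the report can be cross-checked against each other with simple arithmetic. The sketch below is only a sanity check on the printed numbers, not a reproduction of how the tool computes them: it assumes the reported ratio is the exponential of the mean log-ratio and that the reported difference is the difference of the two mean perplexities. All variable names are made up for illustration.

```python
import math

# Values copied from the "Perplexity statistics" block of the report.
ppl_q = 9.032577            # Mean PPL(Q)
ppl_base = 7.237090         # Mean PPL(base)
mean_log_ratio = 0.221619   # Mean ln(PPL(Q)/PPL(base))
reported_ratio = 1.248095   # Mean PPL(Q)/PPL(base)
reported_diff = 1.795488    # Mean PPL(Q)-PPL(base)

# Assumption: the ratio line is exp() of the mean log-ratio line.
ratio_from_log = math.exp(mean_log_ratio)

# The same ratio and the difference, recomputed from the two mean perplexities.
ratio_from_means = ppl_q / ppl_base
diff_from_means = ppl_q - ppl_base

# Tolerance allows for the six-decimal rounding of the printed values.
TOL = 1e-5
print("ratio from log-ratio :", round(ratio_from_log, 6))
print("ratio from means     :", round(ratio_from_means, 6))
print("difference from means:", round(diff_from_means, 6))
print("consistent with report:",
      abs(ratio_from_log - reported_ratio) < TOL
      and abs(ratio_from_means - reported_ratio) < TOL
      and abs(diff_from_means - reported_diff) < TOL)
```

Read this way, the quantized model's perplexity is roughly 25% higher than the base model's on this data, which fits the mean token probability shift of about -4.7% and the top-token agreement of about 78%.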