====== Perplexity statistics ======
Mean PPL(Q)                   :   8.274172 ±   0.052402
Mean PPL(base)                :   7.237090 ±   0.045539
Cor(ln(PPL(Q)), ln(PPL(base))):  97.60%
Mean ln(PPL(Q)/PPL(base))     :   0.133920 ±   0.001382
Mean PPL(Q)/PPL(base)         :   1.143301 ±   0.001580
Mean PPL(Q)-PPL(base)         :   1.037082 ±   0.012706

====== KL divergence statistics ======
Mean    KLD:   0.114738 ±   0.000483
Maximum KLD:   9.999102
99.9%   KLD:   2.236693
99.0%   KLD:   0.781076
Median  KLD:   0.077728
10.0%   KLD:   0.005170
 5.0%   KLD:   0.001727
 1.0%   KLD:   0.000289
Minimum KLD:  -0.000055

====== Token probability statistics ======
Mean    Δp: -3.288 ± 0.025 %
Maximum Δp: 65.548%
99.9%   Δp: 32.662%
99.0%   Δp: 17.193%
95.0%   Δp:  7.509%
90.0%   Δp:  3.610%
75.0%   Δp:  0.176%
Median  Δp: -0.636%
25.0%   Δp: -5.421%
10.0%   Δp: -13.956%
 5.0%   Δp: -20.435%
 1.0%   Δp: -38.546%
 0.1%   Δp: -71.826%
Minimum Δp: -98.746%
RMS Δp    : 10.050 ± 0.048 %
Same top p: 83.354 ± 0.098 %
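The statistics above compare a quantized model (Q) with the base model position by position. The following is a minimal sketch, not llama.cpp's implementation, of how the per-token numbers could be computed. It assumes KLD is KL(P_base ‖ P_Q) in nats, Δp is the quantized minus the base probability of the correct next token, and "Same top p" is the share of positions where both models rank the same token first. The helper names (`kl_divergence`, `token_stats`, `percentile`, `summarize`) are hypothetical, and the percentile interpolation rule is an assumption that may differ from the tool's. Because KL divergence is mathematically non-negative, the small negative minimum KLD above is presumably floating-point rounding. The two asserts check that the perplexity summary lines agree with each other: the PPL ratio matches exp of the mean log ratio to within rounding, and the PPL difference is a plain subtraction.

```python
import math
from statistics import mean, median


def kl_divergence(p_base, p_q):
    """KL(P_base || P_Q) in nats for one position (assumed definition).

    Both lists are full distributions over the same vocabulary; p_q must be
    strictly positive wherever p_base is (true for softmax outputs).
    """
    return sum(pb * math.log(pb / pq) for pb, pq in zip(p_base, p_q) if pb > 0)


def token_stats(p_base, p_q, target):
    """Return (KLD, delta_p, same_top) for one position; target = correct token id."""
    kld = kl_divergence(p_base, p_q)
    # Negative delta_p: the quantized model is less confident in the correct token.
    delta_p = p_q[target] - p_base[target]
    top_base = max(range(len(p_base)), key=p_base.__getitem__)
    top_q = max(range(len(p_q)), key=p_q.__getitem__)
    return kld, delta_p, top_base == top_q


def percentile(values, q):
    """Linearly interpolated percentile, q in [0, 1] (assumed interpolation rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def summarize(positions):
    """positions: non-empty iterable of (p_base, p_q, target) tuples."""
    klds, dps, tops = zip(*(token_stats(*pos) for pos in positions))
    return {
        "mean_kld": mean(klds),
        "median_kld": median(klds),
        "p99_kld": percentile(klds, 0.99),
        "mean_delta_p_pct": 100 * mean(dps),
        "rms_delta_p_pct": 100 * math.sqrt(mean(d * d for d in dps)),
        "same_top_p_pct": 100 * sum(tops) / len(tops),
    }


# Consistency checks on the perplexity lines of the log above.
assert math.isclose(math.exp(0.133920), 1.143301, rel_tol=1e-5)
assert math.isclose(8.274172 - 7.237090, 1.037082, abs_tol=1e-6)

if __name__ == "__main__":
    # Toy example: one position, three-token vocabulary, correct token index 0.
    print(summarize([([0.7, 0.2, 0.1], [0.6, 0.3, 0.1], 0)]))
```

In the toy example the quantized model moves 0.1 of probability mass away from the correct token, so Δp is about -10% for that single position while both models still agree on the top token.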