====== Perplexity statistics ======
Mean PPL(Q)                   :   7.553687 ±   0.047468
Mean PPL(base)                :   7.237090 ±   0.045539
Cor(ln(PPL(Q)), ln(PPL(base))):  99.32%
Mean ln(PPL(Q)/PPL(base))     :   0.042817 ±   0.000732
Mean PPL(Q)/PPL(base)         :   1.043747 ±   0.000764
Mean PPL(Q)-PPL(base)         :   0.316598 ±   0.005745

====== KL divergence statistics ======
Mean    KLD:   0.033370 ±   0.000164
Maximum KLD:   4.041112
99.9%   KLD:   0.772030
99.0%   KLD:   0.234367
Median  KLD:   0.021457
10.0%   KLD:   0.001426
5.0%    KLD:   0.000473
1.0%    KLD:   0.000068
Minimum KLD:  -0.000210

====== Token probability statistics ======
Mean    Δp:  -1.218 ± 0.013 %
Maximum Δp:  57.472%
99.9%   Δp:  21.825%
99.0%   Δp:  10.690%
95.0%   Δp:   4.844%
90.0%   Δp:   2.564%
75.0%   Δp:   0.260%
Median  Δp:  -0.171%
25.0%   Δp:  -2.265%
10.0%   Δp:  -6.360%
5.0%    Δp:  -9.509%
1.0%    Δp: -18.642%
0.1%    Δp: -41.385%
Minimum Δp: -86.813%
RMS Δp    :   5.188 ± 0.033 %
Same top p:  91.003 ± 0.075 %
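
The perplexity figures above can be cross-checked against each other. The following is a minimal sketch, not part of the tool that produced the log: it copies the reported values by hand and checks that the reported ratio agrees both with exp of the mean log ratio and with the quotient of the two mean perplexities, to within the printed precision. The variable names are illustrative, and the reading of "Q" as the model under test and "base" as the reference model is an assumption.

```python
import math

# Values copied by hand from the "Perplexity statistics" block above.
ppl_q = 7.553687            # Mean PPL(Q)
ppl_base = 7.237090         # Mean PPL(base)
mean_log_ratio = 0.042817   # Mean ln(PPL(Q)/PPL(base))
reported_ratio = 1.043747   # Mean PPL(Q)/PPL(base)
reported_diff = 0.316598    # Mean PPL(Q)-PPL(base)

# Three derived quantities that should match the reported ones
# up to rounding in the last printed digit.
ratio_from_log = math.exp(mean_log_ratio)
ratio_from_means = ppl_q / ppl_base
diff_from_means = ppl_q - ppl_base

print(f"exp(mean log ratio)    = {ratio_from_log:.6f}")
print(f"PPL(Q) / PPL(base)     = {ratio_from_means:.6f}")
print(f"PPL(Q) - PPL(base)     = {diff_from_means:.6f}")
print(f"relative PPL increase  = {100 * (ratio_from_means - 1):.2f} %")
```

A mismatch well beyond the last printed digit would suggest a transcription error in the copied values rather than a property of the models.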