====== Perplexity statistics ======
Mean PPL(Q)                   :   8.253598 ±   0.051864
Mean PPL(base)                :   7.237090 ±   0.045539
Cor(ln(PPL(Q)), ln(PPL(base))):  97.71%
Mean ln(PPL(Q)/PPL(base))     :   0.131430 ±   0.001346
Mean PPL(Q)/PPL(base)         :   1.140458 ±   0.001535
Mean PPL(Q)-PPL(base)         :   1.016508 ±   0.012175

====== KL divergence statistics ======
Mean    KLD:   0.117565 ±   0.000433
Maximum KLD:   7.079286
99.9%   KLD:   1.966468
99.0%   KLD:   0.726076
Median  KLD:   0.084699
10.0%   KLD:   0.006988
 5.0%   KLD:   0.002383
 1.0%   KLD:   0.000330
Minimum KLD:  -0.000001

====== Token probability statistics ======
Mean    Δp:  -3.685 ± 0.026 %
Maximum Δp:  69.513%
99.9%   Δp:  34.570%
99.0%   Δp:  17.585%
95.0%   Δp:   7.273%
90.0%   Δp:   3.369%
75.0%   Δp:   0.113%
Median  Δp:  -0.833%
25.0%   Δp:  -6.212%
10.0%   Δp: -15.079%
 5.0%   Δp: -21.666%
 1.0%   Δp: -38.754%
 0.1%   Δp: -69.188%
Minimum Δp: -97.122%
RMS Δp    :  10.385 ± 0.045 %
Same top p:  82.770 ± 0.100 %