I tried the 32B model with the cleaned tau-bench repo, using Claude 3.5 as the user model, and got the following pass@1 results:
tau-bench
46.09, 39.13, 38.26, 40.87, 38.26 - avg 40.52
Any idea why?
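
For reference, here is a quick sketch of how the average above can be checked, along with the spread across the five scores. It uses only the numbers I posted; the variable names are just mine for illustration, and the sample standard deviation is my own choice of spread measure, not something tau-bench reports.

```python
import statistics

# The five pass@1 scores posted above (percent).
scores = [46.09, 39.13, 38.26, 40.87, 38.26]

# Arithmetic mean of the scores.
mean_score = statistics.mean(scores)

# Sample standard deviation (n - 1 in the denominator), as a rough
# indication of how much the individual scores vary.
std_score = statistics.stdev(scores)

print(f"mean = {mean_score:.2f}")
print(f"stdev = {std_score:.2f}, min = {min(scores)}, max = {max(scores)}")
```

The first score is noticeably higher than the other four, so the variation between scores is worth keeping in mind when comparing against a single reported number.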