Decoding strategy of the Phi4 Multimodal
#50 opened by Zhengyang
Dear authors,
Thank you for the great work. What decoding strategy does Phi4 Multimodal use? Is it beam search or top-k sampling? I couldn't find it in the configuration file.
Best,
Zhengyang
Hi @Zhengyang,
For speech/audio tasks, we simply used greedy search (top-1) for the benchmark. You can try other options for more diverse output if you like.
Thanks,
Ruchao
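For readers comparing the options mentioned in this thread, here is a minimal sketch of a single decoding step under greedy search (top-1) and under top-k sampling. It is a toy illustration on a hand-written list of scores, not the benchmark code; the helper names `greedy_pick` and `top_k_sample` and the example `logits` are hypothetical.

```python
import math
import random


def greedy_pick(logits):
    """Greedy (top-1) step: return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng=None):
    """Top-k step: sample one token among the k highest-scoring tokens."""
    rng = rng or random.Random()
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax weights restricted to the top-k set (shifted for stability).
    m = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy scores for a 4-token vocabulary (assumed values, not model output).
logits = [0.1, 2.5, -1.0, 1.7]

print(greedy_pick(logits))                              # deterministic choice
print(top_k_sample(logits, k=2, rng=random.Random(0)))  # one of the top 2
```

Greedy search is deterministic, while top-k sampling can return different tokens on different runs; with `k=1` the two coincide. In Hugging Face `transformers`, `generate(..., do_sample=False, num_beams=1)` selects greedy search, though the exact arguments used for the benchmark are not stated in this thread.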