Decoding strategy of the Phi4 Multimodal

#50
by Zhengyang - opened

Dear authors,
Thank you for the great work. What decoding strategy does Phi-4 multimodal use? Is it beam search or top-k sampling? I couldn't find it in the configuration file.

Best,
Zhengyang

Hi @Zhengyang ,

For speech/audio tasks, we simply used greedy search (top-1) for the benchmark. You can try other options for more diverse output if you like.
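
To make the difference concrete, here is a minimal sketch of a single decoding step on toy logits, contrasting greedy (top-1) selection with top-k sampling. It is only an illustration, not the benchmark code: the function names `greedy_pick` and `top_k_sample` and the logit values are made up for this example.

```python
import math
import random


def greedy_pick(logits):
    """Greedy (top-1) step: return the index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def top_k_sample(logits, k, rng):
    """Top-k step: keep the k highest logits, softmax over them, then sample."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical scores over a 4-token vocabulary (illustrative values only).
toy_logits = [0.1, 2.5, 1.9, -0.3]

print(greedy_pick(toy_logits))  # always the argmax, so output is deterministic
print(top_k_sample(toy_logits, k=2, rng=random.Random(0)))  # one of the top 2
```

In Hugging Face `transformers`, greedy search generally corresponds to calling `generate` with `do_sample=False` and `num_beams=1`, while `do_sample=True` with `top_k` enables top-k sampling. Whether the benchmark scripts set exactly these arguments is an assumption here.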

Thanks,
Ruchao
