It's not prompted. The source audio had that emotional context, and the model simply copied it.