On-demand audio transcription is an often-requested service without many good options on the market.
Using Hugging Face Spaces with the Gradio SDK and OpenAI's Whisper model, I've put together a simple interface that transcribes and summarises audio files up to five minutes long. It's completely open source and runs on the CPU Upgrade hardware tier. The best part is that it needs no dedicated inference endpoint: it runs entirely on public infrastructure.
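The post doesn't say how the five-minute cap is enforced. One way to do it is to check the clip's duration before handing it to Whisper. The sketch below assumes WAV uploads and uses only the standard library. `check_length` and `MAX_SECONDS` are hypothetical names, not taken from the actual Space.

```python
import io
import wave

MAX_SECONDS = 5 * 60  # five-minute limit from the post; the enforcement itself is hypothetical


def wav_duration_seconds(data: bytes) -> float:
    """Return the duration in seconds of a WAV file given as raw bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_length(data: bytes) -> None:
    """Raise ValueError if the clip is longer than the transcription limit."""
    duration = wav_duration_seconds(data)
    if duration > MAX_SECONDS:
        raise ValueError(f"Audio is {duration:.1f}s; the limit is {MAX_SECONDS}s.")
```

Rejecting long files up front keeps CPU-only inference times predictable, since Whisper's runtime grows with audio length.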
I'm training a model to reason in a continuous latent space, based on Meta's Coconut (Chain of Continuous Thought). If it all works, I'll apply it to the MiniCPM-o SVD-LR. The endgame is a multimodal, adaptive, and efficient on-device foundation model.
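Coconut's core idea is to skip decoding during reasoning steps. Instead of sampling a token, the model feeds its last hidden state straight back in as the next input embedding. The toy below illustrates that loop with NumPy. A causal running-mean layer (`toy_forward`) stands in for a real transformer, and tied input/output embeddings are assumed. None of it reflects the actual training setup described above.

```python
import numpy as np

VOCAB, DIM = 16, 8
rng = np.random.default_rng(0)
EMBED = rng.normal(scale=0.5, size=(VOCAB, DIM))  # hypothetical token embedding table
W = rng.normal(scale=0.5, size=(DIM, DIM))        # hypothetical stand-in for transformer layers


def toy_forward(inputs: np.ndarray) -> np.ndarray:
    """Toy causal model: hidden state t depends only on inputs[0..t]."""
    cumsum = np.cumsum(inputs, axis=0)
    counts = np.arange(1, len(inputs) + 1)[:, None]
    return np.tanh((cumsum / counts) @ W)  # (T, DIM) hidden states


def generate(prompt_ids, n_latent: int, n_tokens: int):
    """Coconut-style loop: n_latent continuous thoughts, then n_tokens greedy tokens."""
    inputs = EMBED[prompt_ids]  # (T, DIM)
    # Latent mode: the last hidden state becomes the next input embedding,
    # with no projection to the vocabulary and no token sampling.
    for _ in range(n_latent):
        last_hidden = toy_forward(inputs)[-1]
        inputs = np.vstack([inputs, last_hidden])
    # Language mode: project to logits (tied embeddings, an assumption),
    # pick a token greedily, and feed back that token's embedding.
    tokens = []
    for _ in range(n_tokens):
        logits = toy_forward(inputs)[-1] @ EMBED.T
        tok = int(np.argmax(logits))
        tokens.append(tok)
        inputs = np.vstack([inputs, EMBED[tok]])
    return tokens, inputs
```

For example, `generate([1, 2, 3], n_latent=2, n_tokens=4)` appends two continuous thoughts after the three prompt embeddings before decoding four tokens. Because the thoughts never pass through the vocabulary, they can encode more than any single token would allow.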
Qwen 2.5 Coder 32B is a dime among nickels. Its performance is amazing for its size, so much so that it earns a spot on the duo leaderboard. The day of small models is here.