According to monitoring by 1M AI News, Microsoft has released Vibing, a desktop speech input tool built on VibeVoice, its in-house family of open-source speech AI models. Vibing runs on macOS and Windows and is free to use. Pressing a shortcut key (Right Option on Mac, Ctrl+Win on Windows) starts recording in any application, and the text is generated automatically when recording ends. After testing it, AI/ML community evaluator @realmrfakename described the transcription as accurate and fast, calling Vibing a "free alternative to WisprFlow".
Vibing is more than speech-to-text. It uses an LLM to rewrite spoken language into written text suited to the current context, and during input users can edit, delete, and reorganize existing content with natural-language instructions. Other features include continuous recording for more than 5 minutes, automatic recognition of 50+ languages, mixed Chinese and English within the same sentence, custom hotwords, and real-time translation.
The underlying Microsoft VibeVoice is a family of open-source speech AI models released under the MIT license, with more than 28,000 GitHub stars. It includes a 7B-parameter ASR model that processes 60 minutes of audio in a single pass, a 1.5B TTS model that generates 90 minutes of multi-speaker speech, and a 0.5B real-time model with 300 ms latency. WisprFlow is currently one of the most popular AI speech input tools on Mac and charges a monthly subscription. Vibing enters the same market directly with a free and open-source approach.
