Push-to-talk voice-to-text that works on any desktop. Hold a key, speak, release. Your words appear at the cursor.
Voxtype is an open-source push-to-talk voice-to-text application for Linux and macOS. It lets users dictate into any application: hold a key, speak, and release the key, and the transcribed text appears at the cursor.
Key features:
Push-to-Talk: A natural workflow where users hold a key to record and release it to transcribe. This method avoids "wake words" and accidental activations.
Smart Output: It types directly at the cursor via tools like wtype or ydotool, featuring full CJK/Unicode support, and can fall back to the clipboard if needed.
Highly Configurable: Users can choose their own hotkeys, output modes, and specific Whisper model sizes (from tiny to large-v3) to balance speed and accuracy.
Fully Offline: By default, all speech recognition happens locally on the user's machine using whisper.cpp, so no voice data leaves the machine.
GPU Acceleration: Built in Rust for high performance, it offers optional support for Vulkan, CUDA, Metal, and ROCm, enabling sub-second transcription on modern hardware.
Lightweight Design: It is distributed as a single binary with minimal dependencies, requiring no Python or complex virtual environments.
Wayland Optimized: It natively supports compositors like Hyprland, Sway, and River, so users can trigger recording through the compositor's own keybindings without joining a special input group. It also remains compatible with X11, GNOME, and KDE.
Visual Feedback: It includes optional Waybar integration with 10 built-in icon themes to show recording status, as well as desktop notifications via notify-send.
LLM Post-Processing: Transcriptions can be piped through local LLMs for tasks like translation, applying domain-specific vocabulary, or custom workflows.
Remote Offloading: Users can offload transcription to a self-hosted remote GPU server to save resources on their local machine. Connecting to a cloud API is also an option.
Multilingual Support: Beyond standard Whisper models, it supports various ONNX engines specialized for different languages and dialects, including Chinese, Japanese, and Korean.
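
The Smart Output fallback described above (type at the cursor, fall back to the clipboard if needed) can be sketched in Python as follows. This is an illustration only, not Voxtype's actual implementation, which is written in Rust. The backend order, the choice of wl-copy as the clipboard tool, and the names BACKENDS and emit are assumptions made for this example.

```python
import shutil
import subprocess
from typing import Callable, List, Optional, Tuple

# Hypothetical backend order: typing tools first, clipboard last.
# Each entry is (executable name, argv prefix); the text is appended as the
# final argument. Text starting with "-" may be parsed as an option by these
# tools; handling that case is omitted from this sketch.
BACKENDS: List[Tuple[str, List[str]]] = [
    ("wtype", ["wtype"]),
    ("ydotool", ["ydotool", "type"]),
    ("wl-copy", ["wl-copy"]),
]


def emit(
    text: str,
    run: Callable[..., object] = subprocess.run,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Send text to the first backend that is installed and succeeds.

    Returns the name of the backend that handled the text. The run and
    which parameters exist only so the function can be exercised without
    the real tools installed.
    """
    for name, prefix in BACKENDS:
        if which(name) is None:
            continue  # tool not installed, try the next one
        try:
            run(prefix + [text], check=True)
            return name
        except (subprocess.CalledProcessError, OSError):
            continue  # tool failed, fall through to the next backend
    raise RuntimeError("no output backend succeeded")
```

With this sketch, emit("hello") tries wtype first and returns "wtype" if that command succeeds. If wtype is missing or exits with an error, the same text is passed to ydotool, and finally to wl-copy so it at least lands on the clipboard.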