Skip to content
Case Study · AI Product · macOS Native

TYPERR

I designed and built a voice workflow layer for macOS — not another AI chat, but an invisible tool that turns speech into action without leaving your current app.

Three separate tools — dictation, translation, AI prompt generation — collapsed into one push-to-talk gesture. A menu bar utility that lives in the background, respects your workflow, and delivers results directly where you need them.

Role

Product Designer + Engineer

Platform

macOS · Menu Bar

Stack

Swift · Local AI Model · External LLM API

Type

AI-Native Utility · 0→1

3→1
Tools Replaced
1
Input Gesture
3
Output Modes
TYPERR main window — voice workflow interface with prompt result panel

The app overall has no UI but it's Settings page does.

TYPERR working application status interface

And working application status does. When you call it.

I Eliminated the Tool Switching Problem

Users with AI-heavy workflows cycle through an exhausting chain: dictate in one app, translate in another, prompt in a third, then paste into the actual workspace. Each context switch creates cognitive overhead, breaks flow, and increases error risk.

The real problem wasn't any single tool — it was the gaps between them, the copy-paste overhead, the context loss, and the mental cost of deciding which app to open first.

Too Many Micro-Tools per Thought

Before TYPERR, a single voice-driven task required 6 steps: formulate thought, get transcription, translate if needed, rewrite as prompt, paste into target app, manually fix the result.

Gap Between Voice and Deliverable

Most voice products stop at raw transcription. The real user task is never "get raw text" — it's insert a ready phrase, translate speech, generate an email reply, or create a prompt.

Distrust of the AI Layer

AI tools have a core UX problem: users don't understand when local logic fired vs cloud API, why the result looks this way, or what to do when something breaks.

My Mandate

Design and build a macOS utility that makes voice a natural layer over any application — not a destination you have to visit.

Product Design Interaction Design Native Development Voice UX AI Orchestration Desktop UX

I Reframed Voice as a Workflow Layer

The initial task looked simple: let users speak instead of typing. But I reframed the product from "speech-to-text app" to "desktop voice workflow system" — shifting focus from raw ASR to orchestrating scenarios across the OS.

This reframing changed everything: from just transcription to scenario routing, from a standalone window to ambient background presence, from one output to three purpose-specific delivery surfaces.

↳ Core Hypothesis

Voice UX becomes truly valuable when it's embedded in the work flow and doesn't force the user to switch between multiple tools. One input gesture, one mental model, three outcomes depending on context and mode.

Instead of "open app for dictation, another for translation, third for prompts, fourth to paste the result" — TYPERR offers one gesture that turns voice from a separate feature into a universal work layer over macOS.

I Researched the Market and Found No Real Alternative

Before building, I audited every comparable macOS voice tool on the market. Every single competitor requires a paid subscription, comes with platform-specific limitations, and none offer the combined feature set TYPERR delivers.

This research confirmed the opportunity: there was no tool that unified dictation, translation, and AI prompt generation in a single ambient interface with local-first processing.

Whisper Transcription

Transcription Only

Handles dictation well but stops at raw text — no translation, no AI processing, no prompt generation.

Paid subscription · Single function · No workflow integration
MacWhisper

File-Based Workflow

Requires importing audio files rather than real-time push-to-talk capture. Designed for post-processing, not live workflow.

Paid subscription · Not real-time · File import model
Superwhisper

Closest Competitor

Offers real-time dictation with AI enhancement, but lacks translation mode and doesn't provide a dedicated prompt panel with regeneration and refinement.

Paid subscription · No translation · No prompt workflow
Apple Dictation

Built-in but Limited

Free and system-level, but limited to basic dictation with no AI layer, no translation, no prompt generation, and no customization.

No AI · No translation · No extensibility
Wispr Flow

Polished but Subscription-Locked

Strong dictation with AI auto-editing and filler word removal. Multi-platform (Mac, Windows, iOS) with 100+ language support — but no dedicated prompt generation workflow or review panel.

$12/mo subscription · No prompt panel · No local-first architecture
↳ Key Finding

No existing tool offers all three capabilities (dictation + translation + AI prompts) in one push-to-talk interface with local-first processing and no subscription model. The market gap was clear.

I Identified Three Core User Segments

The product targets users who live in AI-heavy workflows and lose significant time to tool switching. Each persona represents a distinct use case, but they all share one pain: too many micro-tools per thought.

Founders, Operators, Managers

People who frequently respond by voice, formulate ideas on the go, and work simultaneously across email, chats, docs, and browser. They value speed over interface perfection.

High Volume Multi-App Speed-First

Designers, Creators, Prompt-Heavy Users

People who work extensively with Midjourney, ChatGPT, Claude, and similar systems — frequently crafting complex multi-step prompts and wanting to turn voice thoughts into quality text.

AI-Native Creative Prompt-Driven

Multilingual Professionals

People who think in one language but write in another — frequently switching between Russian, English, and other languages. They want to eliminate the "dictate separately, translate separately, paste separately" flow.

Bilingual+ Translation Cross-Language

Four UX Goals That Shaped Every Decision

Minimize Decision Overhead

Users shouldn't decide "do I need a translator? an AI tool? a dictation app?" — they learn one input pattern, and the system routes scenarios automatically.

Preserve Flow State

The product must never pull users into a separate main window. That's why the architecture is a menu bar app with minimal floating surfaces.

Make AI Quiet, Not Pushy

AI in the product is a background capability, not a main screen. The interface is near-invisible in idle, shows compact overlays during capture, and only opens a panel for high-value outputs.

Make Reliability Part of Perception

Users must understand what the system is doing: microphone access, API readiness, model loading status, listening vs processing state. Trust is built through transparency, not magic.

Three UX Principles I Built the Product Around

Principle 01

Local-First for Speed and Flow

The local ASR model isn't just for privacy — it ensures the baseline scenario is fast, predictable, network-independent, and suitable for real work tempo. Offline capability is a UX feature, not a technical constraint.

Principle 02

Cloud Only When the Task Needs It

The cloud layer activates only when local ASR isn't enough: for smart prompt processing, translation enrichment, and complex deliverable-oriented AI responses. Cloud is an enhancer, never a dependency.

Principle 03

Trust Is Part of the Interface

The product embeds mechanisms that improve not just accuracy but the feeling of reliability: fallback behavior, API diagnostics, model status, keyboard diagnostics, prompt history, and explicit onboarding.

I Mapped the End-to-End Flow

The user journey is deceptively simple on the surface — press a key, speak, get the result. But underneath, the system handles mode detection, audio capture, processing routing, and surface selection in milliseconds.

01

Press and Hold Hotkey

Global keyboard hook detects the key combination and determines the active mode: dictation, translation, or smart prompt.

02

Audio Capture Begins

Microphone activates, audio converts to Whisper-compatible format, and the current feedback surface shows the listening state.

03

User Speaks

VAD, segmentation, and real-time draft transcription run locally via on-device AI model on Apple Silicon.

04

Key Released — Processing Starts

Capture session closes, and the system routes to the appropriate pipeline based on mode: local-only for dictation, cloud-augmented for translation and prompts.

05

Result Delivered to the Right Surface

Dictation and translation inject text silently into the active app. Smart prompt results open in a dedicated review panel with regenerate, copy, and voice refinement options.

User Flow Map Mode Routing Logic Surface Selection Rules State Machine Spec

I Designed a Single Input Grammar for Three Outcomes

The strongest UX decision in TYPERR: one push-to-talk gesture with modifier keys that route to different modes. Hold the base key for dictation, add a translation modifier for translation, add a prompt modifier for AI smart prompt.

This eliminates decision overhead — users learn one muscle memory pattern, and the system handles routing internally. No menus, no mode selectors, no visible UI at all.

Hold

Hold Key → Dictation

Local ASR via on-device AI model — text inserts directly into the active field. No window switch, no clipboard dance.

Hold+T

Hold + Translation Mod → Translation

Speak in any language, get the result in the target language — inserted inline into the current app.

Hold+P

Hold + Prompt Mod → Smart Prompt

Voice task + optional selected text context → AI-generated result in a dedicated review panel. Supports regeneration and voice refinement.

This input grammar solves multiple problems at once: it lowers the learning curve, removes unnecessary UI, makes the product feel like muscle memory instead of a menu system, and allows future extension without changing the mental model.

Input Grammar Spec Keyboard Hook System Mode Routing Logic State Machine Design

I Architected a Local-First Pipeline with Cloud Augmentation

The core architecture separates local and cloud processing by intent — dictation uses a local AI model on-device for speed and offline reliability, while translation and smart prompt layer cloud LLM APIs only when the task demands it.

This isn't just a technical choice — it's a UX principle where the baseline must be fast, predictable, and network-independent. Cloud is an enhancer, never a dependency.

TYPERR pipeline debug view — showing transcription and AI processing stages
AppService Controller Audio Engine Manager Transcription Service Prompt Mode Service Keyboard Injector Shortcut Manager

I Designed Four Purpose-Specific Surfaces

Not all results deserve the same delivery surface. I separated the interface into four distinct layers, each optimized for its specific output type and user attention level.

Dictation and translation inject text silently — low-friction, inline delivery. Smart prompt results open in a side panel because they're longer, more valuable, and may need editing or regeneration.

Menu Bar Presence

Always-on ambient status: idle icon, loading state, error indicator, active listening feedback.

Compact Overlays

Minimal floating windows during listening and processing — they give feedback without stealing focus from the user's work.

Prompt Result Panel

Dedicated review surface for high-value AI outputs with result type display, conversation history, regenerate, copy, and hold-to-refine.

Settings Control Center

Not a secondary screen but a real operational panel: hotkey assignment, language selection, API management, diagnostics, model status, and debug entry.

Menu Bar Presence Overlay States Prompt Result Panel Settings Control Center
TYPERR prompt assistant panel — AI result with regenerate and copy controls

Eight UX Decisions That Shaped the Product

Every product is a stack of decisions. These are the eight choices that define TYPERR's interaction model and separate it from every other voice tool on the market.

Decision 01

Menu Bar Over Traditional App

This reduces the visual cost of the product to near-zero. It never competes with primary work windows for screen space.

Decision 02

Push-to-Talk Over Tap-to-Toggle

Faster, clearer, fewer state errors, and a better feeling of "on-demand" tooling. The gesture maps naturally to "hold to speak, release to send."

Decision 03

One Grammar for Three Modes

The strongest structural UX choice in the project. Complexity lives inside the system architecture, not on the interface surface.

Decision 04

Inline Output for Low-Friction Scenarios

Dictation and translation results inject directly into the active app — saving time and preserving flow state.

Decision 05

Dedicated Panel for High-Value Outputs

Smart Prompt results go to a review surface because they're longer, costlier to get wrong, and may need regeneration or voice refinement.

Decision 06

Selected Text as Silent Context

Users don't need to repeat context by voice — the system reads the current text selection and uses it as part of the AI task automatically.

Decision 07

Settings as Control Center

The settings window isn't a formality — it's an operational panel for managing quality, behavior, diagnostics, and API health.

Decision 08

Diagnostics as User Confidence

If a voice/AI product can't be diagnosed by the user, it quickly feels unreliable. In TYPERR, this is treated as a first-class UX principle.

I Evolved the Product Through Working Software

TYPERR wasn't designed in a vacuum — I built every version myself, testing interaction quality against real daily usage. Each iteration tightened the feedback loop between design intent and actual behavior.

TYPERR v1 — initial settings and configuration interface
v1 — Basic voice utility with push-to-talk dictation and menu bar presence
TYPERR v2 — refined interface with improved layout and prompt panel
v2 — Workflow system with translation mode, smart prompt panel, and conversation history
3 Major Iterations Working Public Version Custom App Icon Test Suite

Five-Phase UX Process

Phase 01
Framing
Reframed from "dictation app" to "voice workflow layer"
Phase 02
Mental Model
Unified input grammar, outcome modes, output surfaces
Phase 03
Feedback Surfaces
Ambient, operational, and diagnostic UX layers
Phase 04
Trust Loop
API diagnostics, model status, fallback, onboarding
Phase 05
Ship via Iteration
Practical builds, daily testing, real usage feedback

Phase 1: Product Framing

The initial task was "let users speak instead of typing." I reframed it into a desktop voice workflow system — shifting focus from ASR to scenario orchestration, reduced app switching, and state design.

Phase 2: Simplifying the Mental Model

Instead of scattered features, I chose one core gesture, multiple outcome modes, and different output surfaces. Complexity stays inside the architecture, not the interface.

Phase 3: Designing Feedback Surfaces

I designed how the system communicates with the user: compact overlays during listening, processing indicators during heavy work, settings as control panel, and prompt panel as review surface.

Phase 4: Building the Trust Loop

To prevent the product from feeling like "magic that sometimes breaks," I embedded API status checks, model loading indicators, hotkey diagnostics, explicit error messages, and onboarding guards.

Phase 5: Iterating Through Working Software

The product evolved through practical iterations: base dictation, then translation logic, then SuperPrompt improvements, then reliability fixes, then public version polish and custom app branding.

What I Shipped

3→1
Tools Consolidated
Dictation + Translation + AI Prompt
1
Unified Input Gesture
Push-to-talk with modifier routing
4
Interface Surfaces
Menu bar, overlay, panel, settings
3
Major Iterations
Voice utility → workflow system → production
6
Core Services Built
Shortcut, Audio, Transcription, Prompt, Text, Injector
0
Windows to Switch
Results inject into the active app

TYPERR shipped as a working macOS utility with local ASR, cloud-augmented AI, and a desktop interaction model that doesn't exist in any competing product.

How I Measure UX Quality

For a voice + AI product, traditional conversion metrics don't capture the full picture. I defined six UX-specific metrics that map directly to the interaction model and trust system.

Time to First Output

First Successful Result

How long from first launch to first successful dictation or prompt result — measures onboarding friction and setup completion quality.

First-Pass Accept Rate

Result Accepted Without Edit

How often the result is accepted without manual correction or regeneration — the core signal of output quality and trust.

False Commit Rate

Noise/Fragment Committed

How often the system commits a result where the user didn't intend to — measures the reliability of the capture session and VAD logic.

Panel Reuse Rate

Regenerate + Refine Usage

How often users engage with the prompt panel's regenerate, copy, and hold-to-refine features — signals whether the review surface adds value.

Translation Speed

Speak → Translated Text

Total time from voice input to translated text appearing in the active app — the key metric for multilingual workflow efficiency.

Setup Completion %

API Setup Without Drop-off

Percentage of users who complete the full API setup without abandoning — measures onboarding UX quality and smart default effectiveness.

What This Project Taught Me

Lesson 01

The Best AI UX Is Invisible

TYPERR's power comes from what it doesn't show — no chat interface, no mode selector, no visible AI. The strongest UX decision was removing UI, not adding it.

Lesson 02

Trust Requires Transparency, Not Magic

AI products that hide their internals feel unreliable after the first failure. Building diagnostics as a product feature is the difference between a tool people tolerate and one they depend on.

Lesson 03

Designer + Engineer = Tighter Feedback Loops

By implementing the product myself, I could test interaction quality against real muscle memory within hours. The gap between design intent and shipped behavior collapsed to zero.

AI Is the Core Product — Not an Add-On

Unlike my earlier projects where AI was retrospective, TYPERR was designed AI-first from day one. Every architectural decision was shaped by AI capabilities and constraints.

Tool 1 · Local AI Model (On-Device)

Local Speech-to-Text

Task: Real-time transcription with VAD, segmentation, second-pass correction, and draft update path — all running locally on Apple Silicon.

Outcome: Sub-second feedback, offline capability, and privacy by default. The baseline experience works without any network connection.

Tool 2 · External LLM API (Cloud)

Smart Prompt Processing

Task: Intent classification, output kind detection, quality tier routing, structured content parsing, and AI-powered text transformation.

Outcome: Voice input becomes a classified deliverable — an email reply, a prompt, a code snippet, an instruction — not just raw text.

Tool 3 · External LLM API (Cloud)

Translation Pipeline

Task: Source language detection + target language translation, integrated into the same push-to-talk flow with inline result injection.

Outcome: Multilingual users speak in their native language and get translated text in the target app — one gesture, zero window switches.

Tool 4 · Claude + Cursor

Development Acceleration

Task: Architecture planning, native macOS implementation patterns, state machine design, API integration debugging.

Outcome: A solo designer-engineer shipped a production macOS utility with 6 core services, 4 interface surfaces, and a complete test suite.

Building an AI Product?

I design and build AI-native tools — from voice interfaces to desktop utilities to automated workflows. If you need someone who can think through the UX and ship the code, let's talk.

Get in Touch