Methodology

How metcha actually works.

A clear, source-cited explanation of the audio pipeline that powers a metcha conversation. We name the providers, the models, and the latency budget — and we say where we make trade-offs, because no translator is perfect.

1. Capture

Audio comes in through the iPhone's microphone (or the microphone on the shared earbud, when one is available). metcha uses Apple's voice-activity detection to mark the start and end of each utterance, so you don't have to tap a button to talk. The audio for each utterance is buffered in memory and, by default, is never written to disk.
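To picture what endpointing does, here is a minimal energy-based sketch. It is not Apple's detector, whose internals are not described on this page; the per-frame energies, the `threshold`, and the `hang_frames` silence count are all illustrative assumptions.

```python
from typing import Iterable, List, Optional, Tuple


def find_utterances(
    frame_energies: Iterable[float],
    threshold: float = 0.1,  # assumed energy level separating speech from silence
    hang_frames: int = 3,    # assumed run of silent frames that closes an utterance
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices for each utterance, end exclusive."""
    utterances: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first speech frame of the open utterance
    silent_run = 0               # consecutive silent frames inside an open utterance
    count = 0                    # total frames consumed so far

    for i, energy in enumerate(frame_energies):
        count = i + 1
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= hang_frames:
                # Close the utterance at the first frame of the silent run.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0

    if start is not None:
        # The stream ended mid-utterance: close it after the last speech frame.
        utterances.append((start, count - silent_run))
    return utterances
```

The hangover count is what keeps a short pause inside a sentence from splitting it into two utterances; a longer hangover is more tolerant of pauses but adds to the time before the utterance is handed to the next stage.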

2. Speech-to-text (STT)

Free tier: Apple's on-device Speech framework. Recognition runs locally; no audio leaves the device. Coverage depends on which language packs are downloaded to the device. Accuracy is strong for clear speech and the major regional accents; it weakens in noisy environments.

metcha Plus: optional cloud STT via Deepgram's Nova family. It delivers a lower word error rate than the on-device path on accented speech, in noisy rooms, and with overlapping voices. Audio is sent over a TLS-encrypted WebSocket, transcribed in near real time, and never retained by Deepgram for training (per our enterprise contract).
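Word error rate is the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into a reference transcript, divided by the number of words in the reference; lower is better. The sketch below illustrates the metric only. It says nothing about how either STT path is evaluated, and the sample sentences in the note after it are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two strings, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("where is the train station", "where is a train station")` returns 0.2: one substituted word out of five.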

3. Translation

Free tier: Apple's on-device Translation framework. Language packs download once; subsequent translations run locally with no network call. The phrasing is competent but literal, and it occasionally misses idioms.

metcha Plus: optional "Better Translation" path through an LLM (currently Anthropic's Claude family). The prompt frames each utterance in its conversational context, so the translation reads like something a person would actually say rather than a phrasebook entry. When the network is slow or unavailable, metcha silently falls back to the Apple on-device path, so the conversation never stalls waiting on a cloud round-trip.
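The fallback behaviour can be sketched as a cloud call guarded by a deadline. This is a minimal illustration under stated assumptions, not metcha's implementation: `cloud` and `local` are hypothetical stand-ins for the two translation paths, and the 1.5 s default deadline is an arbitrary placeholder.

```python
import concurrent.futures
from typing import Callable

Translator = Callable[[str], str]


def translate_with_fallback(
    text: str,
    cloud: Translator,       # hypothetical cloud (LLM) translation call
    local: Translator,       # hypothetical on-device translation call
    timeout_s: float = 1.5,  # assumed deadline; not a documented metcha value
) -> str:
    """Try the cloud path; on timeout or any error, use the local path."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(cloud, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers both a missed deadline and a cloud-side failure.
        return local(text)
    finally:
        # Do not block on a cloud call that is still in flight.
        executor.shutdown(wait=False)
```

The deadline is the design lever here: a tight one keeps turns snappy but sends more utterances down the literal on-device path, while a loose one does the opposite.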

4. Text-to-speech (TTS)

Free tier: iOS built-in voices via AVSpeechSynthesizer. Quality depends on whether you've installed the "Enhanced" or "Premium" variant for the target language. metcha uses whichever variant you have installed.

metcha Plus: premium voices from ElevenLabs's multilingual catalog. Each supported language has a hand-picked shortlist of native-speaker voices. Optionally, you can clone your own voice once (with explicit, in-app consent) and have your translated lines spoken in a close approximation of it.

Latency budget

A typical turn — speaker finishes, listener hears the translation — runs:

Capture (VAD endpointing): ~250 ms
STT (on-device): ~300–600 ms; STT (Deepgram streaming): ~150–300 ms
Translation (on-device): ~50–150 ms; translation (Claude): ~600–1100 ms
TTS (Apple, local): ~200–400 ms; TTS (ElevenLabs streaming): ~400–900 ms

End-to-end: summed, the stages come to roughly 0.8–1.4 s on the free path and 1.4–2.6 s on metcha Plus with all cloud stages active. Both are inside the threshold where a conversation still feels like a conversation; both are noticeably longer than a face-to-face exchange in a single language. We're transparent about it because we'd rather you set expectations correctly than be surprised.
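The arithmetic behind a turn can be checked by adding up the per-stage ranges listed above. The sketch below assumes the stages run strictly one after another, which is a simplification (streaming stages may overlap in practice); the `STAGES` table and the `turn_latency` helper are illustrative names, and the numbers are copied from the list.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (min_ms, max_ms)

# Per-stage budgets in milliseconds, taken from the list above.
# Capture has a single figure, so its range is degenerate.
STAGES: Dict[str, Dict[str, Range]] = {
    "capture": {"local": (250, 250)},
    "stt": {"local": (300, 600), "cloud": (150, 300)},
    "translation": {"local": (50, 150), "cloud": (600, 1100)},
    "tts": {"local": (200, 400), "cloud": (400, 900)},
}


def turn_latency(prefer: str) -> Range:
    """Sum stage budgets, using the preferred path where a stage offers it."""
    total_min = total_max = 0
    for options in STAGES.values():
        stage_min, stage_max = options.get(prefer, options["local"])
        total_min += stage_min
        total_max += stage_max
    return total_min, total_max


print(turn_latency("local"))  # (800, 1400)
print(turn_latency("cloud"))  # (1400, 2550)
```

On these figures, cloud translation is the largest single contributor to the metcha Plus total, which is why the fallback to the on-device path matters for keeping a turn moving.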

What stays on device

The free tier is end-to-end on-device. No audio, transcript, or translation leaves the phone.
metcha Plus features that pass audio or text to a cloud provider do so under TLS to the provider only. We don't proxy or log content on our own servers.
Voice cloning, when opted in, stores a single voice fingerprint with ElevenLabs under our account. You can delete it at any time from Settings → metcha Plus → Voice cloning → Delete my voice.
Per-session transcripts are kept on your phone, encrypted at rest by iOS. You can export or delete them whenever.

What we don't do

We don't train models on your conversations. None of our providers retain audio for model training under our contracts.
We don't operate a cross-device account system. There's no metcha login, no metcha social graph.
We don't claim parity with a human interpreter. metcha is an excellent tool for conversational, daily-life translation. For medical, legal, or court-of-record settings, hire a certified human interpreter.

Where to read more

How metcha works — the user-facing walk-through.
Privacy policy — the full data-handling story, provider by provider.
Supported languages — the per-language matrix of STT, translation, and voice coverage.

Corrections

If anything on this page is wrong or out of date, email hello@metcha.io with the correction. We'll update the page and date the change.