Kokoro 82M

Kokoro TTS Online

Stream low-latency speech from the open-source Kokoro text-to-speech model across American and British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese voices.

Create lifelike narration for videos, audiobooks, and podcasts with natural pacing and studio-grade clarity.

Languages Kokoro Covers

Kokoro ships curated voices for American and British English alongside Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese. For the best quality, bundle short sentences together and keep each request under 200 tokens.

American English, British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese

How to Generate Kokoro Speech

Kokoro performs best on batches of 100-200 tokens, so group shorter lines before rendering.
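If you prepare scripts programmatically, the grouping advice above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (estimate_tokens, batch_lines) are hypothetical, and the whitespace word count is a rough stand-in for Kokoro's real token count, which may differ.

```python
from typing import Iterable, List


def estimate_tokens(text: str) -> int:
    """Rough proxy for token count: whitespace-separated words.

    Assumption: this is NOT Kokoro's tokenizer, only an approximation.
    """
    return len(text.split())


def batch_lines(lines: Iterable[str], max_tokens: int = 200) -> List[str]:
    """Greedily join short lines into batches of at most max_tokens.

    A single line that already exceeds max_tokens is kept as its own
    (oversized) batch rather than being cut mid-sentence.
    """
    batches: List[str] = []
    current: List[str] = []
    current_tokens = 0

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        n = estimate_tokens(line)
        # Flush the current batch if this line would push it past the cap.
        if current and current_tokens + n > max_tokens:
            batches.append(" ".join(current))
            current, current_tokens = [], 0
        current.append(line)
        current_tokens += n

    if current:
        batches.append(" ".join(current))
    return batches


if __name__ == "__main__":
    script = ["Welcome back.", "Today we cover three topics.", "Let's begin."]
    for batch in batch_lines(script):
        print(estimate_tokens(batch), batch)
```

The greedy approach keeps lines in their original order and only enforces the upper bound. The final batch can fall below 100 tokens, so you may want to merge it with the previous one by hand when it is very short.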

Step 1

Enter Your Script

Paste narration, dialogue, or localization strings. Combine very short sentences so the model has enough context.

Step 2

Select a Kokoro Voice

Pick a voice preset and preview the tone before generating your final audio.

Step 3

Generate & Download

Generate Kokoro audio in seconds, then download MP3 files ready for editing or immediate playback.

Where Kokoro Shines

Open-weight synthesis unlocks projects that need flexible deployment and clear licensing.

Global Product Localization

Deliver English, Japanese, Mandarin, Spanish, and French voiceovers without juggling multiple proprietary vendors.

Indie Games & Animation

Prototype character voices quickly, then ship with an Apache-licensed stack you can host in your own pipeline.

Creator Voiceovers

Publish shorts, explainers, and tutorials with consistent Kokoro narrators that feel expressive and natural.

FAQs

Kokoro 82M is an open-weight text-to-speech model built on StyleTTS 2 and ISTFTNet. It packs 82 million parameters and runs quickly on commodity hardware thanks to its lightweight architecture.
The curated presets shipped with Readio cover American and British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese.
Kokoro can approach the warmth of much larger models when you keep inputs between roughly 100 and 200 tokens. Extremely short snippets may sound abrupt, so combine lines before generating.
Bundle sentences so each request falls in the 100-200 token range and avoid extremely short fragments. For longer narrations, break text into natural paragraphs so Kokoro can keep pacing consistent across chunks.
We process your input securely and do not store text permanently after synthesis completes. Your Kokoro generations stay private and are never shared with third parties.