[Beta] Text-to-Speech API: From text to voice content instantly

Hi Creators,

Our vision is for your experiences to harness the power of natural language to speak, respond, and connect with users while meeting the highest safety standards. We plan to release a series of natural-language APIs to help creators innovate and reach new levels of immersion and storytelling.

Today, we’re excited to launch the Text-to-Speech API in Beta — enabling you to create professional narration and character dialogue instantly, without managing complex audio production workflows.

The Text-to-Speech API converts text to audio using up to 10 preset voices in English, with customization options like pitch and speed. Whether you’re building step-by-step tutorial guidance or game announcements for your players, this API makes it easy for any creator to add voice content.

Unlocking voice content creation

With our Text-to-Speech API, you can:

  • Enable dynamic storytelling — Generate different spoken dialogue based on player experience (new vs. experienced), previous choices or current game state

  • Launch voice-enabled experiences in days, not months — Produce voice content instantly and iterate on your dialogue

  • Create high-quality narration without traditional production costs — Professional-sounding dialogue without the added cost

In January, our team conducted an internal study with 200 creators on Roblox, which revealed that dialogue, tutorials, and game announcers were the top three emerging use cases across a variety of gaming genres.

We can’t wait to see how you use the Text-to-Speech API to start building more engaging, voice-enabled experiences.

How to use the Text-to-Speech API

The API provides two versions for different use cases:

AudioTextToSpeech API — For real-time audio generation:

  1. Access the AudioTextToSpeech documentation in Creator Hub here
  2. Integrate the API into your script, input your text, and customize the voice parameters (pitch, speed, voice selection)
  3. Use existing audio effects to further customize the generated speech
  4. Audio plays immediately without saving as an asset

GenerateSpeechAsset API — For saved audio assets:

  1. Access the GenerateSpeechAsset documentation in Creator Hub here
  2. Generate speech that saves directly to your audio asset inventory
  3. Reuse generated audio in any experience you own
  4. Perfect for recurring dialogue or narration segments

For more information, visit our Text-to-Speech API guide.

We thought it would be helpful to create an example to show how everything could work together. The demo below shows a simple experience where you can see how to call the Text-to-Speech API.

This demo shows the snowman speaking and providing hints for what the player should do next. Note: The customized voice parameters for pitch and voice selections can also be heard.

Built-in Safety

We facilitate text-to-speech language capabilities with safety at the forefront. All text input is passed through text filters and generated audio is also proactively moderated by Roblox’s AI safety systems to ensure the content complies with our Community Standards. Our safety tools can surface any policy violations quickly and help determine what is safe and appropriate to publish in an experience.

What’s next

This Text-to-Speech Beta launch is just the beginning. We’re working on:

  • Additional languages beyond English
  • More voice selection options
  • Player inventory integration to allow users to safely save assets generated by text-to-speech in their own inventory

We’re also expanding our natural language API suite — Speech-to-Text API launches in closed beta next quarter for voice commands, and our Text Generation API recently opened to all Moderate or Restricted content maturity experiences for dynamic NPC dialogue.

We’d love to hear about your use cases and see what you create! Please share your feedback and experiences in the comments below.


FAQs

Are there any cost, request rate, or asset quota limitations?

  • Baseline usage of the API will be free at the beta launch. However, rate limits are in place to optimize for game dialogue and to prevent system abuse:

    • Character Limits: A maximum of 300 characters per request is set for both versions of the API

    • Request Rate Limits: We have enabled dynamic scaling based on your experience’s concurrent users using this formula: maximum requests per second = 1 + (0.05 × concurrent users, in thousands)

    • Asset Generation Limits: Asset generation requests count toward your existing audio upload quota. This only applies to the GenerateSpeechAsset API

    These limits help ensure system stability and fair usage across all users. Please note that longer content can be split across multiple requests. In the future, rate limits may be further adjusted based on system performance and user feedback. Additionally, we plan to integrate Text-to-Speech with our extended services system so that you can purchase additional service usage if needed.
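As a rough illustration of the limits above, the Python sketch below computes the request-rate ceiling and splits longer text into chunks that fit the 300-character limit. It is not Roblox API code. The function names are hypothetical, the formula is assumed to mean 1 + 0.05 × (concurrent users ÷ 1,000), and splitting on word boundaries is only one possible strategy.

```python
MAX_CHARS = 300  # per-request character limit stated in the FAQ


def max_requests_per_second(concurrent_users: int) -> float:
    """Request-rate ceiling, ASSUMING the formula means
    1 + 0.05 * (concurrent users / 1000)."""
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    return 1 + 0.05 * (concurrent_users / 1000)


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters,
    breaking on whitespace where possible (hypothetical helper)."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # Adding this word would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Under that reading of the formula, an experience with 1,000 concurrent users would be allowed about 1.05 requests per second, and one with 10,000 would be allowed about 1.5. Each chunk returned by split_for_tts would then be sent as its own request, paced to stay under that ceiling.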

What happens if my text input violates Community Standards?

  • The API will return an error and no audio will be generated. All text is filtered before processing to ensure compliance with Roblox Community Standards. In rare instances where violative content may be generated, the developer responsibility varies by API type as follows:

    • AudioTextToSpeech API (real-time): You will not be held responsible for potentially violative audio outputs unless you intentionally program the system to generate violating content.

    • GenerateSpeechAsset API (saved assets): Since generated audio assets are saved to your inventory, any moderation actions will be taken against your account as the asset owner. We strongly recommend against allowing players to directly input text for asset generation to avoid potential moderation issues.

What data is used for voice training?

  • The Text-to-Speech API is trained on publicly available, open-source datasets to ensure high-quality synthetic voice generation, and we follow responsible data practices that align with our safety standards. These voices are a combination of many synthetic voices and don’t represent a real human’s voice.

Will this work in all experience content ratings?

  • Yes, the Text-to-Speech API is available for experiences of all content maturity levels, with appropriate safety measures, such as text filtering and audio moderation, in place for each audience.

Where are audio assets generated in-experience stored?

  • We currently store generated audio assets in the experience owner’s inventory. This reduces player friction as a user consent flow isn’t required. However, we recognize that this draws from the experience owner’s monthly audio asset quota and will be working on a safe solution that saves the asset to a player’s inventory. Note that this only applies to the GenerateSpeechAsset API, not the AudioTextToSpeech API, which does not create an audio asset.

How many predefined voices are available for use?

  • We currently have 10 voices that you can use for this API. Each maps to a numerical ID that you can input in the VoiceId field (note that this field accepts string values, so inputs should be formatted as “1”).

    Voice ID | Description
    1        | British male
    2        | British female
    3        | US male #1
    4        | US female #1
    5        | US male #2
    6        | US female #2
    7        | Australian male
    8        | Australian female
    9        | Retro voice #1
    10       | Retro voice #2

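
Because the VoiceId field takes a string rather than a number, a small lookup can help catch a wrong type or an unknown ID before a request is made. The Python sketch below is purely illustrative and is not Roblox API code: the VOICES table mirrors the list above, and to_voice_id is a hypothetical helper.

```python
# Voice IDs and descriptions copied from the list in the post.
VOICES = {
    "1": "British male",
    "2": "British female",
    "3": "US male #1",
    "4": "US female #1",
    "5": "US male #2",
    "6": "US female #2",
    "7": "Australian male",
    "8": "Australian female",
    "9": "Retro voice #1",
    "10": "Retro voice #2",
}


def to_voice_id(value) -> str:
    """Normalize an int or str to the string form the VoiceId field expects.

    Hypothetical helper: raises ValueError for IDs not in the table.
    """
    voice_id = str(value).strip()
    if voice_id not in VOICES:
        raise ValueError(f"unknown voice ID: {value!r}")
    return voice_id
```

For example, to_voice_id(7) gives the string "7", which corresponds to the Australian male voice.
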
271 Likes

Nice, this is a cool update.

Excited to see what everyone will make with this.

20 Likes

Love this update! I love how useful this will be especially for younger players who may have trouble keeping up if the game has a lot of text to read. When are we getting speech to text/speech recognition?

12 Likes

Just tested this, sounds like a real human. thank you for this roblox

16 Likes

Thousand CCU? I feel like this should scale better for smaller games. This is a useful tool, but it looks like the limit isn’t going to budge unless your game is massive.

45 Likes

Thank you for this!

Ah, this is a relief. I was aiming to let players generate their own voicelines and was uncertain of how you were going to moderate the text and the players/experience creator in case a player violates the TOS, and not the creator. This reassures me that Roblox handles the moderation on these.


Can this already be used live (client-beta)? Or only in Roblox Studio?

9 Likes

Ooh I’ve been waiting for this for a long time now!
This is pretty much perfect for the game I’m working on, can’t wait to add it.

Thank you so much, it’s so nice!

6 Likes

Very good announcement because I can use this for dialogue.

5 Likes

three words: i love it.

i’m already thinking of some crazy stuff with the new TTS instance! also, judging how there are voice IDs, i’m guessing somewhere in the future we’ll be able to upload our own TTS voices?

(your right ear will love this video)

16 Likes

Wait using this API creates an Audio Asset??? In the Creator’s Inventory???

8 Likes

OMG LETS GOO THANKS ROBLOX W!

6 Likes

This is an amazing update. It will be immensely useful in the current game I’m working on because it will replace our old, more tedious workflow of generating speech off-site and importing it to the platform manually.

Are there any plans to add more voices to the mix? Ten voices are quite plentiful for smaller games, but it could be somewhat limiting for more ambitious story-based games.

4 Likes

this would make my ai npc more unpredictable lol

7 Likes

Works in live-servers


Can’t seem to get it working in unpublished studio places though… It’s only working in published studio files & in live servers… Anywhere else just errors with “Invalid user”
[using the example script]

6 Likes

Will we get a text to speech for Esperanto?

4 Likes

Forget about every single bad thing I have said about this platform. This is peak, this is life, this is care, this is every good thing. I have been waiting for this since the first time I saw the AudioTextToSpeech with Instance.new() (about a month). This is so peak, we are so back. Omg I love the team behind this, I don’t know what to say, but OMG I LOVE THIS. This is like the excitement for the datastore manager all over again!

8 Likes

I can’t wait to add real time generated commentary radio to my game this is so amazing

fun fact: If you add dots, the tts will yield

4 Likes

this will be very useful thank you for your amazing work!!

2 Likes

Both do not sound like US…
characters

1 Like