Hi Creators,
Our vision is that your experiences can harness the power of natural language to speak, respond and connect with users while meeting the highest safety standards. We plan to release a series of natural-language APIs to help creators innovate and reach new levels of immersion and storytelling.
Today, we’re excited to launch the Text-to-Speech API in Beta — enabling you to create professional narration and character dialogue instantly, without managing complex audio production workflows.
The Text-to-Speech API converts text to English audio using any of 10 preset voices, with customization options such as pitch and speed. Whether you’re building step-by-step tutorial guidance or game announcements for your players, this API makes it easy for any creator to add voice content.
Unlocking voice content creation
With our Text-to-Speech API, you can:
- Enable dynamic storytelling: Generate different spoken dialogue based on a player’s experience level (new vs. experienced), previous choices or the current game state
- Launch voice-enabled experiences in days, not months: Produce voice content instantly and iterate on your dialogue
- Create high-quality narration without traditional production costs: Get professional-sounding dialogue without the expense of a traditional audio production workflow
In January, our team conducted an internal study with 200 creators on Roblox, which found that dialogue, tutorials and game announcers were the top three emerging use cases across a variety of gaming genres.
We can’t wait to see how you use the Text-to-Speech API to start building more engaging, voice-enabled experiences.
How to use the Text-to-Speech API
The API provides two versions for different use cases:
AudioTextToSpeech API — For real-time audio generation:
- Access the AudioTextToSpeech documentation in Creator Hub here
- Call the API from your script, passing in your text and customizing voice parameters (pitch, speed, voice selection)
- Use existing audio effects to further customize the generated speech
- Audio plays immediately and is not saved as an asset
GenerateSpeechAsset API — For saved audio assets:
- Access the GenerateSpeechAsset documentation in Creator Hub here
- Generate speech that is saved directly to your audio asset inventory
- Reuse generated audio in any experience you own
- Perfect for recurring dialogue or narration segments
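Because saved assets can be reused, a recurring line only needs to be generated once, which also keeps it from repeatedly drawing on your audio upload quota. The Python sketch below shows that caching idea in language-neutral form (in an experience you would write the equivalent in Luau). `SpeechAssetCache` and the `generate_asset` callback are hypothetical names: `generate_asset` stands in for a GenerateSpeechAsset call and is not its real signature.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical request key: the same text spoken with the same settings
# should map to the same saved asset.
RequestKey = Tuple[str, str, float, float]  # (text, voice_id, pitch, speed)


@dataclass
class SpeechAssetCache:
    """Generate each unique line once, then reuse the saved asset ID.

    `generate_asset` is a hypothetical stand-in for a GenerateSpeechAsset
    call: it takes (text, voice_id, pitch, speed) and returns an asset ID.
    """

    generate_asset: Callable[[str, str, float, float], int]
    _assets: Dict[RequestKey, int] = field(default_factory=dict, init=False)

    def get(self, text: str, voice_id: str, pitch: float, speed: float) -> int:
        key: RequestKey = (text, voice_id, pitch, speed)
        if key not in self._assets:
            # Only a cache miss triggers generation (and uses upload quota).
            self._assets[key] = self.generate_asset(text, voice_id, pitch, speed)
        return self._assets[key]
```

Keying on pitch and speed as well as text and voice means a line re-voiced with different settings gets its own asset instead of silently reusing the old one.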
For more information, visit our Text-to-Speech API guide.
We thought it would be helpful to build an example showing how everything works together. The demo below is a simple experience that shows how to call the Text-to-Speech API.
In the demo, a snowman speaks and gives the player hints about what to do next. You can also hear the customized pitch and voice selection parameters in action.
Built-in Safety
We built these text-to-speech capabilities with safety at the forefront. All text input passes through text filters, and generated audio is also proactively moderated by Roblox’s AI safety systems to ensure the content complies with our Community Standards. Our safety tools can surface policy violations quickly and help determine what is safe and appropriate to publish in an experience.
What’s next
This Text-to-Speech Beta launch is just the beginning. We’re working on:
- Additional languages beyond English
- More voice selection options
- Player inventory integration to allow users to safely save assets generated by text-to-speech in their own inventory
We’re also expanding our natural language API suite — Speech-to-Text API launches in closed beta next quarter for voice commands, and our Text Generation API recently opened to all Moderate or Restricted content maturity experiences for dynamic NPC dialogue.
We’d love to hear about your use cases and see what you create! Please share your feedback and experiences in the comments below.
FAQs
Are there any costs, request rate limits or asset quota limits?
- Baseline usage of the API is free during the beta. However, the following limits are in place, tuned for game dialogue and to prevent system abuse:
  - Character Limits: A maximum of 300 characters per request applies to both versions of the API
  - Request Rate Limits: Limits scale dynamically with your experience’s concurrent users, using this formula: maximum requests per second = 1 + (0.05 × per thousand concurrent users)
  - Asset Generation Limits: Asset generation requests count toward your existing audio upload quota. This only applies to the GenerateSpeechAsset API
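The request-rate formula is easiest to sanity-check with a worked example. The phrase "0.05 × per thousand concurrent users" can be read more than one way; the Python sketch below assumes it means 0.05 additional requests per second for every 1,000 concurrent users. Under that reading, an experience with 10,000 concurrent users would get 1 + 0.05 × 10 = 1.5 requests per second. Both function names are hypothetical.

```python
def max_requests_per_second(concurrent_users: int) -> float:
    """Beta Text-to-Speech rate limit for an experience.

    ASSUMPTION: "0.05 x per thousand concurrent users" is read as
    0.05 extra requests per second for every 1,000 concurrent users.
    """
    if concurrent_users < 0:
        raise ValueError("concurrent_users must be non-negative")
    return 1 + 0.05 * (concurrent_users / 1000)


def min_seconds_between_requests(concurrent_users: int) -> float:
    """Smallest even spacing between requests that stays within the limit."""
    return 1 / max_requests_per_second(concurrent_users)
```

If the official documentation defines the formula differently, only the return expression in `max_requests_per_second` needs to change.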
These limits help ensure system stability and fair usage across all users. Please note that longer content can be split across multiple requests. In the future, rate limits may be further adjusted based on system performance and user feedback. Additionally, we plan to integrate Text-to-Speech with our extended services system so that you can purchase additional service usage if needed.
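Since each request is capped at 300 characters, longer narration has to be split before it is sent. One way to do that, assuming breaking at spaces is acceptable, is greedy word packing: keep adding words to the current chunk until the next one would exceed the limit. The Python sketch below uses a hypothetical helper name, `split_for_tts`, and hard-splits any single word longer than the limit so no chunk ever goes over.

```python
from typing import List

MAX_CHARS_PER_REQUEST = 300  # per-request limit stated for both API versions


def split_for_tts(text: str, limit: int = MAX_CHARS_PER_REQUEST) -> List[str]:
    """Split text into chunks of at most `limit` characters, breaking at spaces."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        # Hard-split any single word that cannot fit in one request.
        while len(word) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        if not current:
            current = word
        elif len(current) + 1 + len(word) <= limit:
            current += " " + word  # word fits, including the joining space
        else:
            chunks.append(current)  # chunk is full; start a new one
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Note that `str.split()` collapses runs of whitespace, including line breaks. Splitting at sentence boundaries instead would likely give more natural pauses between requests, at the cost of shorter chunks.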
What happens if my text input violates Community Standards?
- The API returns an error and no audio is generated. All text is filtered before processing to ensure compliance with Roblox Community Standards. In the rare case that violative audio is still generated, developer responsibility depends on the API:
  - AudioTextToSpeech API (real-time): You will not be held responsible for potentially violative audio outputs unless you intentionally program the system to generate violating content.
  - GenerateSpeechAsset API (saved assets): Because generated audio assets are saved to your inventory, any moderation actions will be taken against your account as the asset owner. We strongly recommend against letting players directly input text for asset generation, to avoid potential moderation issues.
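One way to act on the recommendation against player-typed text for asset generation is to send player input only to the real-time API and reserve asset generation for lines you wrote yourself. The Python sketch below assumes you can tell developer-authored lines apart from player input; `TtsRoute`, `choose_route` and `authored_lines` are hypothetical names, not Roblox APIs. Text sent to either API is still filtered before processing.

```python
from enum import Enum
from typing import FrozenSet


class TtsRoute(Enum):
    REAL_TIME = "AudioTextToSpeech"      # plays immediately, no asset saved
    SAVED_ASSET = "GenerateSpeechAsset"  # saved to the owner's inventory


def choose_route(text: str, from_player: bool, authored_lines: FrozenSet[str]) -> TtsRoute:
    """Pick which API a line should go to (hypothetical policy helper).

    Player-typed or unrecognized text never goes to asset generation,
    because moderation actions on saved assets apply to the asset owner.
    """
    if from_player or text not in authored_lines:
        return TtsRoute.REAL_TIME
    return TtsRoute.SAVED_ASSET
```

Checking membership in an explicit set of authored lines, rather than trusting a flag alone, means a mislabeled player message still falls back to the real-time path.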
What data is used for voice training?
- The Text-to-Speech API is trained on publicly available, open-source datasets to deliver high-quality synthetic voice generation, and we follow responsible data practices that align with our safety standards. The voices are a combination of many synthetic voices and don’t represent any real person’s voice.
Will this work in experiences of all content maturity levels?
- Yes, the Text-to-Speech API is available for experiences of all content maturity levels, with appropriate safety measures, such as text filtering and audio moderation, in place for each audience.
Where are audio assets generated in an experience stored?
- We currently store generated audio assets in the experience owner’s inventory. This reduces player friction because no user consent flow is required. However, we recognize that this draws from the experience owner’s monthly audio asset quota, and we are working on a safe solution that saves the asset to the player’s inventory. Note that this only applies to the GenerateSpeechAsset API; the AudioTextToSpeech API does not create an audio asset.
How many predefined voices are available for use?
- We currently have 10 voices that you can use with this API. Each maps to a numerical ID that you enter in the VoiceId field. Note that this field accepts string values, so format inputs as “1”, not 1.
Voice ID | Description |
---|---|
1 | British male |
2 | British female |
3 | US male #1 |
4 | US female #1 |
5 | US male #2 |
6 | US female #2 |
7 | Australian male |
8 | Australian female |
9 | Retro voice #1 |
10 | Retro voice #2 |
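Because VoiceId takes strings, a small lookup keyed by the string IDs helps avoid accidentally passing a number. The Python sketch below mirrors the table; `VOICES` and `voice_id` are hypothetical helper names, not part of the Roblox API (in Luau, the equivalent conversion is `tostring`).

```python
from typing import Dict

# Preset voices from the table, keyed by the string the VoiceId field expects.
VOICES: Dict[str, str] = {
    "1": "British male",
    "2": "British female",
    "3": "US male #1",
    "4": "US female #1",
    "5": "US male #2",
    "6": "US female #2",
    "7": "Australian male",
    "8": "Australian female",
    "9": "Retro voice #1",
    "10": "Retro voice #2",
}


def voice_id(number: int) -> str:
    """Return the string form of a numeric voice ID, e.g. 1 -> "1"."""
    key = str(number)
    if key not in VOICES:
        raise ValueError(f"unknown voice ID {number}; expected 1-10")
    return key
```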