Jumping the gun to make a simple OOP-based AudioTextToSpeech wrapper.
My original goal was to pre-load all of the generated speech by splitting the given message into an array of strings. This didn’t work because Roblox doesn’t expose that API (Cloud_GenerateSpeechAsset) directly to in-game scripts.
I would have to use something like roproxy.com, or make my own proxy. In the past when I’ve tried to use open-source proxies, such as roproxy, my requests were almost always declined.
The upsides of pre-loading would have been less stress on the server and less waiting time for new messages.
But this is a new API. I’m sure they’ll find a way to make it entirely based on Roblox rather than using HttpService.
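For what it’s worth, the splitting step of that original plan is easy enough on its own. Here is a minimal sketch, assuming chunks are broken on whitespace and capped at a character limit. splitMessage and maxLength are names I made up for illustration, and nothing here touches the Roblox API:

```lua
-- Hypothetical helper: splits a message into chunks of at most maxLength
-- characters, breaking on whitespace so words stay intact.
-- A single word longer than maxLength is kept whole as its own chunk.
local function splitMessage(message, maxLength)
	local chunks = {}
	local current = ""
	for word in string.gmatch(message, "%S+") do
		if current == "" then
			current = word
		elseif #current + 1 + #word <= maxLength then
			-- The word (plus a separating space) still fits in this chunk.
			current = current .. " " .. word
		else
			-- Chunk is full: store it and start a new one with this word.
			table.insert(chunks, current)
			current = word
		end
	end
	if current ~= "" then
		table.insert(chunks, current)
	end
	return chunks
end
```

Each chunk could then be generated and loaded ahead of time, which is the part that currently needs the web API.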
Instead, I went with a simpler approach by using the AudioTextToSpeech instance.
Given that there are already alternatives, I figured this approach has the most potential.
Player messages that have been received by the default RBXGeneral TextChannel will have their content narrated by the Character.
Sound (utility—optional)
export type EmitterMap = {
	Destroy: (self: EmitterMap) -> (),
	Wire: (self: EmitterMap, Target: Instance) -> (),
	_Instance: AudioEmitter,
	_Wire: Wire,
}

local emitter = {}
emitter.__index = emitter

-- Destroys the AudioEmitter; the Wire is parented to it, so it goes too.
function emitter.Destroy(self: EmitterMap)
	self._Instance:Destroy()
end

-- Connects an audio source (Target) to this emitter.
function emitter.Wire(self: EmitterMap, Target: Instance)
	self._Wire.SourceInstance = Target
end

function emitter.new(Parent: Instance): EmitterMap
	local self = setmetatable({}, emitter)
	self._Instance = Instance.new("AudioEmitter")
	self._Wire = Instance.new("Wire")
	self._Wire.TargetInstance = self._Instance
	self._Wire.Parent = self._Instance
	self._Instance.Parent = Parent
	return self
end

return {
	Emitter = emitter.new,
}
TextToSpeech
local Sound = require(game:GetService("ReplicatedStorage").Sound)

export type SpeechMap = {
	_LoadTextAsync: (self: SpeechMap) -> boolean,
	Play: (self: SpeechMap, message: string) -> (),
	Stop: (self: SpeechMap) -> (),
	Destroy: (self: SpeechMap) -> (),
	_Instance: AudioTextToSpeech,
	_emitter: Sound.EmitterMap,
	_voice_id: number,
}

local module = {}
module.__index = module

-- Loads whatever is currently in Text and yields until the load finishes.
function module._LoadTextAsync(self: SpeechMap): boolean
	local success, result = pcall(self._Instance.LoadAsync, self._Instance)
	return success and result == Enum.AssetFetchStatus.Success
end

function module.Play(self: SpeechMap, message: string)
	self:Stop()
	self._Instance.Text = message
	if self:_LoadTextAsync() then
		self._Instance:Play()
	else
		-- Loading failed, so clear the text instead of leaving it stale.
		self._Instance.Text = ""
	end
end

-- Pauses playback, rewinds, then unloads the current speech.
function module.Stop(self: SpeechMap)
	self._Instance:Pause()
	self._Instance.TimePosition = 0
	self._Instance:Unload()
end

function module.Destroy(self: SpeechMap)
	self._Instance:Destroy()
	self._emitter:Destroy()
end

function module.new(Parent: Instance, voice_id: number, volume: number?, speed: number?, pitch: number?): SpeechMap
	local self = setmetatable({}, module)
	self._voice_id = voice_id
	self._Instance = Instance.new("AudioTextToSpeech")
	self._Instance.Volume = volume or 1
	self._Instance.Speed = speed or 1
	self._Instance.Pitch = pitch or 0
	self._Instance.VoiceId = voice_id
	self._Instance.Parent = Parent
	-- Wire the speech instance into an emitter on the same parent.
	self._emitter = Sound.Emitter(Parent)
	self._emitter:Wire(self._Instance)
	return self
end

return {
	new = module.new,
}
Server
local Players = game:GetService("Players")
local TextChatService = game:GetService("TextChatService")

local TextToSpeech = require("@self/TextToSpeech")

local TextChannels = TextChatService:WaitForChild("TextChannels", 5) :: Folder
local RBXGeneral = TextChannels:WaitForChild("RBXGeneral", 5) :: TextChannel

-- One speech wrapper per player, keyed by UserId.
local playerTTS = {} :: { [number]: TextToSpeech.SpeechMap }

local function CleanPlayer(Player: Player)
	local speechMap = playerTTS[Player.UserId]
	if speechMap then
		speechMap:Destroy()
		-- Clear the entry so a destroyed wrapper is never reused.
		playerTTS[Player.UserId] = nil
	end
end

Players.PlayerRemoving:Connect(CleanPlayer)
Players.PlayerAdded:Connect(function(Player)
	Player.CharacterAdded:Connect(function(Character)
		CleanPlayer(Player)
		-- PrimaryPart can still be nil when CharacterAdded fires, so fall back to waiting for the root part.
		local root = Character.PrimaryPart or Character:WaitForChild("HumanoidRootPart")
		playerTTS[Player.UserId] = TextToSpeech.new(root, 1)
	end)
end)

RBXGeneral.ShouldDeliverCallback = function(TextMessage: TextChatMessage)
	local TextSource = TextMessage.TextSource
	local speechMap = TextSource and playerTTS[TextSource.UserId]
	if speechMap then
		print(`Set "{TextMessage.Text}" to AudioTextToSpeech.Text`)
		-- Play yields while the speech loads, so run it without blocking the callback.
		task.spawn(speechMap.Play, speechMap, TextMessage.Text)
	end
	-- The return value decides whether the message is delivered.
	return true
end
Now it doesn’t look too fancy.
It’s meant to be robust, easy to read, and quick to change out as soon as Roblox updates their API.
Every time a character is added, the player gets a new AudioTextToSpeech that’s wired to an AudioEmitter. It is stored in a table on the server, keyed by UserId, and looked up every time that player sends a chat message.
When the server gets the message, it tells the TextToSpeech module to Play the text. Play sets the Text property, then calls LoadAsync, which yields until the speech has loaded, and only starts playback if the load succeeded.
I used a separate class (Sound) to keep the emitter and wire management simple and clean.
Since the player can always send another message before the previous one has loaded, Play first calls Stop, which calls Unload; I’m assuming that this cancels the pending LoadAsync. Stop also halts any sound that’s playing by pausing (Pause) and then setting TimePosition back to zero.
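If Unload turns out not to cancel a pending load, one way to stay safe is a "latest request wins" check. This is only a sketch under that assumption; RequestGuard and its methods are hypothetical names, not part of the module above or of the Roblox API:

```lua
-- Hypothetical guard: each new request invalidates the ones before it.
local RequestGuard = {}
RequestGuard.__index = RequestGuard

function RequestGuard.new()
	return setmetatable({ _latest = 0 }, RequestGuard)
end

-- Starts a new request and returns its token.
function RequestGuard:begin()
	self._latest = self._latest + 1
	return self._latest
end

-- True only if no newer request has started since this token was issued.
function RequestGuard:isCurrent(token)
	return token == self._latest
end
```

In Play, that would mean taking a token before calling _LoadTextAsync and only calling self._Instance:Play() if the token is still current once the load returns, so a slow load for an old message can never start playing over a newer one.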
Again, very simple. This is a brand-new API that they just released the other day, and it needs more features to be fully effective.
Methods to generate text and sound through a single service would be very helpful and would open the way to building this module the way I intended from the start: something that can be cached and retrieved easily, a library of sounds that never have to be generated again because they already exist. All of it could be cached or inserted into the game on the fly.
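The caching side of that idea could look something like the sketch below. It assumes some generator function exists that turns a voice and a string into a reusable result (an asset id, for example); that generator is hypothetical, as are the SpeechCache name and its methods:

```lua
-- Hypothetical cache: remembers the generator's result per (voiceId, text),
-- so each phrase is only generated once.
local SpeechCache = {}
SpeechCache.__index = SpeechCache

-- generate is an assumed callback: (voiceId, text) -> result
function SpeechCache.new(generate)
	return setmetatable({ _generate = generate, _entries = {} }, SpeechCache)
end

function SpeechCache:get(voiceId, text)
	local byVoice = self._entries[voiceId]
	if not byVoice then
		byVoice = {}
		self._entries[voiceId] = byVoice
	end
	local cached = byVoice[text]
	if cached == nil then
		-- Not generated yet: call the generator and remember the result.
		-- A nil result is not stored, so a failed generation is retried next time.
		cached = self._generate(voiceId, text)
		byVoice[text] = cached
	end
	return cached
end
```

Common phrases like "gg" or "hello" would then only cost one generation per voice for the lifetime of the server.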
One thing I found confusing was how LoadAsync works. In my opinion the Text property should be read-only, and LoadAsync should take the string itself, setting the property and generating the speech in one call.
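To make that call shape concrete, here is a small wrapper that imitates it on top of the current API. loadTextAsync is a hypothetical name; the only thing it assumes about instance is a Text property and a LoadAsync method, which is what the module above already relies on:

```lua
-- Hypothetical wrapper: takes the string, sets Text, and loads in one call.
-- Returns pcall's success flag plus whatever LoadAsync returned (or the error).
local function loadTextAsync(instance, text)
	instance.Text = text
	return pcall(function()
		return instance:LoadAsync()
	end)
end
```

The caller would still compare the second return value against Enum.AssetFetchStatus.Success, exactly as _LoadTextAsync does.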
Please add to the discussion!
How can we make real-time TTS a reality? Roblox would be the first to pioneer such a large project.
I can’t think of any other game that remotely offers this. It would be very helpful for players without a microphone, all while vastly improving immersion.