Hey everyone. I wanted to close the loop here and explain what happened.
At 7:30 PT, MessagingService saw a massive increase in publishes, by multiple orders of magnitude. The system that handles this influx struggled under the load, then began failing to enforce limits globally, causing the outage in both publishes and subscribes.
We know this service is critical to many of your games and there is already planned work to catch and keep these types of issues from causing widespread outages. We have recently been putting a lot more effort into improving MessagingService and I hope to share some awesome news with you all soon.
We have an awesome group of creators using MessagingService in this thread. Please feel free to comment here on anything you run into with the system, or message me directly. I am also interested in use cases that MessagingService could solve for you in the future if certain changes are made.
Thanks a lot! This insight is greatly appreciated and it’s good to hear that this is actively being worked on. Excited to see what you guys have planned for MessagingService in the future. My only suggestion right now would be to increase the publish limits as well as the max data size (currently 1kB).
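For anyone publishing through Open Cloud from an external service, here is a minimal sketch of a pre-publish size check. It assumes the 1kB limit mentioned above means 1024 bytes of UTF-8 encoded message text; how the limit is actually measured is an assumption, and `fits_publish_limit` is a hypothetical helper name.

```python
import json

# Assumption: the 1kB limit quoted in this thread is 1024 bytes of UTF-8 text.
MAX_MESSAGE_BYTES = 1024


def fits_publish_limit(payload, limit=MAX_MESSAGE_BYTES):
    """Return True if the JSON-encoded payload is within the byte limit."""
    # Compact separators avoid spending bytes on whitespace.
    encoded = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(encoded) <= limit
```

Checking before sending lets the external service split or trim an oversized payload instead of finding out from a rejected request.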
Sorry for the second reply - there are two more things I wanted to mention here.
1. Could we have some clarity on whether MessagingService tries to resubscribe to topics it disconnects from (usually due to an outage)? I’ve seen errors along the lines of “Resubscribe failed”, which leads me to think it does try to resubscribe, but it seems to give up after some time. I’d suggest it retries periodically (maybe every 1-2 minutes or so), because as it currently works, we’re forced to reboot certain servers after MessagingService outages because they’ve lost crucial subscriptions.
2. Not sure if this is your field, but there’s currently no documentation regarding MessagingService rate limits with Open Cloud APIs. I asked a while back here and an engineer said they’d get back to me but never did. Even if you could state them here (if you know them) that would be great.
Thanks for the update!
I would love to see a section in developer stats similar to MemoryStores but for MessagingService.
Thanks a lot for the assistance! I have two more questions I had earlier but only recalled now (last ones, I promise!).
Are rate limits per API key or per experience? I’ve observed that for Open Cloud DataStores, it’s per API key. I’m curious whether this holds true for MessagingService; however, its limits scale with CCU, so I understand if not.
Would it be possible for the Open Cloud MessagingService to return errors if a message fails to deliver (due to downtime or other reasons)? My game features a global events system where players buy experience-wide events. These events then communicate with my external API, which, after validation, sends a request back to MessagingService to broadcast the event across all servers. The issue is that the API returns a status code of 200 even when the message doesn’t get delivered. This hinders the automated refund process for players during service interruptions and causes in-game currency loss.
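Until delivery failures are surfaced by the API, one possible workaround is to have game servers acknowledge each event back to the external API and refund anything that is not acknowledged in time. Below is a minimal sketch of that bookkeeping. It assumes a hypothetical acknowledgment call from game servers and a 30-second timeout chosen purely for illustration; none of this is a MessagingService feature, and `PendingEvents` is a made-up name.

```python
import time


class PendingEvents:
    """Tracks published events until a game server acknowledges them."""

    def __init__(self, ack_timeout=30.0):
        # Assumption: 30 seconds is an illustrative timeout, not a Roblox value.
        self.ack_timeout = ack_timeout
        self._published_at = {}  # event_id -> publish timestamp (seconds)

    def mark_published(self, event_id, now=None):
        """Record an event right after the publish request returned 200."""
        self._published_at[event_id] = time.monotonic() if now is None else now

    def acknowledge(self, event_id):
        """Called when a game server confirms it received the event.

        Returns True if the event was still pending, False otherwise.
        """
        return self._published_at.pop(event_id, None) is not None

    def take_expired(self, now=None):
        """Remove and return event IDs that were never acknowledged in time."""
        now = time.monotonic() if now is None else now
        expired = [
            event_id
            for event_id, sent in self._published_at.items()
            if now - sent >= self.ack_timeout
        ]
        for event_id in expired:
            del self._published_at[event_id]
        return expired
```

Event IDs returned by `take_expired` would be the candidates for an automated refund. The trade-off is that a missing acknowledgment can also mean the acknowledgment itself was lost, so a refund policy built on this may occasionally refund an event that did go out.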
Limits are per universe for MessagingService.
I will look into your second point to see if we can make errors more visible.
Can we get a function that returns how much budget we have for different API calls, similar to the DataStore budget?
I think :SubscribeAsync() is having problems. In the past few days, it has randomly hung indefinitely in some servers. I have a print statement right before subscribing (which gets printed), but nothing after the :SubscribeAsync() call shows up.