Some of our queries are failing / timing out on Roblox’s Custom Events. This is widespread across many of our events, and it seems to be random based on the date range, aggregation, and filtering.
One query measured at over 1 minute and 30 seconds before just timing out, which I presume is the hard cap on how long it can take.
Expected behavior
Queries should always succeed, or at least provide meaningful feedback on how to prevent timeouts in the future.
Hi:
Thanks for reporting this. We are working on some optimization.
I noticed that you are breaking down by customField1 in your query, and for this event you have 600+ different values on custom field 1. I think that is the primary reason the query fails, as it takes a long time to compute each series.
Do you mind sharing more about why it has this many values, and whether you think it can be optimized, either by creating new events for specific use cases or by removing unnecessary field values for better performance?
We are using CustomField01 (sometimes CustomField02) for A/B test group comparison on numerous events. This is a low cardinality set of keys with only a few ‘active’ keys at any given time (test groups are often used once and then discarded after usage to avoid reusing keys in long-term look backs).
We have already comprehensively broken up the events by specific use case, with 50+ types, so I don’t think there is much more optimization that can be done on our side. We are following the service-level rate limits and respecting individual limits on certain features like custom fields.
Understood.
Regarding your explanation of active keys: from what I see, in the event Animation Engagement: View, CustomField1 has 600+ active keys for the past 1 day, and there are 5700+ unique combinations of customField1 + customField2.
This is the part that confuses me. If, as you said, this is for A/B testing and the keys get discarded after the test round is done, I wonder why it still has this many values?
We have seen query performance get much worse when cardinality is too high (usually > 100), and it will also be a gigantic payload to return and render. That’s why I want to see if there is an opportunity to optimize it.
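For anyone wanting to check their own breakdown fields against that rule of thumb, here is a minimal sketch in Python. It assumes you have your own export or log of (CustomField1, CustomField2) pairs to count over; the function name, the sample values, and the threshold constant are hypothetical, and the 100 figure is only the "usually > 100" guideline mentioned above, not a documented limit.

```python
from typing import Dict, Iterable, Tuple

# Rule of thumb quoted in the thread ("usually > 100"); an assumption, not an official limit.
CARDINALITY_WARN_THRESHOLD = 100


def breakdown_cardinality(rows: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Count distinct CustomField1 values and distinct (field1, field2) combinations."""
    field1_values = set()
    combinations = set()
    for field1, field2 in rows:
        field1_values.add(field1)
        combinations.add((field1, field2))
    return {
        "custom_field_1": len(field1_values),
        "combinations": len(combinations),
        "too_high": len(field1_values) > CARDINALITY_WARN_THRESHOLD,
    }


# Hypothetical sample: two clips, two test groups, one repeated pair.
sample = [
    ("clip_a", "group_1"),
    ("clip_a", "group_2"),
    ("clip_b", "group_1"),
    ("clip_a", "group_1"),
]
result = breakdown_cardinality(sample)
```

Running this over a day of events would give numbers comparable to the 600+ and 5700+ figures quoted above.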
Oh for that specific event I can give some more details:
CustomField01 would be unique identifiers for all of the animations we have in Clip It!, so it makes sense for the active keys to be consistently in the hundreds. There is not very much capacity for optimization here as Custom Events are player unique per-event, rather than aggregated by server. We get tens of millions of views a day across a very high cardinality set of clips, and tracking this animation data is important to understand demographic performance.
CustomField02 would be the A/B test group, normally this is in CustomField01, but it seems I was mistaken on this particular event.
If you want more details I’m happy to DM this as it may require discussing business-specific data that I’d prefer to not share on an open forum.
Got it. Thanks for the details.
We are working on some optimization.
Meanwhile, we also noticed that our flow limits the returned data to at most 10K unique values. So if you query the last 28 days with 600 values in the breakdown, it won’t be able to return all the results. I would suggest limiting each query to 14 days at a time.
You may still encounter some failures; we are actively working on several different optimizations to improve performance.
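To make the arithmetic behind that suggestion concrete, here is a rough sketch in Python. It assumes the 10K cap counts one row per breakdown value per day (daily granularity), which is an inference from the 28-day example above rather than a confirmed detail of the backend. The function names are hypothetical helpers, not part of any Roblox API.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

ROW_CAP = 10_000  # "up to 10K unique values", as stated in the post above


def max_days_per_query(breakdown_values: int, row_cap: int = ROW_CAP) -> int:
    """Largest window in days that stays under the cap, assuming one row per value per day."""
    if breakdown_values <= 0:
        raise ValueError("breakdown_values must be positive")
    return max(1, row_cap // breakdown_values)


def split_range(start: date, end: date, days_per_chunk: int) -> Iterator[Tuple[date, date]]:
    """Yield inclusive (chunk_start, chunk_end) windows covering start..end."""
    cursor = start
    while cursor <= end:
        chunk_end = min(cursor + timedelta(days=days_per_chunk - 1), end)
        yield cursor, chunk_end
        cursor = chunk_end + timedelta(days=1)


# 600 breakdown values: 28 days would need 16,800 rows, 14 days needs 8,400.
rows_28_days = 28 * 600
rows_14_days = 14 * 600
limit_days = max_days_per_query(600)

# Split a 28-day look-back (dates are illustrative) into 14-day windows.
windows = list(split_range(date(2024, 1, 1), date(2024, 1, 28), 14))
```

Under that assumption, 600 values would fit in windows of up to 16 days, so the suggested 14 days leaves some headroom if the number of active keys grows.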
Hi! We have been tuning the cluster on our side and are noticing that the chart does load successfully. Unfortunately, heavier queries do take longer, and I would suggest trying shorter time ranges as friedrich1717 mentioned.
Closing this ticket for now. Feel free to re-open if you are still running into issues with Custom Events.