Datastore errors are not properly reported to the dashboard

Hello there! As requested @Hooksmith

The “Request Count by Status” and “Request Count by API x Status” charts in the “Monitoring → Datastore” category do not properly report all datastore errors.

The graph by status reports only 3 “non-ok” requests and the graph by API x Status reports 19, while the error report shows a little over 1300 over the same 4-week period.

Request Count by Status:

Request Count by API x Status:

Error Report:

Page URL: https://create.roblox.com/dashboard/creations/experiences/508099649/analytics/explore?metric=DataStoreRequestsByStatus&breakdown=DataStoreStatus&granularity=ThirtyMinutely&filter_DataStoreTypeV2=Standard

I’ll ask the datastore team to check into this topic, thanks for posting it.


Hi @Sylmat, thank you for bringing this up.

The error report should show the exact count of your errors.

The request count charts average the values over the selected time interval (in your case, half an hour), which makes the charted numbers lower than the exact error counts.

I will bring this up to our product team to see if we want to change the summary to a sum function instead.
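To illustrate the difference between the two summaries, here is a minimal sketch. It assumes the chart starts from per-minute counts and either averages or sums them per 30-minute bucket; the actual aggregation used by the dashboard is not documented in this thread, and `bucket_counts` and the sample data are hypothetical.

```python
def bucket_counts(per_minute, bucket_size=30):
    """Group per-minute counts into fixed-size buckets.

    Returns one (sum, average) tuple per bucket.
    """
    results = []
    for start in range(0, len(per_minute), bucket_size):
        bucket = per_minute[start:start + bucket_size]
        total = sum(bucket)
        results.append((total, total / len(bucket)))
    return results


# Hypothetical data: one hour of per-minute error counts,
# with 7 errors in the first half hour and 1 in the second.
per_minute = [0] * 60
per_minute[5] = 7
per_minute[40] = 1

for total, average in bucket_counts(per_minute):
    print(f"sum={total} average={average:.2f}")
```

Under this assumption, a bucket containing 7 errors is charted as roughly 0.23 when averaged, which is why an averaged chart can look almost empty next to an exact count.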


Hello, that would be incredible thank you!

In my opinion it doesn’t have to be one or the other tho, there could be a dropdown that lets you select which kind you want, either averaged or summed, so people have the choice for their preferred way (I’m sure some are likely used to average and would prefer to keep it)

We added the ability to zoom in and view the metric at minutely granularity, so you can see the exact count of errors on the charts. This is releasing later today.

We will also think about aggregation rules a bit more.


Hey there!

Great quick release! I’ll definitely try to check for hourly, but this really doesn’t give much margin to see if it works properly or not. I’ll try to update this post if I happen to stumble in an hour range that contained internal errors.

But I still can’t consider this fixed when the issue still happens:

Above with the X is the reported number, under is the actual number in the datastore dashboard:

You can see that despite being average, the average doesn’t make any sense. It successfully reported 2 internal errors at 5PM, but, somehow, 11 at 5AM did not go through, not a single one.

I 100% believe something is still broken, either with average or something else.

I agree with you, there is some estimation done to calculate the average. I will bring it to the team. In the meantime, please drill down to minutely granularity to see the exact counts.

Hello,

I was just able to catch errors within the hour, and it looks like one is missing.

Error report, one at 01:05, one at 12:40:

DataStore dashboard, 12:40 is there, 01:05 is missing:

Hi, I found more occurrences where it doesn’t display all errors in the dashboard:


06:36 shows no errors
06:37 shows 6/7 (from those that were supposed to be at 06:36)
06:48 shows 1/1
06:52 shows 1/1
06:53 shows 0/1
06:54 shows 1/1
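A comparison like this can be made repeatable with a short script. This is a minimal sketch, assuming both sources can be turned into minute → count mappings; the two dictionaries are one reading of the list above (7 errors expected at 06:36, 6 of them shown at 06:37), not exported data, and `diff_counts` is a hypothetical helper.

```python
def diff_counts(error_report, dashboard):
    """Return {minute: (expected, shown)} for minutes where the sources disagree."""
    minutes = sorted(set(error_report) | set(dashboard))
    return {
        m: (error_report.get(m, 0), dashboard.get(m, 0))
        for m in minutes
        if error_report.get(m, 0) != dashboard.get(m, 0)
    }


# Assumed per-minute counts, transcribed from the list above.
error_report = {"06:36": 7, "06:48": 1, "06:52": 1, "06:53": 1, "06:54": 1}
dashboard = {"06:37": 6, "06:48": 1, "06:52": 1, "06:54": 1}

mismatches = diff_counts(error_report, dashboard)
for minute, (expected, shown) in mismatches.items():
    print(f"{minute}: error report {expected}, dashboard {shown}")

# Errors present in the error report but absent from the dashboard overall.
missing = sum(error_report.values()) - sum(dashboard.values())
print("missing in dashboard:", missing)
```

Comparing totals as well as individual minutes separates errors that were only shifted to a neighbouring minute from errors that never reached the dashboard.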

Thanks @Sylmat, we do not expect the two to match 100%, as these two datasets are generated by 2 different systems.

Hello,

Sorry but shouldn’t they show the same data? It seems kinda weird that some errors in the error dashboard are not reported in the datastore dashboard? If anything I would expect it the other way around, datastore dashboard should be more precise, more accurate or at the very least the same as the error reports when it comes to datastore errors?

From my perspective this is deceiving for any creator looking at the datastore dashboard and would be even more deceiving for Roblox engineers tracking the error rates from datastores, since it would show less error than what is actually happening?

I understand that they’re 2 different systems, but I just don’t understand how the difference can happen?

Not sure if you could clarify on this? Thanks

I fully understand your perspective. We are analyzing this internally and will circle back post RDC.
