This is a thread with rendering updates for September! I feel like these are still valuable even though we started posting the global changelog - please let me know if you think otherwise. Thanks to @ConvexRumbler, @Homeomorph, @maxvee, @programeow, @Qiblox and @zeuxcg for working on these!
Instancing is sometimes live
We’ve been working on a new part rendering system (codename “instancing”) for a while. After a few attempts that caused very visible bugs, which some of you noticed, it’s reasonably ready to be enabled on some platforms.
The new system is currently enabled only for MeshParts and CSGs (not SpecialMeshes!) and only works on Windows with NVIDIA/AMD GPUs; over the next few months we will try to release it on as many platforms/GPUs as possible. We are also working on making other part types compatible with instancing but it will take a while.
TL;DR: rendering memory of MeshPart/CSG objects now depends on the number of unique instances of these objects; that is, cloning these objects consumes a minimal amount of extra memory. Additionally, all property updates are as fast as they can be and don’t require expensive reclustering operations.
More words on the subject
I'll share a few details for the inquisitive minds. For historical reasons we call the behavior of the part rendering system “clustering” - it groups objects into clusters based on proximity and whether parts are moving or not, and tries to maximize the performance of rendering each cluster. The existing clustering system was written in 2012 and is otherwise known as “featherweight parts”. We have used this system ever since; it survived our deprecation of fixed-function hardware (that doesn’t have shaders - think 15-year-old GPUs), our switch to the new rendering engine away from Ogre, our transition to DirectX 11 and Metal, etc.
Very briefly, the current clustering system works by assembling one large mesh for the entire cluster out of all parts that constitute the cluster. All part information - size, various appearance properties like reflectance, part color, part transformation - is baked into this mesh. Whenever any property of any part in the cluster changes, we have to rebuild this mesh. This code is pretty well optimized for what it does, but fundamentally it has to do an expensive operation for each update.
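To make the cost concrete, here is a toy Python sketch of a baked cluster. It is not the engine's code: the `Part` and `BakedCluster` names are made up for illustration, the part transformation is reduced to a plain translation, and only color is baked alongside position.

```python
from dataclasses import dataclass


@dataclass
class Part:
    vertices: list  # local-space (x, y, z) tuples
    offset: tuple   # simplified stand-in for the part's CFrame (translation only)
    color: tuple    # (r, g, b)


class BakedCluster:
    """Hypothetical cluster that bakes every part into one combined mesh."""

    def __init__(self, parts):
        self.parts = parts
        self.vertices_written = 0  # running total of vertices emitted by rebuilds
        self.rebuild()

    def rebuild(self):
        # Transform and color are baked into each vertex, so the mesh
        # must be regenerated from scratch whenever any of them changes.
        mesh = []
        for part in self.parts:
            ox, oy, oz = part.offset
            for (x, y, z) in part.vertices:
                mesh.append((x + ox, y + oy, z + oz, *part.color))
        self.mesh = mesh
        self.vertices_written += len(mesh)

    def set_color(self, index, color):
        self.parts[index].color = color
        self.rebuild()  # one changed part forces a rebuild of the whole cluster


triangle = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
cluster = BakedCluster(
    [Part(list(triangle), (i, 0, 0), (1, 1, 1)) for i in range(100)]
)
cluster.set_color(0, (1, 0, 0))
print(cluster.vertices_written)  # 600: 300 for the initial bake, 300 for one recolor
```

Recoloring a single 3-vertex part rewrites all 300 vertices in the cluster, which is the kind of per-update cost described above.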
This is because the system tries to minimize the number of draw calls dispatched to the GPU, and assumes that updates are infrequent and that each individual part is simple. Remember, in 2012, we had no CSGs, no meshes (apart from character meshes - the clustering system treats each character as a separate cluster) and simpler worlds in general. Draw calls were also REALLY expensive because of many inefficiencies in the Ogre rendering engine at the time and a different set of graphics APIs/features that we had to support; this was particularly true on mobile, which we couldn’t have shipped without this system.
There’s a special exception for constantly moving parts - we extract their CFrame information into a separate buffer and update this buffer independently which is much faster than rebuilding the cluster. The exception exists because physics simulation is super common in Roblox, whereas massive modifications of other properties were not as common.
Today, we live in a different world. We have CSGs and MeshParts that shift the complexity balance, since each object is much more complex than a block. We have a new rendering engine that makes it much simpler for us to experiment with alternative approaches to rendering objects. We have many cases where developers build complex vibrant worlds and the clustering cost due to updates starts to dominate execution time. And we have increasing mobile reach, where memory consumption is crucial.
We’ve tried to optimize the existing system for memory consumption - see many of the previous rendering updates - but fundamentally we knew that at some point we would need to switch to another one. So we did - special credit to @maxvee for persevering during the development ;). Instancing works by batching as many objects of the same archetype as possible - same MeshPart mesh, same CSG render data, and eventually same shape of a basic part - into one draw call, and having the GPU fetch part properties like size, CFrame and color from a separate buffer that we can update much faster. Additionally (importantly!), we no longer assemble a mesh per cluster, which means that each MeshPart or CSG only has its vertices in memory once on the GPU, instead of having a baseline CPU copy and many GPU-side copies in various clusters.
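A minimal Python sketch of the batching idea, under stated assumptions: the dictionary layout, the `build_batches` and `gpu_vertex_count` helpers, and the vertex counts are all invented for illustration, and the "baked" figure only models GPU-side copies (it ignores the baseline CPU copy).

```python
from collections import defaultdict


def build_batches(parts):
    """Group parts by archetype; each group could be drawn with one instanced call."""
    batches = defaultdict(list)
    for part in parts:
        # Only per-instance properties go into the buffer; vertices are shared.
        batches[part["archetype"]].append(
            {"size": part["size"], "cframe": part["cframe"], "color": part["color"]}
        )
    return dict(batches)


def gpu_vertex_count(parts, vertex_counts, instanced):
    """Vertices resident on the GPU under each scheme (toy model)."""
    if instanced:
        # Each unique archetype stores its vertices exactly once.
        return sum(vertex_counts[a] for a in {p["archetype"] for p in parts})
    # Baked clusters store a full copy of the vertices for every part.
    return sum(vertex_counts[p["archetype"]] for p in parts)


# Hypothetical scene: 200 clones of one MeshPart and 50 clones of one CSG.
vertex_counts = {"tree_mesh": 5000, "rock_union": 1200}
parts = [
    {"archetype": "tree_mesh", "size": (1, 1, 1), "cframe": (i, 0, 0), "color": (0, 1, 0)}
    for i in range(200)
] + [
    {"archetype": "rock_union", "size": (2, 2, 2), "cframe": (0, 0, i), "color": (1, 1, 1)}
    for i in range(50)
]

batches = build_batches(parts)
print(len(batches))                                          # 2 batches
print(gpu_vertex_count(parts, vertex_counts, instanced=False))  # 1060000
print(gpu_vertex_count(parts, vertex_counts, instanced=True))   # 6200
```

In this toy scene the vertex footprint depends only on the two unique archetypes, while each clone adds just one small per-instance record.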
We’ve also paid attention to making sure that updates of properties have a predictable cost - where previously one update to one part could cause as much workload as 100 updates to neighboring parts because we could only update a cluster as a unit, now we try to make sure update cost scales with the number of updates. Because of this, the new system does not throttle updates - when the new system is active (remember, it will take a while for it to be available on all hardware or for all object types), all updates are immediate.
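As a rough cost model (an illustration only, with made-up units where "1" means touching one part's data), the difference can be sketched in Python. The function names and the 100-parts-per-cluster layout are assumptions for the example.

```python
def clustered_cost(updated_parts, cluster_of, cluster_size):
    """Old model: every cluster containing an updated part is rebuilt in full."""
    touched = {cluster_of[p] for p in updated_parts}
    return sum(cluster_size[c] for c in touched)


def instanced_cost(updated_parts):
    """New model: each update rewrites one per-instance record."""
    return len(updated_parts)


# Hypothetical layout: 300 parts split into 3 clusters of 100 neighbors each.
cluster_of = {part_id: part_id // 100 for part_id in range(300)}
cluster_size = {0: 100, 1: 100, 2: 100}

one_update = [0]
hundred_neighbors = list(range(100))  # all in cluster 0

print(clustered_cost(one_update, cluster_of, cluster_size))         # 100
print(clustered_cost(hundred_neighbors, cluster_of, cluster_size))  # 100
print(instanced_cost(one_update))                                   # 1
print(instanced_cost(hundred_neighbors))                            # 100
```

Under the clustered model one update and 100 neighboring updates cost the same, while the instanced model scales with the number of updates.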
Part of the reason why the development of this new system took time is that it’s a complete rewrite of the part renderer that has to preserve many odd features. For example, we’ve discovered some glaring inconsistencies in the behavior of CSG parts that are unfortunately hard to fix without breaking existing content, so we have to replicate this behavior, aka be bug-compatible with the old system. The other part is that the new system relies on GPU features that we haven’t used before. They became commonplace in the DirectX 10 era, but we still encounter various issues with drivers on various platforms, so bear with us as we figure out how to make it work on as many devices as we can.
There are some known behavioral bugs that are getting fixed shortly:
- Neon MeshPart/CSG parts don’t render unless there’s a Neon non-mesh/csg part in the view
- Material UVs are not randomized, creating tiling artifacts
If you discover any other bugs please report them as separate threads in the Bug Reports forum.
UI is now rendered at native resolution on mobile
For a few years we had a feature where on Retina-resolution devices on iOS we would render UI at “native” resolution, 3D at a reduced resolution to maintain performance, and then composite the two. I say native in quotes because we actually assumed that the Retina scaling factor is 2x; this was true for a while but with iPhone 6+, 7+ and X this is no longer true.
On Android, due to a set of technical issues and also due to this not being a high priority, we just rendered at a reduced resolution. However, with further development of some internal projects and with increased focus on mobile and on beautiful UIs, this became easier and more important so we did it.
Now we will always render UI at the native resolution on mobile, both on iOS and Android, including 3x on larger iPhone devices and 4x or whatever it is the 4K Android phones have these days. We will also rasterize fonts at this native resolution and many of our built-in UIs will have images for 1x/2x/3x. As a result, our UIs and your UIs will look much better on mobile devices.
3D is still rendered at non-native resolution; 3D UI will render at non-native resolution if it’s not AlwaysOnTop, and at native if it is.
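The rules above can be summarized in a small Python sketch. Both helpers and the `kind` strings are hypothetical, and the way an image variant is chosen here (smallest variant that covers the device scale factor, otherwise the largest available) is an assumption; the post does not say how the engine picks between 1x/2x/3x images.

```python
def pick_image_variant(scale_factor, available=(1, 2, 3)):
    """Assumed policy: smallest variant >= the scale factor, else the largest."""
    for variant in sorted(available):
        if variant >= scale_factor:
            return variant
    return max(available)


def renders_at_native_resolution(kind, always_on_top=False):
    """Which content is drawn at native resolution on mobile, per the rules above."""
    if kind == "screen_ui":
        return True
    if kind == "3d_ui":
        return always_on_top  # native only when AlwaysOnTop is set
    if kind == "3d_scene":
        return False
    raise ValueError(f"unknown kind: {kind}")


print(pick_image_variant(2))    # 2
print(pick_image_variant(2.6))  # 3
print(pick_image_variant(4))    # 3 (no 4x image in this example)
print(renders_at_native_resolution("3d_ui", always_on_top=True))  # True
```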
Studio now optimizes meshes on import
We now apply additional processing when importing mesh parts in Studio. It preserves the appearance but optimizes the memory used by meshes, the speed at which the GPU renders them, and the size of the data the client has to download to use the mesh.
We’re still applying the memory fix mentioned a few updates ago when loading all meshes, so the real benefits are somewhat faster rendering for mesh-heavy scenes and somewhat faster download for mesh data. There are some further size optimizations we can do, as well as some improvements in the CPU time it takes to decode a mesh - but all of these require a new mesh format so it will not happen very quickly.
Various fixes and improvements
With all of these large fixes going in this will be short and simple:
- Fix stability issues on Android 4.1/4.2 that were pretty frequent for the users of some devices on these versions
- Fix a problem where unequipping one hat and reequipping another one could result in a period of time where the new hat would use the texture from the old hat (mostly affects avatar editor)
As usual, if you have any issues with any of these updates or any questions, feel free to post them here.