Rendering Updates - September 2017

zeuxcg · September 29, 2017, 5:48pm

This is a thread with rendering updates for September! I feel like these are still valuable even though we started posting the global changelog - please let me know if you think otherwise. Thanks to @ConvexRumbler, @Homeomorph, @maxvee, @programeow, @Qiblox and @zeuxcg for working on these!

Instancing is sometimes live

We’ve been working on a new part rendering system (codename “instancing”) for a while. After a few attempts that caused very visible bugs, which some of you noticed, it’s reasonably ready to be enabled on some platforms.

The new system is currently enabled only for MeshParts and CSGs (not SpecialMeshes!) and only works on Windows with NVidia/AMD GPUs; over the next few months we will try to release it on as many platforms/GPUs as possible. We are also working on making other part types compatible with instancing, but it will take a while.

TL;DR: rendering memory for MeshPart/CSG objects now depends on the number of unique objects rather than the total number of copies; that is, cloning these objects consumes a minimal amount of extra memory. Additionally, all property updates are as fast as they can be and don’t require expensive reclustering operations.

More words on the subject

I'll share a few details for the inquisitive minds.

For historical reasons we call the part rendering system’s behavior “clustering” - it groups objects into clusters based on proximity and whether parts are moving or not, and tries to maximize the performance of rendering each cluster. The existing clustering system was written in 2012 and is otherwise known as “featherweight parts”. We have used this system ever since; it survived our deprecation of fixed-function hardware (that doesn’t have shaders - think 15-year-old GPUs), our switch to the new rendering engine away from Ogre, our transition to DirectX 11 and Metal, etc.

Very briefly, the current clustering system works by assembling one large mesh for the entire cluster out of all parts that constitute the cluster. All part information - size, various appearance properties like reflectance, part color, part transformation - is baked into this mesh. Whenever any property of any part in the cluster changes, we have to rebuild this mesh. This code is pretty well optimized for what it does, but fundamentally it has to do an expensive operation for each update.
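To make the cost of baking concrete, here is a minimal sketch of the idea. All types and names (`Part`, `BakedVertex`, `bakeCluster`) are hypothetical and heavily simplified; a plain offset stands in for the full CFrame, and this is not the engine's actual code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified types for illustration only.
struct Vec3 { float x, y, z; };
struct Color { float r, g, b; };

struct BakedVertex {
    Vec3 position; // already scaled and moved into world space
    Color color;   // part color copied into every vertex
};

struct Part {
    std::vector<Vec3> localVertices; // geometry in part space
    Vec3 offset;                     // stand-in for the full CFrame
    Vec3 size;
    Color color;
};

// Rebuilds the mesh for the whole cluster. The cost is proportional to the
// total vertex count of the cluster, not to the number of parts that changed.
std::vector<BakedVertex> bakeCluster(const std::vector<Part>& parts) {
    std::vector<BakedVertex> mesh;
    for (const Part& part : parts) {
        for (const Vec3& v : part.localVertices) {
            BakedVertex out;
            out.position.x = v.x * part.size.x + part.offset.x;
            out.position.y = v.y * part.size.y + part.offset.y;
            out.position.z = v.z * part.size.z + part.offset.z;
            out.color = part.color;
            mesh.push_back(out);
        }
    }
    return mesh;
}
```

In this sketch, changing the color of a single part means calling `bakeCluster` again for every part in the cluster, which is the kind of expensive per-update operation described above.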

This is because the system tries to minimize the number of draw calls dispatched to the GPU, and assumes that updates are infrequent and that each individual part is simple. Remember, in 2012, we had no CSGs, no meshes (apart from character meshes - the clustering system treats each character as a separate cluster) and simpler worlds in general. Draw calls were also REALLY expensive because of many inefficiencies in the Ogre rendering engine at the time and a different set of graphics APIs/features that we had to support. This was particularly true on mobile, which we couldn’t have shipped without this system.

There’s a special exception for constantly moving parts - we extract their CFrame information into a separate buffer and update this buffer independently which is much faster than rebuilding the cluster. The exception exists because physics simulation is super common in Roblox, whereas massive modifications of other properties were not as common.

Today, we live in a different world. We have CSGs and MeshParts that shift the complexity balance, since each object is much more complex than a block. We have a new rendering engine that makes it much simpler for us to experiment with alternative approaches to rendering objects. We have many cases where developers build complex, vibrant worlds and the clustering cost due to updates starts to dominate execution time. And we have increasing mobile reach, where memory consumption is crucial.

We’ve tried to optimize the existing system for memory consumption - see many of the previous rendering updates - but fundamentally we knew that at some point we would need to switch to another one. So we did - special credit to @maxvee for persevering during the development ;). Instancing works by batching as many objects of the same archetype - same MeshPart mesh, same CSG render data, and eventually same shape of a basic part - into one draw call, and having the GPU fetch part properties like size, CFrame and color from a separate buffer that we can update much faster. Additionally (importantly!), we no longer assemble a mesh per cluster, which means that each MeshPart or CSG only has its vertices in memory once on the GPU, instead of having a baseline CPU copy and many GPU-side copies in various clusters.
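As a rough illustration of the batching idea, here is a CPU-side sketch. Everything in it is an assumption made for the example: the class `InstancedRenderer`, the use of a string as the archetype key, and the `InstanceData` layout are all hypothetical, and a real renderer would keep the per-instance records in a GPU buffer rather than a `std::vector`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Per-part properties that the GPU would fetch from a separate buffer.
struct InstanceData {
    float position[3]; // stand-in for the full CFrame
    float size[3];
    float color[3];
};

// One batch per archetype: vertex data is shared, only these records differ.
struct Batch {
    std::vector<InstanceData> instances;
};

class InstancedRenderer {
public:
    struct Handle {
        std::string archetype;
        std::size_t index;
    };

    // Adds a part to the batch of its archetype (hypothetical string key).
    Handle add(const std::string& archetype, const InstanceData& data) {
        Batch& batch = batches_[archetype];
        batch.instances.push_back(data);
        return Handle{archetype, batch.instances.size() - 1};
    }

    // Updating a property touches exactly one record; nothing is rebuilt.
    void setColor(const Handle& handle, float r, float g, float b) {
        InstanceData& data = batches_.at(handle.archetype).instances.at(handle.index);
        data.color[0] = r;
        data.color[1] = g;
        data.color[2] = b;
    }

    const InstanceData& get(const Handle& handle) const {
        return batches_.at(handle.archetype).instances.at(handle.index);
    }

    // In this sketch, one draw call per archetype regardless of part count.
    std::size_t drawCallCount() const { return batches_.size(); }

    std::size_t instanceCount(const std::string& archetype) const {
        std::map<std::string, Batch>::const_iterator it = batches_.find(archetype);
        return it == batches_.end() ? 0 : it->second.instances.size();
    }

private:
    std::map<std::string, Batch> batches_;
};
```

With this layout, a thousand clones of one mesh add a thousand small `InstanceData` records but no extra vertex data, and `setColor` on one of them writes a single record.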

We’ve also paid attention to making sure that updates of properties have a predictable cost - where previously one update to one part could cause as much workload as 100 updates to neighboring parts because we could only update a cluster as a unit, now we try to make sure update cost scales with the number of updates. Because of this, the new system does not throttle updates - when the new system is active (remember, it will take a while for it to be available on all hardware or for all object types), all updates are immediate.

Part of the reason why the development of this new system took time is that it’s a complete rewrite of the part renderer that has to preserve many odd features. We’ve discovered some glaring inconsistencies in the behavior of CSG parts that are unfortunately hard to fix without breaking existing content, so we have to replicate this behavior (aka be bug-compatible with the old system). The other part is that the new system relies on GPU features that we haven’t used before. They became commonplace in the DirectX 10 era, but we still encounter various issues with drivers on various platforms, so bear with us as we figure out how to make it work on as many devices as we can.

There are some known behavioral bugs that are getting fixed shortly:

Neon MeshPart/CSG parts don’t render unless there’s a Neon non-mesh/csg part in the view
Material UVs are not randomized creating tiling artifacts

If you discover any other bugs please report them as separate threads in the Bug Reports forum.

UI is now rendered at native resolution on mobile

For a few years we had a feature where on Retina-resolution devices on iOS we would render UI at “native” resolution, 3D at a reduced resolution to maintain performance, and then composite the two. I say native in quotes because we actually assumed that the Retina scaling factor is 2x; this was true for a while, but with iPhone 6+, 7+ and X this is no longer true.

On Android, due to a set of technical issues and also due to this not being a high priority, we just rendered at a reduced resolution. However, with further development of some internal projects and with increased focus on mobile and on beautiful UIs, this became easier and more important so we did it.

Now we will always render UI at the native resolution on mobile, both on iOS and Android, including 3x on larger iPhone devices and 4x or whatever it is the 4K Android phones have these days. We will also rasterize fonts at this native resolution and many of our built-in UIs will have images for 1x/2x/3x. As a result, our UIs and your UIs will look much better on mobile devices.

3D is still rendered at non-native resolution; 3D UI will render at non-native resolution if it’s not AlwaysOnTop, and at native if it is.

Studio now optimizes meshes on import

We now apply additional processing when importing mesh parts in Studio that preserves the appearance but optimizes the memory used by meshes, the speed at which the GPU renders them and the size of the data the client has to download to use the mesh.

We’re still applying the memory fix mentioned a few updates ago when loading all meshes so the real benefits are somewhat faster rendering for mesh-heavy scenes and somewhat faster download for mesh data. There are some further size optimizations we can do, as well as some improvements in CPU time it takes to decode a mesh - but all of these require a new mesh format so it will not happen very quickly.

Various fixes and improvements

With all of these large fixes going in this will be short and simple:

Fix stability issues on Android 4.1/4.2 that were pretty frequent for the users of some devices on these versions
Fix a problem where unequipping one hat and reequipping another one could result in a period of time where the new hat would use the texture from the old hat (mostly affects avatar editor)

As usual, if you have any issues with any of these updates or any questions, feel free to post them here.

ScriptOn · September 29, 2017, 6:57pm

DOES THIS MEAN I CAN SPAM COLORS AND TRANSPARENCY WITHOUT IT TAKING A LONG TIME TO UPDATE NOW??

This is so awesome.

Is it worth re-uploading meshes from my game for this?

zeuxcg · September 29, 2017, 7:15pm

Yeah but note that it’s currently only working for some Windows users and only for MeshPart/CSGs.

Depends on how painful it is for you, in general I would say no unless it’s really mesh heavy and you’re struggling with either download times for all the assets or GPU-side rendering performance.

younite · September 29, 2017, 7:25pm

Awesome work as always guys! Do you think the texture bugs could be addressed (for mobile devices, they don’t display that well and it doesn’t look as seamless)? That’s my only complaint about mobile, the UI update is great!

zeuxcg · September 29, 2017, 7:38pm

Can you clarify what you mean? Screenshots would help.

AbstractAlex · September 29, 2017, 7:48pm

Can you elaborate on the optimization? Is it to force shared-vertices when some export formats don’t do this?

Osyris · September 29, 2017, 8:00pm

This is awesome! The avatar editor on Android phones always looked so bad compared to iOS devices or even other native elements on the same screen (R$ icon up top, or navigation bar below)

It’s now basically impossible to tell what’s rendering natively and what’s coming from the engine. Very cool!

zeuxcg · September 29, 2017, 8:06pm

If you’re asking specifically about memory optimization then yeah, FBX import code before the fixes resulted in deindexed meshes (there was vertex sharing within one polygon which means “no sharing” for triangle topology and “minimal sharing” for quads). Other optimizations are covered in the zeux/meshoptimizer repository on GitHub, a mesh optimization library that makes meshes smaller and faster to render.
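For anyone wondering what "deindexed" means in practice, here is a minimal sketch of turning a deindexed triangle list into an indexed mesh by sharing identical vertices. The names (`Vertex`, `IndexedMesh`, `buildIndexedMesh`) are made up for this example, vertices only carry a position, and matching is exact; it does not use the library mentioned above and is not the Studio import code.

```cpp
#include <cstddef>
#include <map>
#include <tuple>
#include <utility>
#include <vector>

struct Vertex { float x, y, z; };

struct IndexedMesh {
    std::vector<Vertex> vertices;      // each unique vertex stored once
    std::vector<unsigned int> indices; // three indices per triangle
};

// Takes a deindexed triangle list (three vertices per triangle, no sharing)
// and merges vertices whose components compare exactly equal.
IndexedMesh buildIndexedMesh(const std::vector<Vertex>& deindexed) {
    IndexedMesh result;
    std::map<std::tuple<float, float, float>, unsigned int> seen;
    for (const Vertex& v : deindexed) {
        std::tuple<float, float, float> key = std::make_tuple(v.x, v.y, v.z);
        auto it = seen.find(key);
        if (it == seen.end()) {
            // First time we see this vertex: give it the next free index.
            unsigned int index = static_cast<unsigned int>(result.vertices.size());
            it = seen.insert(std::make_pair(key, index)).first;
            result.vertices.push_back(v);
        }
        result.indices.push_back(it->second);
    }
    return result;
}
```

For a quad stored as two triangles, this reduces six stored vertices to four plus six small indices; the savings grow on real meshes where each vertex is typically shared by several triangles.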

younite · September 29, 2017, 8:55pm

Maybe it was fixed already, I mean it is a bug from March.

zeuxcg · September 29, 2017, 10:22pm

Ah, this one. There will be news on materials on mobile next month.

Blue101black · September 30, 2017, 9:03am

Awesome work

@zeuxcg When can we expect a fix for material colour not being rendered in 3D software on Mac?

Currently, I can import parts, but when I render in Blender I get the material with all colours white.

Elmuowo · September 30, 2017, 10:39pm

In the code for that, at least in the demo, you seem to use std::vector::push_back, which seemed odd to me. I’m not really proficient enough in C++ yet to be a judge of other people’s code, but it just seems odd to me.

C++11 vectors have the “new” function emplace_back. Unlike push_back, which relies on compiler optimizations to avoid copies, emplace_back uses perfect forwarding to send the arguments directly to the constructor to create an object in-place, so you won’t have to worry about unnecessary copying at all.

Are you guys not using C++11 or is there a specific reason for the usage of push_back? Feels like it’s basically deprecated.

Just interested.

zeuxcg · October 1, 2017, 3:40pm

We’re working on this.

zeuxcg · October 1, 2017, 3:47pm

While Roblox has switched to C++11 on all platforms a while back, this library is compatible with C++98 intentionally because it’s meant for a wider audience. There are some companies developing software for some recent consoles that still don’t support C++11, for example.

push_back in C++11 has an overload defined that uses an rvalue ref, so emplace_back and push_back will perform the same if you push an object of the same type as the vector contains. The real difference is that emplace_back can construct the object in place, so that instead of v.push_back(Foo(arg1, arg2)) (which triggers a ctor and then a move ctor) you can say v.emplace_back(arg1, arg2), which constructs the object directly.
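A small sketch that counts constructor calls shows the difference. `Foo` and `Counters` are made up for the example, and it needs C++11 or later; the `reserve(1)` call is there so that vector growth doesn't add extra moves to the count.

```cpp
#include <vector>

// Tallies which constructors ran.
struct Counters {
    int constructed = 0;
    int copied = 0;
    int moved = 0;
};

struct Foo {
    Foo(Counters& c, int a, int b) : counters(&c), sum(a + b) { ++c.constructed; }
    Foo(const Foo& other) : counters(other.counters), sum(other.sum) { ++counters->copied; }
    Foo(Foo&& other) noexcept : counters(other.counters), sum(other.sum) { ++counters->moved; }

    Counters* counters;
    int sum;
};

// Builds a temporary Foo, then move-constructs it into the vector.
Counters viaPushBack() {
    Counters c;
    std::vector<Foo> v;
    v.reserve(1);
    v.push_back(Foo(c, 1, 2));
    return c;
}

// Forwards the arguments and constructs the Foo directly in the vector.
Counters viaEmplaceBack() {
    Counters c;
    std::vector<Foo> v;
    v.reserve(1);
    v.emplace_back(c, 1, 2);
    return c;
}
```

`viaPushBack()` reports one regular construction and one move; `viaEmplaceBack()` reports one regular construction and no move. Neither copies.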

All of the above is irrelevant for fundamental types, of course. As for POD types - this library only uses fundamental & POD types as vector elements - the major performance difference is not move ctors etc. but whether push_back/emplace_back are reliably inlined by the compiler. Unfortunately, Visual Studio STL implementation is bad in this regard - due to how these are implemented, in VS2015 you will not get push_back inlined, but will get emplace_back inlined (which has nothing to do with theoretical advantages of emplace_back vs push_back, just how they are implemented); in VS2017 you will get neither inlined even for vectors of basic types. See this Twitter thread: x.com + x.com (both links are from the same thread but Twitter thread display is incapable of showing the entire thread with one link it seems…)

ScriptOn · October 1, 2017, 10:45pm

When will it be enabled on parts?

Blue101black · October 2, 2017, 8:09am

Also now only getting white materials on Windows.

zeuxcg · October 2, 2017, 6:17pm

At some yet to be determined point next year.

system · July 27, 2019, 1:14pm

This topic was automatically closed 120 days after the last reply. New replies are no longer allowed.