We released instancing support for mesh & CSG parts last year, and we are getting ready to release part instancing, which extends the concept to other part types. This post is a heads-up that this will happen in a few weeks, and it contains some technical details for inquisitive minds.
Note: part instancing is designed to just work, and requires no user interaction. If you stick to basic parts and reuse meshes/textures more (and make sure Lighting.Outlines is disabled), you’re on the good side of performance, and you can skip reading the rest.
Posting this on behalf of @maxvee who spent lots of time implementing various part rendering features and trying to make sure they match the existing rendering pipeline.
What is instancing?
Instancing is a graphics engine technique that uses hardware capabilities to draw many similar objects at once. It has been around since ca. 2005, so it’s about time.
Instancing for Meshes and CSGs was shipped in (about) October 2017. If you haven’t noticed anything, I’ll take it as a compliment.
Advantages:
- saves a lot of graphics memory by not making many copies of the same piece of geometry;
- eliminates expensive part re-clustering caused by moving parts, dynamic updates, etc., potentially shaving off 4-1000000 ms per frame. (Previously, e.g. changing the color of a single part triggered re-generation of everything in the general vicinity of the offender.)
- (almost) immediate part updates.
Disadvantages:
- batching is less efficient and highly depends on the number of unique pieces of geometry used;
- requires graphics hardware support (D3D11, GL3, GLES 3, Vulkan, Metal). Android support is the worst: a lot of devices are too old, and a lot are too buggy to support it reliably.
There are two new “metrics” (*) here to be (mildly) aware of: batching efficiency and update performance.
Batching efficiency
A very high number of draw calls is a CPU bottleneck in rendering code: it is the time it takes for our code to talk to e.g. D3D11, which then talks to the driver, and that happens every frame. We can mitigate the issue by submitting many similar parts per single draw call.
Similar in this case means:
- parts have the same geometry (to a certain extent);
- parts have the same shader (material);
- parts have the same texture (if plastic).
For example:
- Many meshes that refer to the same assetid will share their geometry. I.e. there will be only one vertex buffer with exactly one copy of the mesh, and it will be used to render all of them. There is no more duplication of meshes when forming geometry clusters.
- If, say, 100 meshes use the same material and texture, they will be rendered in a single draw call, unless other properties disagree (see below). If half of your meshes use bricks, one quarter uses plastic with one texture, and the remaining quarter uses plastic with another texture, there will be three draw calls, unless other properties disagree.
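The three-draw-call example above can be reproduced with a small model. This is only a sketch in Python, not engine code: `batch_key` and `count_draw_calls` are hypothetical names, the key is limited to the three similarity criteria listed above, and it assumes texture only matters for plastic (my reading of "if plastic").

```python
def batch_key(part):
    """Hypothetical similarity key: geometry, material, and texture (plastic only)."""
    texture = part["texture"] if part["material"] == "Plastic" else None
    return (part["geometry"], part["material"], texture)

def count_draw_calls(parts):
    """One draw call per distinct key; ignores any per-call instance limit."""
    return len({batch_key(p) for p in parts})

# 100 copies of one mesh: half bricks, and a quarter each with two plastic textures.
meshes = (
    [{"geometry": "mesh_1", "material": "Brick", "texture": None}] * 50
    + [{"geometry": "mesh_1", "material": "Plastic", "texture": "tex_a"}] * 25
    + [{"geometry": "mesh_1", "material": "Plastic", "texture": "tex_b"}] * 25
)
print(count_draw_calls(meshes))  # 3
```

Other properties (see the Batch Wreckers list) would add more fields to the key and can only increase the count.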
You can gauge batching efficiency by pressing Ctrl+Shift+F2 and looking at the bottom. One line should read, e.g. (Village template):
Clusters: fw 0c 0p; dyn 0c 0p; hum ...c ...p; inst 51c 561e 4559i
0c 0p - means there are no FastClusters, everything is instanced;
hum … - that’s for humanoids, we won’t touch those for now;
inst 51c 561e 4559i - means there are 51 instanced clusters with 561 render entities and 4559 instances.
The number of entities is essentially the number of batches (draw calls) it would take to render all 4559 parts in the scene. This tells us that on average there will be about 8 instances per draw call (4559 / 561), which is not great…
So the bigger the ratio of instances to entities, the better the batching efficiency is. The theoretical maximum is about 512 (for now).
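If you want to track this ratio over time, the stats line can be parsed mechanically. The following is a rough Python sketch under stated assumptions: `parse_instancing_stats` is a hypothetical helper, the line format is taken from the single example above, and the 512 figure is treated as a cap on instances per draw call.

```python
import math
import re

def parse_instancing_stats(line):
    """Extract (clusters, entities, instances) from the 'inst ...c ...e ...i' part."""
    match = re.search(r"inst (\d+)c (\d+)e (\d+)i", line)
    if match is None:
        raise ValueError("no 'inst' section found")
    clusters, entities, instances = (int(g) for g in match.groups())
    return clusters, entities, instances

line = "Clusters: fw 0c 0p; dyn 0c 0p; hum ...c ...p; inst 51c 561e 4559i"
clusters, entities, instances = parse_instancing_stats(line)

ratio = instances / entities             # average instances per draw call
best_case = math.ceil(instances / 512)   # assumed floor on draw calls at the 512 cap

print(f"{ratio:.1f} instances per draw call")   # 8.1 instances per draw call
print(f"{best_case} draw calls in the best case")  # 9 draw calls in the best case
```

For the Village template that is 561 draw calls against a best case of 9, which shows how much room unique geometry and materials leave on the table.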
The following properties do not incur batching costs (i.e. parts that differ only in these still count as similar):
For parts:
- CFrame (position/rotation)
- Size(+)
- Color/BrickColor
- UsePartColor
- Reflectance
For SpecialMeshes:
- Offset
- Scale
- VertexColor
For Decals:
- Color3
- Transparency
- StudsPerTileU/V
(+) - the following exceptions for Size apply:
- trusses have to have the same number of segments, otherwise a different piece of geometry is generated for every unique number of segments.
- elongated head SpecialMeshes will turn into cylinders, which will effectively split the batch into two
The following are known notorious Batch Wreckers:
- MeshType
- MeshId
- TextureId
- Material
- Stud configuration - this one will generate a slightly different copy per part type, per face, per stud type. There are about 2700 different combinations of just those, so be careful with studs. Stud configuration has no effect on MeshPart or CSG batching.
- Transparency - this one is the worst. Since OIT (order-independent transparency) is still expensive (ask me again in 5 years), we have to force a single instance per draw call for each transparent part. Does not pertain to decal transparency.
Properties not explicitly mentioned here, like Name or Velocity, have no effect on graphics.
Note on decals: internally, decals use a separate geometry piece that closely follows the object that they’re mapped on top of. They are rendered with transparency on all the time, but it’s not as ridiculous as one per draw call.
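To get a feel for how much the Batch Wreckers cost, here is a rough Python estimate of draw calls for a list of parts. It is a sketch, not the engine's algorithm: `estimate_draw_calls` and the dictionary fields are hypothetical, stud configuration and decals are left out, and the 512 ceiling mentioned earlier is assumed to apply per draw call.

```python
import math
from collections import defaultdict

MAX_INSTANCES_PER_CALL = 512  # assumption: the "about 512" ceiling, applied per draw call

def estimate_draw_calls(parts):
    """Opaque parts batch by wrecker properties; each transparent part is its own call."""
    opaque_groups = defaultdict(int)
    transparent = 0
    for part in parts:
        if part["transparency"] > 0:
            transparent += 1  # forced single instance per draw call
        else:
            key = (part["mesh_type"], part["mesh_id"],
                   part["texture_id"], part["material"])
            opaque_groups[key] += 1
    opaque_calls = sum(math.ceil(n / MAX_INSTANCES_PER_CALL)
                       for n in opaque_groups.values())
    return opaque_calls + transparent

block = {"mesh_type": None, "mesh_id": None, "texture_id": None,
         "material": "Plastic", "transparency": 0}
wall = [block] * 600
glass = [dict(p, transparency=0.5) for p in wall[:100]] + wall[100:]

print(estimate_draw_calls(wall))   # 2
print(estimate_draw_calls(glass))  # 101
```

Making 100 of the 600 identical blocks transparent takes the estimate from 2 draw calls to 101, which is why Transparency tops the list.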
Update performance
Batching efficiency alone is a good indicator of static performance, i.e. it is the “base cost” of just rendering so many things. When parts are dynamic, though, additional performance considerations come into play.
Relative costs of updates, from faster to slower
- Nothing - does not incur any dynamic costs. Static objects are not updated at all.
- CFrame (position/rotation) of meshes, CSGs, blocks/cylinders/balls with no SpecialMeshes.
This is “the fast path”. As cheap as patching a few floats in a struct that the renderer sends to the GPU.
- Color, UsePartColor, Size, Reflectance.
Triggers a full update for the part and a bbox update for the cluster.
- SpecialMeshes.
Approx. 10x slower to update than basic parts, also triggers a bbox update for the cluster. Also, there is no “fast path” for SpecialMeshes.
- CFrame, moving across cluster boundaries.
If a position update moves the part far enough to land in a different cluster, internally this triggers part handover logic, which involves bumping a few lists, etc. Will trigger bbox updates for two clusters.
- Transparency.
Changing transparency from nonzero to nonzero is the same as color/size/etc. However, transitioning between zero and nonzero always involves creation/destruction of a few internal graphics objects.
- Changing anything else (graphics-related, doesn’t include Name or Velocity) triggers re-creation of internal graphics objects. This also includes any changes to the object’s decals and child SpecialMesh properties. If it had any decals, the decals are also re-created. Expect memory allocations, extending lists, updates to clusters.
Note that multiple property updates are handled properly, and graphics objects are updated (almost) only once.
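The cost tiers above can be summarised as a lookup, which may help when deciding which property to animate. This Python sketch is my own reading of the list, not engine logic: `classify_update` and the tier names are hypothetical, it assumes every update to a part with a SpecialMesh takes the slow SpecialMesh path, and it leaves out cluster-boundary crossings because those depend on position rather than on the property.

```python
CHEAP_PROPS = {"Color", "UsePartColor", "Size", "Reflectance"}
NON_GRAPHICS = {"Name", "Velocity"}

def classify_update(prop, has_special_mesh=False, old=None, new=None):
    """Return a rough cost tier for changing `prop` (assumes the value really changes)."""
    if prop in NON_GRAPHICS:
        return "none"
    if prop == "Transparency":
        # Crossing zero creates/destroys internal objects; otherwise like color/size.
        if (old == 0) != (new == 0):
            return "create_destroy"
        return "full_update"
    if prop == "CFrame":
        return "special_mesh" if has_special_mesh else "fast_path"
    if prop in CHEAP_PROPS:
        return "special_mesh" if has_special_mesh else "full_update"
    return "recreate"  # anything else graphics-related

print(classify_update("CFrame"))                          # fast_path
print(classify_update("Transparency", old=0, new=0.5))    # create_destroy
print(classify_update("Transparency", old=0.2, new=0.8))  # full_update
print(classify_update("Material"))                        # recreate
```

In practice this suggests fading a part between two nonzero transparency values rather than all the way to zero, and preferring CFrame animation on parts without SpecialMeshes.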
Other noteworthy changes
- Head SpecialMeshes no longer “expand” as before, they are simply scaled up to a certain size, and then replaced with a cylinder, with decals disabled. (see https://devforum.roblox.com/t/potential-deprecation-of-non-uniform-head-scaling-feedback-welcome/101768/7)
- Torso SpecialMeshes are rendered as boxes. (see “SpecialMesh.MeshType=Enum.MeshType.Torso will be deprecated soon”)
- Outlines are not supported. Turning on outlines inhibits part instancing for the entire place file. (Meshes and CSGs are unaffected.)
- For wedge parts, studs on slant faces will look a bit “non-Euclidean” when at 45 degrees, due to non-uniform scaling.
(*) - not actually metrics.