One Million Sprites. More Than 120fps. DOTS Not Required.

If you lurk in the DOTS forum, you’ll see people like this who made a DOTS library that can render one million animated sprites and still hit 60fps. I have my own DOTS sprite renderer, which is good enough for our game, but it can’t handle a million. I got curious.

So I forked the repository to evaluate whether we could use it in Academia. I played with it for a bit: rendering a single sprite, then a hundred, then thousands. I discovered that it’s not quite ready to be used in our game. Some things we need are missing, like sorting sprites from back to front. I tried to hack this feature in. While reading the code, I realized that maybe I should make an entirely new library that we can use. I just have to understand how they render their sprites, and I kind of get the gist of it already.

The Very Basics

If I want to replicate their rendering technique, I have to start with the very basics: rendering a single sprite. Their library uses ComputeBuffers. ComputeBuffers are normally used to pass data to the GPU for compute shaders. I didn’t know that you could also use them with a normal shader that renders something on screen. Think of them as arrays of numbers that you set on the material so that the shader can read them. You can pass anything this way: position, rotation, scale, UV coordinates, colors, anything you can think of. The following is the shader that I’ve tweaked, based on the one from the awesome library:

  Shader "Instanced/ComputeBufferSprite" {
    Properties {
        _MainTex ("Albedo (RGB)", 2D) = "white" {}
    }
    
    SubShader {
        Tags{
            "Queue"="Transparent"
            "IgnoreProjector"="True"
            "RenderType"="Transparent"
        }
        Cull Back
        Lighting Off
        ZWrite On
        Blend One OneMinusSrcAlpha
        Pass {
            CGPROGRAM
            // Upgrade NOTE: excluded shader from OpenGL ES 2.0 because it uses non-square matrices
            #pragma exclude_renderers gles

            #pragma vertex vert
            #pragma fragment frag
            #pragma target 4.5

            #include "UnityCG.cginc"

            sampler2D _MainTex;

            // xy for position, z for rotation, and w for scale
            StructuredBuffer<float4> transformBuffer;

            // xy is the uv size, zw is the uv offset/coordinate
            StructuredBuffer<float4> uvBuffer; 

            StructuredBuffer<float4> colorsBuffer;

            struct v2f{
                float4 pos : SV_POSITION;
                float2 uv: TEXCOORD0;
                fixed4 color : COLOR0;
            };

            float4x4 rotationZMatrix(float zRotRadians) {
                float c = cos(zRotRadians);
                float s = sin(zRotRadians);
                float4x4 ZMatrix  = 
                    float4x4( 
                       c,  -s, 0,  0,
                       s,  c,  0,  0,
                       0,  0,  1,  0,
                       0,  0,  0,  1);
                return ZMatrix;
            }

            v2f vert (appdata_full v, uint instanceID : SV_InstanceID) {
                float4 transform = transformBuffer[instanceID];
                float4 uv = uvBuffer[instanceID];
                
                //rotate the vertex
                v.vertex = mul(v.vertex - float4(0.5, 0.5, 0,0), rotationZMatrix(transform.z));
                
                //scale it, then move it to the sprite's position (z is derived from y)
                float3 worldPosition = float3(transform.x, transform.y, -transform.y/10) + (v.vertex.xyz * transform.w);
                
                v2f o;
                o.pos = UnityObjectToClipPos(float4(worldPosition, 1.0f));
                
                // XY here is the dimension (width, height). 
                // ZW is the offset in the texture (the actual UV coordinates)
                o.uv =  v.texcoord * uv.xy + uv.zw;
                
                o.color = colorsBuffer[instanceID];
                return o;
            }

            fixed4 frag (v2f i) : SV_Target{
                fixed4 col = tex2D(_MainTex, i.uv) * i.color;
                clip(col.a - 1.0 / 255.0);
                col.rgb *= col.a;

                return col;
            }

            ENDCG
        }
    }
}
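
To make the vertex shader’s math concrete, here is a CPU-side sketch of what vert() does to one quad corner before the camera projection. It is only an illustration: SpriteVertexMath and ToWorld are hypothetical names, and System.Numerics types stand in for the shader’s float4 and float3 so the snippet runs outside Unity.

```csharp
using System;
using System.Numerics;

public static class SpriteVertexMath {
    // corner is a quad vertex in [0,1] x [0,1].
    // transform packs (x, y, rotationRadians, scale), like transformBuffer.
    public static Vector3 ToWorld(Vector2 corner, Vector4 transform) {
        // Center the quad so it rotates around its middle.
        float x = corner.X - 0.5f;
        float y = corner.Y - 0.5f;

        float c = MathF.Cos(transform.Z);
        float s = MathF.Sin(transform.Z);

        // mul(vector, matrix) treats the vector as a row vector,
        // so each result component is the vector dotted with a matrix column.
        float rx = x * c + y * s;
        float ry = -x * s + y * c;

        // Scale, then offset by the sprite position. Depth comes from y.
        return new Vector3(
            transform.X + rx * transform.W,
            transform.Y + ry * transform.W,
            -transform.Y / 10f);
    }
}
```

With no rotation and a scale of 0.2, the top-right corner (1, 1) of a sprite at the origin ends up at (0.1, 0.1, 0), so the whole quad spans 0.2 units.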

The variables transformBuffer, uvBuffer, and colorsBuffer are the “arrays” that we set in code using ComputeBuffers. These are all we need (for now) to render a sprite. Here’s the MonoBehaviour script to render a single one:

using Unity.Mathematics; // float4
using UnityEngine;

public class ComputeBufferBasic : MonoBehaviour {
    [SerializeField]
    private Material material;

    private Mesh mesh;
    
    // transformBuffer holds compressed transform information
    // xy is the position, z is the rotation, w is the scale
    private ComputeBuffer transformBuffer;
    
    // uvBuffer contains float4 values in which xy is the uv dimension and zw is the texture offset
    private ComputeBuffer uvBuffer;
    private ComputeBuffer colorBuffer;

    private readonly uint[] args = {
        6, 1, 0, 0, 0
    };
    
    private ComputeBuffer argsBuffer;

    private void Awake() {
        this.mesh = CreateQuad();
        
        this.transformBuffer = new ComputeBuffer(1, 16);
        float scale = 0.2f;
        this.transformBuffer.SetData(new float4[]{ new float4(0, 0, 0, scale) });
        int transformBufferId = Shader.PropertyToID("transformBuffer");
        this.material.SetBuffer(transformBufferId, this.transformBuffer);
        
        this.uvBuffer = new ComputeBuffer(1, 16);
        this.uvBuffer.SetData(new float4[]{ new float4(0.25f, 0.25f, 0, 0) });
        int uvBufferId = Shader.PropertyToID("uvBuffer");
        this.material.SetBuffer(uvBufferId, this.uvBuffer);
        
        this.colorBuffer = new ComputeBuffer(1, 16);
        this.colorBuffer.SetData(new float4[]{ new float4(1, 1, 1, 1) });
        int colorsBufferId = Shader.PropertyToID("colorsBuffer");
        this.material.SetBuffer(colorsBufferId, this.colorBuffer);

        this.argsBuffer = new ComputeBuffer(1, this.args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        this.argsBuffer.SetData(this.args);
    }

    private static readonly Bounds BOUNDS = new Bounds(Vector2.zero, Vector3.one);

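    private void Update() {   
        // Draw
        Graphics.DrawMeshInstancedIndirect(this.mesh, 0, this.material, BOUNDS, this.argsBuffer);
    }

    private void OnDestroy() {
        // ComputeBuffers hold GPU memory and must be released explicitly
        this.transformBuffer?.Release();
        this.uvBuffer?.Release();
        this.colorBuffer?.Release();
        this.argsBuffer?.Release();
    }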
    
    // This can be refactored to a utility class
    // Just added it here for the article
    private static Mesh CreateQuad() {
        Mesh mesh = new Mesh();
        Vector3[] vertices = new Vector3[4];
        vertices[0] = new Vector3(0, 0, 0);
        vertices[1] = new Vector3(1, 0, 0);
        vertices[2] = new Vector3(0, 1, 0);
        vertices[3] = new Vector3(1, 1, 0);
        mesh.vertices = vertices;

        int[] tri = new int[6];
        tri[0] = 0;
        tri[1] = 2;
        tri[2] = 1;
        tri[3] = 2;
        tri[4] = 3;
        tri[5] = 1;
        mesh.triangles = tri;

        Vector3[] normals = new Vector3[4];
        normals[0] = -Vector3.forward;
        normals[1] = -Vector3.forward;
        normals[2] = -Vector3.forward;
        normals[3] = -Vector3.forward;
        mesh.normals = normals;

        Vector2[] uv = new Vector2[4];
        uv[0] = new Vector2(0, 0);
        uv[1] = new Vector2(1, 0);
        uv[2] = new Vector2(0, 1);
        uv[3] = new Vector2(1, 1);
        mesh.uv = uv;

        return mesh;
    }
}

Let me guide you through the code from top to bottom. For the material, create a new material that uses the shader above, then assign a texture/spritesheet to it. I’m using the spritesheet from the library, which is a 4×4 grid of emoji sprites.

The mesh here is the one created by CreateQuad(). It’s just a quad made of two triangles. Next you’ll see the three ComputeBuffer variables that we’ll set on the material later. I named them after the StructuredBuffer variables in the shader so the mapping is obvious. The names don’t have to match, but it helps readability.

The variables args and argsBuffer are used for calling Graphics.DrawMeshInstancedIndirect(). See the Unity documentation for the details. The function requires a buffer with 5 uint values. For our use case, the first two are the important ones. The first is the number of indices, which for our quad is 6. The second is the number of times the quad is to be rendered, which is just 1. I imagine this is also the upper limit of the index that the shader uses to read the StructuredBuffers. Kind of like this:

for(int i = 0; i < count; ++i) {
    CallShaderUsingThisIndexForBuffers(i);
}
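
For reference, here is a small sketch that spells out all five slots of the args array. IndirectArgs and Build are hypothetical names used only for illustration. The slot meanings follow Unity’s documented layout for an indexed indirect draw.

```csharp
public static class IndirectArgs {
    // Builds the 5-uint argument array expected by an indexed indirect draw.
    public static uint[] Build(uint indexCountPerInstance, uint instanceCount) {
        return new uint[] {
            indexCountPerInstance, // [0] indices per instance (6 for a two-triangle quad)
            instanceCount,         // [1] how many instances to draw
            0,                     // [2] start index location
            0,                     // [3] base vertex location
            0                      // [4] start instance location
        };
    }
}
```

Calling IndirectArgs.Build(6, 1) gives the same values as the args field in the script above.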

The Awake() method really just prepares the ComputeBuffers and sets them on the material. We’re rendering a sprite at (0, 0), scaled to 0.2f, with no rotation. For the UV, we’re using the sprite at the bottom-left (the kiss emoji). Then we give it a white color. The args array is also copied into argsBuffer.
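
The UV value (0.25, 0.25, 0, 0) follows from the 4×4 layout: each cell is a quarter of the texture wide and tall, and the bottom-left cell has no offset. A minimal sketch of that arithmetic for any cell, assuming an evenly spaced grid with cells counted from the bottom-left (SpriteSheet and CellUv are hypothetical names, and System.Numerics.Vector4 stands in for float4):

```csharp
using System.Numerics;

public static class SpriteSheet {
    // Returns (uvWidth, uvHeight, uOffset, vOffset) for the cell at (column, row),
    // in the same packing that uvBuffer uses.
    public static Vector4 CellUv(int column, int row, int columns, int rows) {
        float width = 1f / columns;
        float height = 1f / rows;
        return new Vector4(width, height, column * width, row * height);
    }
}
```

CellUv(0, 0, 4, 4) gives (0.25, 0.25, 0, 0), the value used in Awake().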

On Update(), we just call Graphics.DrawMeshInstancedIndirect(). (I don’t quite understand the use of BOUNDS here yet; I just copied it from the library.)
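
A reader explains in the comments that BOUNDS is the bounding box of the whole batch and is used for frustum culling, so it should enclose every sprite. As a rough sketch, such a box could be computed from the packed transforms like this. BatchBounds and Compute are hypothetical names, and System.Numerics types stand in for Unity’s float4 and Vector3 so the snippet runs outside Unity.

```csharp
using System;
using System.Numerics;

public static class BatchBounds {
    // Returns the center and size of an axis-aligned box enclosing every sprite.
    // Each transform packs (x, y, rotationRadians, scale), like transformBuffer.
    public static (Vector3 center, Vector3 size) Compute(Vector4[] transforms) {
        if (transforms.Length == 0) {
            return (Vector3.Zero, Vector3.Zero);
        }

        var min = new Vector3(float.MaxValue);
        var max = new Vector3(float.MinValue);
        foreach (Vector4 t in transforms) {
            // A centered unit quad rotated by any angle fits inside a circle
            // of radius scale * sqrt(2) / 2, so pad by that much.
            float radius = t.W * MathF.Sqrt(2f) * 0.5f;
            float z = -t.Y / 10f; // same depth rule as the shader
            min = Vector3.Min(min, new Vector3(t.X - radius, t.Y - radius, z));
            max = Vector3.Max(max, new Vector3(t.X + radius, t.Y + radius, z));
        }
        return ((min + max) * 0.5f, max - min);
    }
}
```

In Unity, the result would go into new Bounds(center, size) and be passed to the draw call in place of the fixed BOUNDS.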

The last step is to prepare a scene with an orthographic camera. Create another GameObject and add the ComputeBufferBasic component. Give it a material that uses the shader that I just showed. On play, you will get this:

Tadaa! A sprite rendered using a ComputeBuffer.

If you can do one, you can do many

Now that we can render a single sprite using ComputeBuffers, we can certainly do many. Here’s another script that I created. It has a count parameter and renders that many sprites with random position, scale, rotation, and color:

using Unity.Mathematics; // float4
using UnityEngine;

public class ComputeBufferMultipleSprites : MonoBehaviour {
    [SerializeField]
    private Material material;
    
    [SerializeField]
    private float minScale = 0.15f;
    
    [SerializeField]
    private float maxScale = 0.2f;  

    [SerializeField]
    private int count;

    private Mesh mesh;
    
    // transformBuffer holds compressed transform information
    // xy is the position, z is rotation, w is the scale
    private ComputeBuffer transformBuffer;
    
    // uvBuffer contains float4 values in which xy is the uv dimension and zw is the texture offset
    private ComputeBuffer uvBuffer;
    private ComputeBuffer colorBuffer;

    private uint[] args;
    
    private ComputeBuffer argsBuffer;

    private void Awake() {
        QualitySettings.vSyncCount = 0;
        Application.targetFrameRate = -1;
        
        this.mesh = CreateQuad();
        
        // Prepare values
        float4[] transforms = new float4[this.count];
        float4[] uvs = new float4[this.count];
        float4[] colors = new float4[this.count];

        const float maxRotation = Mathf.PI * 2;
        for (int i = 0; i < this.count; ++i) {
            // transform
            float x = UnityEngine.Random.Range(-8f, 8f);
            float y = UnityEngine.Random.Range(-4.0f, 4.0f);
            float rotation = UnityEngine.Random.Range(0, maxRotation);
            float scale = UnityEngine.Random.Range(this.minScale, this.maxScale);
            transforms[i] = new float4(x, y, rotation, scale);
            
            // UV
            float u = UnityEngine.Random.Range(0, 4) * 0.25f;
            float v = UnityEngine.Random.Range(0, 4) * 0.25f;
            uvs[i] = new float4(0.25f, 0.25f, u, v);
            
            // color
            float r = UnityEngine.Random.Range(0f, 1.0f);
            float g = UnityEngine.Random.Range(0f, 1.0f);
            float b = UnityEngine.Random.Range(0f, 1.0f);
            colors[i] = new float4(r, g, b, 1.0f);
        }
        
        this.transformBuffer = new ComputeBuffer(this.count, 16);
        this.transformBuffer.SetData(transforms);
        int transformBufferId = Shader.PropertyToID("transformBuffer");
        this.material.SetBuffer(transformBufferId, this.transformBuffer);
        
        this.uvBuffer = new ComputeBuffer(this.count, 16);
        this.uvBuffer.SetData(uvs);
        int uvBufferId = Shader.PropertyToID("uvBuffer");
        this.material.SetBuffer(uvBufferId, this.uvBuffer);
        
        this.colorBuffer = new ComputeBuffer(this.count, 16);
        this.colorBuffer.SetData(colors);
        int colorsBufferId = Shader.PropertyToID("colorsBuffer");
        this.material.SetBuffer(colorsBufferId, this.colorBuffer);

        this.args = new uint[] {
            6, (uint)this.count, 0, 0, 0
        };
        this.argsBuffer = new ComputeBuffer(1, this.args.Length * sizeof(uint), ComputeBufferType.IndirectArguments);
        this.argsBuffer.SetData(this.args);
    }

    private static readonly Bounds BOUNDS = new Bounds(Vector2.zero, Vector3.one);

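    private void Update() {   
        // Draw
        Graphics.DrawMeshInstancedIndirect(this.mesh, 0, this.material, BOUNDS, this.argsBuffer);
    }

    private void OnDestroy() {
        // ComputeBuffers hold GPU memory and must be released explicitly
        this.transformBuffer?.Release();
        this.uvBuffer?.Release();
        this.colorBuffer?.Release();
        this.argsBuffer?.Release();
    }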

    private static Mesh CreateQuad() {
        // Just the same as previous code. I told you this can be refactored.
    }
}

It’s mostly the same as the single-sprite version. The difference is that we now fill the arrays with as many entries as the serialized variable count specifies. We also set the second number in the args array to the value of count.

With this script, you can set count to any number and it will generate that many sprites, all rendered in a single draw call.

This is 10,000 random sprites.

Why are minScale and maxScale serialized variables? When I was testing this with 600,000 sprites, I noticed that the performance degraded to below 60fps. If the original library can do a million, why can’t this?

This is 600,000 sprites. It’s slow.

My hunch is that it’s because of overdraw. So I made minScale and maxScale serialized parameters and set them to small numbers like 0.01 and 0.02. Only then was I able to replicate a million sprites at over 60fps (editor profiler). It could probably accommodate more, but who needs a million sprites? Our game doesn’t even need a quarter of that.
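
A back-of-the-envelope estimate makes the overdraw hunch plausible. The sketch below computes how many quads cover an average point of the spawn area. It assumes scales are uniform between minScale and maxScale, that the 600,000 test used the script’s default scales of 0.15 to 0.2, and that sprites land in the 16×8 region used by the script. It counts whole quads rather than opaque pixels and ignores edge effects. OverdrawEstimate and AverageLayers are hypothetical names.

```csharp
public static class OverdrawEstimate {
    // Average number of quads stacked over a point in the spawn area.
    public static double AverageLayers(int count, double minScale, double maxScale,
                                       double areaWidth, double areaHeight) {
        // For s uniform in [a, b], the mean of s^2 is (a^2 + a*b + b^2) / 3.
        // A quad with scale s covers s * s world units.
        double meanQuadArea =
            (minScale * minScale + minScale * maxScale + maxScale * maxScale) / 3.0;
        return count * meanQuadArea / (areaWidth * areaHeight);
    }
}
```

Under those assumptions, 600,000 sprites at scales 0.15 to 0.2 come out at roughly 145 layers per point, while a million sprites at 0.01 to 0.02 come out at about 1.8. That does not prove overdraw is the cause, but the gap is large enough to fit the observation.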

A million tiny sprites.

Profiler

I wanted to see how this runs in a development build. My machine specs are 3.7GHz (4 CPUs), 16GB RAM, and Radeon RX 460 graphics. This is what I got:

As you can see, this is quite fast. The call to Graphics.DrawMeshInstancedIndirect() displays 0ms. Although I’m not so sure if I should be worried about that Gfx.PresentFrame.

Not so fast

While this is impressive, this is not how it will be used in an actual game. The major thing missing here is the sorting of sprites. That’s going to take a big chunk of CPU. The ComputeBuffers would also have to be updated every frame if there are moving sprites. There’s still a lot of work to do. I don’t expect to reach one million with an actual usable framework, but if I can render something like 300,000 in under 2ms, that’s good enough for me. DOTS will definitely help here, but that’s a topic for another day.
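
To give an idea of what the missing sorting step involves, here is a minimal CPU-side sketch, not the approach of the original library or of any finished framework. It assumes the shader’s depth rule (z = -y / 10) and a camera looking along +z, so sprites with a smaller y are farther away and must be drawn first. SpriteSorter, BackToFrontOrder, and Reorder are hypothetical names, and System.Numerics.Vector4 stands in for float4.

```csharp
using System;
using System.Numerics;

public static class SpriteSorter {
    // Returns sprite indices ordered farthest-first.
    public static int[] BackToFrontOrder(Vector4[] transforms) {
        int[] order = new int[transforms.Length];
        float[] keys = new float[transforms.Length];
        for (int i = 0; i < order.Length; ++i) {
            order[i] = i;
            keys[i] = transforms[i].Y; // ascending y equals descending z
        }
        // Sorts keys ascending and applies the same permutation to order.
        Array.Sort(keys, order);
        return order;
    }

    // Applies the order to one of the per-sprite arrays (transforms, uvs, or colors).
    public static T[] Reorder<T>(T[] source, int[] order) {
        T[] result = new T[order.Length];
        for (int i = 0; i < order.Length; ++i) {
            result[i] = source[order[i]];
        }
        return result;
    }
}
```

All three arrays would be reordered with the same order before being uploaded, so every instance index still refers to the same sprite in each buffer. Doing this every frame for hundreds of thousands of sprites is where the CPU cost comes from.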

Like my posts? Follow me on Twitter!

4 thoughts on “One Million Sprites. More Than 120fps. DOTS Not Required.”

  1. >(I don’t quite understand yet the use of BOUNDS here but I just copied it from the library.)

    This is the bounding box for the whole DrawMeshInstancedIndirect batch, and in this case the bounds are wrong. They should cover the whole area between the min and max positions. Otherwise, when you move the camera and (0,0,0) goes outside the frustum by more than 1 on any axis, none of the “sprites” will render, because frustum culling on these bounds will cull them all.

    The whole ‘profiling’ is also a bit off, because all these sprites are static and you set them up only once in Awake. When you start moving the transforms, you’ll need to upload them on every change, which means calling SetBuffer, and this is the main bottleneck of ComputeBuffers. Luckily, in 2020.1 we have BeginWrite/EndWrite and can write to ComputeBuffers asynchronously from jobs.

    1. I see. Thanks for that.

      I’m currently making a framework that handles transform changes. SetBuffer() is indeed a bottleneck, as is sorting, but it’s still faster than our old system.
