Some Unity ECS Utilities

Not much has happened since last month in terms of programming breakthroughs. I guess we’re just busy adding features to our game. As a fallback article, I’ll share some utility code that we use instead. These are mainly used in ECS.

BufferElement<T>

In Unity’s ECS, collections inside components are not allowed (yet). The solution for now is to use DynamicBuffer. It’s like a component that you can add to your entities, but it’s a list. However, it only accepts value types that implement the IBufferElementData interface. There are times when I just want a list of entities or vectors, but Entity and float3 do not implement IBufferElementData, and it’s kind of a pain to create wrapper structs for each type. So I created BufferElement<T>:

[InternalBufferCapacity(10)]
public readonly struct BufferElement<T> : IBufferElementData where T : struct {
    public readonly T value;

    public BufferElement(T value) {
        this.value = value;
    }
}
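
One optional convenience (not part of the version above, just a sketch) is a pair of implicit conversion operators inside BufferElement<T>, so call sites don’t have to wrap and unwrap values manually:

```csharp
// Hypothetical additions to BufferElement<T>; the plain struct above works fine without them
public static implicit operator BufferElement<T>(T value) {
    // Lets you pass a raw T wherever a BufferElement<T> is expected
    return new BufferElement<T>(value);
}

public static implicit operator T(BufferElement<T> element) {
    // Lets you read the wrapped value without touching .value
    return element.value;
}
```

With these in place, both `buffer.Add(someEntity)` and `Entity e = buffer[i];` compile directly.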

// Usage
// Say we need a list of Entity targets
Entity entity = this.EntityManager.CreateEntity(typeof(BufferElement<Entity>));
DynamicBuffer<BufferElement<Entity>> buffer = this.EntityManager.GetBuffer<BufferElement<Entity>>(entity);

Entity[] targets = GetTargets();
for(int i = 0; i < targets.Length; ++i) {
    buffer.Add(new BufferElement<Entity>(targets[i]));
}
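
Reading the values back is just as simple; each element exposes the wrapped value:

```csharp
// Fetch the buffer and unwrap each element
DynamicBuffer<BufferElement<Entity>> readBuffer = this.EntityManager.GetBuffer<BufferElement<Entity>>(entity);
for (int i = 0; i < readBuffer.Length; ++i) {
    Entity target = readBuffer[i].value;
    // Do something with target here
}
```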

CollectedCommandsSystem

Unity’s ECS was released when our game Academia was already too big, with heavy OOP usage. It’s unreasonable to rewrite it as pure ECS, so we’re using a lot of the hybrid interface.

In the MonoBehaviour world, you can get access to an EntityManager and create entities with it. One problem we encountered is that heavy usage of EntityManager is slow. It has something to do with ECS chunk data changing on each entity/component alteration call like AddComponentData() or SetComponentData(). The recommendation is to use an EntityCommandBuffer to temporarily pool the commands, then execute them in one go so that chunk changes are minimized.

I wanted something that I could use in any of my MonoBehaviours and that automatically flushes the commands whenever the ECS systems kick in. So I made CollectedCommandsSystem:

public class CollectedCommandsSystem : ComponentSystem {
    private EntityCommandBuffer? pendingBuffer;

    protected override void OnUpdate() {
        if (this.pendingBuffer != null) {
            this.pendingBuffer.Value.Playback(this.EntityManager);
            this.pendingBuffer.Value.Dispose();
            this.pendingBuffer = null;
        }
    }
    
    public EntityCommandBuffer Buffer {
        get {
            if (this.pendingBuffer == null) {
                this.pendingBuffer = new EntityCommandBuffer(Allocator.TempJob);
            }

            return this.pendingBuffer.Value;
        }
    }
}

// Usage
// Say in a MonoBehaviour
private CollectedCommandsSystem collectedCommands;

private void Awake() {
    this.collectedCommands = World.Active.GetOrCreateSystem<CollectedCommandsSystem>();
}

private void Update() {
    // Then somewhere along the way, use the buffer to alter ECS data
    EntityCommandBuffer buffer = this.collectedCommands.Buffer;
    Entity entity = buffer.CreateEntity();
    buffer.AddComponentData(entity, new Whatever());
}

SharedComponentQuery<T>

When I was creating our custom rendering system, one of the things I constantly had to deal with was shared components. While working with them, I found that I could avoid repeating the same code every time I needed to handle them. This is what I came up with:

public class SharedComponentQuery<T> where T : struct, ISharedComponentData {
    private readonly ComponentSystemBase system;
    private readonly EntityManager entityManager;

    [ReadOnly]
    private ArchetypeChunkSharedComponentType<T> sharedComponentType;

    private readonly List<T> sharedComponents = new List<T>();
    private readonly List<int> indices = new List<int>();

    public SharedComponentQuery(ComponentSystemBase system, EntityManager entityManager) {
        this.system = system;
        this.entityManager = entityManager;
    }

    public void Update() {
        this.sharedComponentType = this.system.GetArchetypeChunkSharedComponentType<T>();
        
        this.sharedComponents.Clear();
        this.indices.Clear();
        this.entityManager.GetAllUniqueSharedComponentData(this.sharedComponents, this.indices);
    }

    public T GetSharedComponent(ref ArchetypeChunk chunk) {
        int sharedComponentIndex = chunk.GetSharedComponentIndex(this.sharedComponentType);
        int uniqueIndex = this.indices.IndexOf(sharedComponentIndex);
        Assertion.Assert(uniqueIndex >= 0);
        return this.sharedComponents[uniqueIndex];
    }
    
    public IReadOnlyList<T> SharedComponents {
        get {
            return this.sharedComponents;
        }
    }

    public IReadOnlyList<int> Indices {
        get {
            return this.indices;
        }
    }
}

It’s a utility class where all of my common code regarding shared components is placed. Here’s a sample usage:

// Say you have a SharedComponent named RenderConfig
class RenderConfigSystem : ComponentSystem {
    private SharedComponentQuery<RenderConfig> query;

    protected override void OnCreate() {
        // Prepare the query
        query = new SharedComponentQuery<RenderConfig>(this, this.EntityManager);
    }

    protected override void OnUpdate() {
        this.query.Update();

        // Traverse through all RenderConfig
        IReadOnlyList<RenderConfig> configs = this.query.SharedComponents;

        // Start from 1 because the element at 0 is a default auto-generated instance
        for(int i = 1; i < configs.Count; ++i) {
            RenderConfig config = configs[i];
            // Do something with config here
        }
    }

    // Another usage is if you want to get the shared component of a chunk (there's only one per chunk)
    private void Process(ArchetypeChunk chunk) {
        RenderConfig config = this.query.GetSharedComponent(ref chunk);
    }
}

That’s all for now. 🙂


Replacing our OOP pathfinding with pure ECS

I’ve been writing about implementing A* in ECS to see if it’s faster. See here and here. It is indeed faster, way faster! I’m too ashamed to tell you that our old OOP A* is so slow. It runs in the range of 2ms to 50ms, and that’s already using Jump Point Search and running multithreaded. It badly needs optimization. I want to use the ECS implementation; however, it’s not easy to port OOP code to pure ECS because pure ECS uses the HPC# subset (High Performance C#).

Think of HPC# as a viral change requirement. Since it can’t handle managed types, classes can’t be used. The obvious thing to do is to turn a class into a struct, but that’s not enough. Each member variable that is a managed type must be converted to something that HPC# allows. That means each class referenced inside the converted class must also be turned into a struct (or something else). Then you do this recursively to the member variables of those classes, and so on.

Most of the time, it’s impossible. If you use any classes from .NET or from third party code, you’re stuck. You can’t edit those, or it’s not advisable to do so. Just think how many classes or MonoBehaviours you have that use List or Dictionary. That code will be tough to convert, if it can be converted at all.
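
To illustrate the cascade, here is a made-up example (Room, RoomSettings, and their struct counterparts are all hypothetical names, not actual game code): converting one class forces you to convert everything it references.

```csharp
// The original OOP version: a managed class referencing other classes
public class Room {
    public List<Tile> tiles;       // List<T> is managed: not allowed in HPC#
    public RoomSettings settings;  // a class reference: also not allowed
}

// The HPC#-friendly version: structs all the way down
public struct RoomData {
    public NativeList<TileData> tiles; // NativeList<T> instead of List<T>
    public RoomSettingsData settings;  // RoomSettings must become a struct too...
}

public struct TileData { /* ...and so must Tile, and everything Tile references */ }
public struct RoomSettingsData { /* ...and everything RoomSettings references */ }
```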

Epiphany

While I was sharing this post on Reddit, somebody suggested copying the Tile information into a separate struct and using that in my ECS A*. I thought this would be costly since I would have to copy all of the grid’s Tile information into struct copies every frame. I kept thinking about it, then realized I don’t need to make copies every frame. I just need to keep and maintain the struct copies.

This means that I should have a separate copy of the grid of tiles implemented as structs (HPC#). They will only be updated whenever the original Tile changes. I also don’t need to copy every piece of Tile information, only what’s needed for pathfinding. The copying need not be tightly synchronized either, to save CPU: I can defer it across multiple frames. Tile changes are not very frequent in the game anyway.

I may have found an effective strategy to somehow port OOP code to pure ECS without a major overhaul. With this in mind, I set out to execute this major change.

Refactoring

Here’s part of our Tile class:

public sealed class Tile {
    private readonly TileGrid grid; // The owner grid
    private readonly GridCell cell; // Contains position, size, etc

    private NodeHandle<Tile> handle;

    private readonly Dictionary<Layer, TileItem> itemMap = new Dictionary<Layer, TileItem>();
    private readonly Dictionary<Layer, TileItem> taskItemMap = new Dictionary<Layer, TileItem>();

    private readonly IntVector2 position;

    private int itemFlags;

    private Zone zone;

    ... 
}

This class is close to 400 lines of code. It handles many gameplay rules regarding the tiles in the game. It wouldn’t be a good idea to turn this into HPC#; there’s a lot to convert here. At first glance, the usage of Dictionary alone would make the conversion difficult. There’s no HPC# equivalent of it yet. TileItem is a huge data class that contains information about what’s placed in a tile at a certain layer. It contains many member variables that are classes of their own. It’s the same with Zone. If I were to refactor the Tile for HPC# this way, I wouldn’t do it, as it would cascade into a lot more refactoring in other systems. It’s too risky.

The Tile copy that is implemented as a struct looks like this:

public struct TileAsStruct : IComponentData {
    public int2 position;
    
    public int zoneGenderIndex;

    public byte flags;

    public bool Contains(byte flag) {
        return (this.flags & flag) > 0;
    }
    
    // The flags
    public const byte HAS_PHYSICAL_BLOCKER = 1;
    public const byte HAS_STUDENT_BLOCKER = 1 << 1;
    public const byte STAFF_ONLY = 1 << 2;
}

Simple, isn’t it? This is all I need for the purposes of pathfinding. It’s implemented as a component so I can easily make queries for it in system classes (the S in ECS). For the actual A* implementation, I used the same implementation shown in this post, along with the custom HeuristicCostCalculator and Reachability required for the game’s rules.
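
The bit flags combine and test like this (a quick sketch; the tile values here are made up):

```csharp
// Hypothetical usage of the flag constants and Contains() helper above
TileAsStruct tile = new TileAsStruct() {
    flags = (byte)(TileAsStruct.HAS_PHYSICAL_BLOCKER | TileAsStruct.STAFF_ONLY)
};

bool physicallyBlocked = tile.Contains(TileAsStruct.HAS_PHYSICAL_BLOCKER); // true
bool studentBlocked = tile.Contains(TileAsStruct.HAS_STUDENT_BLOCKER);     // false
```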

As for the system that handles copying information from the original tiles, I used a separate MonoBehaviour which listens for signals that cause Tile changes. On dispatch of such signals, I enqueue the positions of the tiles that need to be recopied. Then on each Update(), I copy up to X tiles taken from the queue into their struct copies.

private const int COPIES_PER_FRAME = 50;

private void Update() {
    int copiedCount = 0;
    while (this.dirtyTilesQueue.Count > 0 && copiedCount < COPIES_PER_FRAME) {
        CreateUpdateEntity();
        ++copiedCount;
    }
}

private void CreateUpdateEntity() {
    int2 position = this.dirtyTilesQueue.Dequeue();
    Tile tile = this.grid.GetTile(position.x, position.y);

    byte flags = 0;
    if (tile.HasPhysicalBlocker) {
        flags |= TileAsStruct.HAS_PHYSICAL_BLOCKER;
    }

    if (tile.HasStudentBlocker) {
        flags |= TileAsStruct.HAS_STUDENT_BLOCKER;
    }

    // Check if for staff only
    if (tile.Zone?.StaffOnly ?? false) {
        flags |= TileAsStruct.STAFF_ONLY;
    }

    TileAsStruct tileValue = new TileAsStruct() {
        position = position,
        zoneGenderIndex = tile.Zone?.GenderIndex ?? Gender.ANY.Index, // Use ANY if tile has no zone
        flags = flags
    };

    // Create the entity that will mark to update the TileAsStruct in the grid
    Entity updateEntity = this.entityManager.CreateEntity(typeof(UpdateTileAsStruct));
    this.entityManager.SetComponentData(updateEntity, new UpdateTileAsStruct(position, tileValue));
}
Just added this image so that the article is not a big wall of text.

Huge Improvements

Our old implementation was so slow that I only allowed up to 4 A* requests per frame. Now, with the Burst compiled ECS implementation, I was able to increase it to 40 requests. That’s a 10x improvement! There’s even room for more (it takes around 300 requests to reach 16ms). But pushing the number up is not necessary; the simulation already looks good. Besides, I need to reserve CPU cycles for future features.

That’s it for now. 🙂

From 160 to 300 A* requests per frame

In my last post, I talked about achieving 160 requests per frame using Burst compiled code. After more study and iteration, I found that I could do better.

While watching the profiler of my old implementation, I realized that each A* search was being executed one after another. They were distributed across multiple threads, but they weren’t running in parallel. Unused thread execution time is wasteful, so I needed to write code that could execute A* searches in parallel. It wasn’t as easy as I initially thought. I had to run through some Unity ECS gotchas.

A* searches are distributed across all threads but are not running in parallel.

Obstacles

My first instinct was to use IJobParallelFor. I could schedule an IJobParallelFor for each chunk of requests, executing an A* search for each index:

struct AStarSearchParallelFor : IJobParallelFor {
    [ReadOnly]
    public NativeArray<Entity> entities; // The request entities

    public void Execute(int index) {
        // Perform a single A* search here
    }
}

This also implies that I have to instantiate local containers: a list for nodes, an open set which uses a heap, and a hash map that acts like a hash set for the closed set. I couldn’t get this to work with the Burst compiler. I always got errors when disposing a container within the job struct. Somehow, this is not supported by Burst.

While rummaging through the ECS forums, I found a thread that answered my problem. It said that you don’t need to dispose containers allocated inside a job with Allocator.Temp. I don’t have concrete evidence that this is true, except that I tried it in a test system and watched whether my memory usage grows (due to undisposed containers). It didn’t grow at all. So I’m going to trust that programmer who was replying in that thread.

struct AStarSearchParallelFor : IJobParallelFor {
    [ReadOnly]
    public NativeArray<Entity> entities; // The request entities

    public void Execute(int index) {
        Search search = new Search() {
            // Set parameters here
        };
        search.Execute();
    }

    // The whole A* algorithm is implemented here
    // I did it this way so I don't have to pass around too many
    // parameters if it were made by only using functions
    private struct Search {
        private NativeList<AStarNode> allNodes;
        private OpenSet openSet;
        private NativeHashMap<int2, byte> closeSet;

        public void Execute() {
            // It's ok not to dispose these (I hope)
            this.allNodes = new NativeList<AStarNode>(10, Allocator.Temp);
            GrowingHeap heap = new GrowingHeap(new NativeList<HeapNode>(10, Allocator.Temp));
            this.openSet = new OpenSet(heap, new NativeHashMap<int2, AStarNode>(10, Allocator.Temp));
            this.closeSet = new NativeHashMap<int2, byte>(10, Allocator.Temp);

            ... // Rest of the algorithm
        }
    }
}

From here, the next step is to pass the required components from chunks. My A* job requires four components:

  • AStarSearchParameters – Stores the start and goal positions
  • AStarPath – Stores whether the path is reachable or not. Also stores the number of positions to step through to get to the goal. Basically, this component can be used to traverse through the path.
  • DynamicBuffer<Int2BufferElement> – The list of positions from A* search will be stored here
  • Waiting – This is a common component that I use that has a “done” boolean. This will be used by agents to see if their A* request is already done so they can proceed to their next action.
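
Putting those four together, a request entity might be created like this. This is just a sketch: the actual creation code isn’t shown in this post, and the entityManager reference and the literal positions are placeholders. The field names (start, goal, done) match the components used in the job code below.

```csharp
// Hypothetical creation of an A* request entity carrying the four components
Entity request = entityManager.CreateEntity(
    typeof(AStarSearchParameters),
    typeof(AStarPath),
    typeof(Int2BufferElement), // added as a DynamicBuffer<Int2BufferElement>
    typeof(Waiting));

entityManager.SetComponentData(request, new AStarSearchParameters() {
    start = new int2(0, 0),
    goal = new int2(10, 5)
});
entityManager.SetComponentData(request, new Waiting() { done = false });
```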

Since I’m working with chunks, it’s natural to use NativeArray for the components and a BufferAccessor for the list of positions. But there’s a problem: BufferAccessor can’t be used in this case. I made a test for this:

private struct JobParallelFor : IJobParallelFor {
    [NativeDisableContainerSafetyRestriction]
    public BufferAccessor<Int2BufferElement> accessor;

    public void Execute(int index) {
        DynamicBuffer<Int2BufferElement> buffer = this.accessor[index];
        buffer.Clear();

        for (int i = 0; i < 1000; ++i) {
            buffer.Add(new Int2BufferElement() {
                value = new int2(i, i)
            });
        }
    }
}

The job above is test code I made to see if I can use BufferAccessor in an IJobParallelFor. It works for only one chunk. If there’s more than one chunk to process, it throws errors, something about a NativeArray that can’t be accessed because a previous job is still writing to it. Intuitively, it should work, as I’m only accessing one and only one DynamicBuffer per Execute(). But I was wrong. Still totally an ECS noob.

I needed to find another way. Then I remembered my old implementation, where I used BufferFromEntity instead. This can still be applied to my parallel solution since I can retrieve the entities from chunks. I made a simple test to see if it would work in parallel:

private struct JobWithBufferFromEntity : IJobParallelFor {
    [ReadOnly]
    public NativeArray<Entity> entities;

    [NativeDisableContainerSafetyRestriction]
    public BufferFromEntity<Int2BufferElement> allBuffers;

    public void Execute(int index) {
        Entity entity = this.entities[index];
        DynamicBuffer<Int2BufferElement> buffer = this.allBuffers[entity];
        
        buffer.Clear();

        for (int i = 0; i < 1000; ++i) {
            buffer.Add(new Int2BufferElement() {
                value = new int2(i, i)
            });
        }
    }
}

Surprisingly, this works even with multiple chunks. With this, I wrote my A* job that runs in parallel:

[BurstCompile]
public struct AStarSearchParallelFor<HeuristicCalculator, ReachabilityType> : IJobParallelFor
    where HeuristicCalculator : struct, HeuristicCostCalculator where ReachabilityType : struct, Reachability {
    [ReadOnly]
    public NativeArray<Entity> entities; // The request entities

    [NativeDisableContainerSafetyRestriction, ReadOnly]
    public ComponentDataFromEntity<AStarSearchParameters> allParameters;

    [NativeDisableContainerSafetyRestriction, WriteOnly]
    public ComponentDataFromEntity<AStarPath> allPaths;

    [NativeDisableContainerSafetyRestriction, WriteOnly]
    public ComponentDataFromEntity<Waiting> allWaiting;

    [NativeDisableContainerSafetyRestriction, WriteOnly]
    public BufferFromEntity<Int2BufferElement> allPathLists;

    [ReadOnly]
    public ReachabilityType reachability;

    private HeuristicCalculator heuristicCalculator;

    // This will be specified by client on whether it wants to include diagonal neighbors
    [ReadOnly]
    public NativeArray<int2> neighborOffsets;

    [ReadOnly]
    public GridWrapper gridWrapper;

    // Execute search per entity in entities
    public void Execute(int index) {
        Search search = new Search() {
            entity = this.entities[index],
            allParameters = this.allParameters,
            allPaths = this.allPaths,
            allPathLists = this.allPathLists,
            reachability = this.reachability,
            heuristicCalculator = this.heuristicCalculator,
            neighborOffsets = this.neighborOffsets,
            gridWrapper = this.gridWrapper
        };
        search.Execute();

        // Update waiting
        this.allWaiting[this.entities[index]] = new Waiting() {
            done = true
        };
    }

    private struct Search {
        public Entity entity;
        public ComponentDataFromEntity<AStarSearchParameters> allParameters;
        public ComponentDataFromEntity<AStarPath> allPaths;
        public BufferFromEntity<Int2BufferElement> allPathLists;
        public ReachabilityType reachability;
        public HeuristicCalculator heuristicCalculator;
        public NativeArray<int2> neighborOffsets;
        public GridWrapper gridWrapper;
        private NativeList<AStarNode> allNodes;
        private OpenSet openSet;
        private NativeHashMap<int2, byte> closeSet;

        private int2 goalPosition;

        public void Execute() {
            // Instantiate containers
            this.allNodes = new NativeList<AStarNode>(10, Allocator.Temp);
            GrowingHeap heap = new GrowingHeap(new NativeList<HeapNode>(10, Allocator.Temp));
            this.openSet = new OpenSet(heap, new NativeHashMap<int2, AStarNode>(10, Allocator.Temp));
            this.closeSet = new NativeHashMap<int2, byte>(10, Allocator.Temp);

            AStarSearchParameters parameters = this.allParameters[this.entity];
            this.goalPosition = parameters.goal;

            float startNodeH = this.heuristicCalculator.ComputeCost(parameters.start, this.goalPosition);
            AStarNode startNode = CreateNode(parameters.start, -1, 0, startNodeH);
            this.openSet.Push(startNode);

            float minH = float.MaxValue;
            Maybe<AStarNode> minHPosition = Maybe<AStarNode>.Nothing;

            // Process while there are nodes in the open set
            while (this.openSet.HasItems) {
                AStarNode current = this.openSet.Pop();
                if (current.position.Equals(this.goalPosition)) {
                    // Goal has been found
                    int pathCount = ConstructPath(current);
                    this.allPaths[this.entity] = new AStarPath(pathCount, true);

                    return;
                }

                ProcessNode(current);

                this.closeSet.TryAdd(current.position, 0);

                // We save the node with the least H so we could still try to locate
                // the nearest position to the destination 
                if (current.H < minH) {
                    minHPosition = new Maybe<AStarNode>(current);
                    minH = current.H;
                }
            }

            // Open set has been exhausted. Path is unreachable.
            if (minHPosition.HasValue) {
                int pathCount = ConstructPath(minHPosition.Value);
                this.allPaths[this.entity] = new AStarPath(pathCount, false); // false for unreachable
            } else {
                this.allPaths[this.entity] = new AStarPath(0, false);
            }
        }

        private AStarNode CreateNode(int2 position, int parent, float g, float h) {
            int index = this.allNodes.Length;
            AStarNode node = new AStarNode(index, position, parent, g, h);
            this.allNodes.Add(node);

            return node;
        }

        private void ProcessNode(in AStarNode current) {
            if (IsInCloseSet(current.position)) {
                // Already in closed set. We no longer process because the same node with lower F
                // might have already been processed before. Note that we don't fix the heap. We just
                // keep on pushing nodes with lower scores.
                return;
            }

            // Process neighbors
            for (int i = 0; i < this.neighborOffsets.Length; ++i) {
                int2 neighborPosition = current.position + this.neighborOffsets[i];

                if (current.position.Equals(neighborPosition)) {
                    // No need to process if they are equal
                    continue;
                }

                if (!this.gridWrapper.IsInside(neighborPosition)) {
                    // No longer inside the map
                    continue;
                }

                if (IsInCloseSet(neighborPosition)) {
                    // Already in close set
                    continue;
                }

                if (!this.reachability.IsReachable(current.position, neighborPosition)) {
                    // Not reachable based from specified reachability
                    continue;
                }

                float tentativeG = current.G + this.reachability.GetWeight(current.position, neighborPosition);

                float h = this.heuristicCalculator.ComputeCost(neighborPosition, this.goalPosition);

                if (this.openSet.TryGet(neighborPosition, out AStarNode existingNode)) {
                    // This means that the node is already in the open set
                    // We update the node if the current movement is better than the one in the open set
                    if (tentativeG < existingNode.G) {
                        // Found a better path. Replace the values.
                        // Note that creation automatically replaces the node at that position
                        AStarNode betterNode = CreateNode(neighborPosition, current.index, tentativeG, h);

                        // Only add to open set if it's a better movement
                        // If we just push without checking, a node with the same g score will be pushed
                        // which causes infinite loop as every node will be pushed
                        this.openSet.Push(betterNode);
                    }
                } else {
                    AStarNode neighborNode = CreateNode(neighborPosition, current.index, tentativeG, h);
                    this.openSet.Push(neighborNode);
                }
            }
        }

        // Returns the position count in the path
        private int ConstructPath(AStarNode destination) {
            // Note here that we no longer need to reverse the ordering of the path
            // We just add them as reversed in AStarPath
            // AStarPath then knows how to handle this
            DynamicBuffer<Int2BufferElement> pathList = this.allPathLists[this.entity];
            pathList.Clear();
            AStarNode current = this.allNodes[destination.index];
            while (current.parent >= 0) {
                pathList.Add(new Int2BufferElement(current.position));
                current = this.allNodes[current.parent];
            }

            return pathList.Length;
        }

        public bool IsInCloseSet(int2 position) {
            return this.closeSet.TryGetValue(position, out _);
        }
    }
}

Job System

Using AStarSearchParallelFor in a system should be trivial, or so I thought:

private JobHandle Process(in ArchetypeChunk chunk, in SimpleReachability reachability,
    JobHandle inputDeps) {
    AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability> searchJob =
        new AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability>() {
            entities = chunk.GetNativeArray(this.entityType),
            allParameters = this.allParameters,
            allPaths = this.allPaths,
            allWaiting = this.allWaiting,
            allPathLists = this.allPathLists,
            reachability = reachability,
            neighborOffsets = this.neighborOffsets,
            gridWrapper = this.gridSystem.GridWrapper
        };

    return searchJob.Schedule(chunk.Count, 64, inputDeps);
}

I made some request entities with random start and goal positions so that the system that schedules the job would run. I tried 1 request and saw that it worked. Tried 2: it worked, but not in parallel. So I increased it to 10. Still not in parallel. Tried 20, not in parallel. Then 50. Still not in parallel. I was quite disappointed.

So I hung out on the forums as a break. Luckily, I found a thread where a programmer was trying to do something like what I was doing. He wanted to run parallel jobs for each entity because his processing was heavy. Somebody replied that he ought to use IJobParallelFor with a low innerloopBatchCount, the second parameter in Schedule() (64 in my code). At that moment, I realized what that parameter is for. (I didn’t know. I just followed example code. Don’t judge me.)

Passing in 64 means that each batch handed to a worker thread contains 64 Execute() calls. It acts like a divisor that determines how many execution batches to create. Say chunk.Count is equal to 512: 8 execution batches would be created, because 512 / 64 = 8. These 8 batches would then be distributed to the worker threads in parallel. At least that’s how I understood it.

In my case, there would always be only one execution batch; my request count never reached 64. I should pick a reasonable number such that there’s a higher chance the execution batches are distributed evenly (not all A* requests are computed equally). Setting it to 1 means there would be one execution batch per request. But this is not without cost, I would surmise: there should be a cost associated with creating an execution batch and assigning a thread to it. So I picked 2 instead. It’s low enough that batches are distributed evenly, while not creating so many that they pressure the scheduler.

private JobHandle Process(in ArchetypeChunk chunk, in SimpleReachability reachability,
    JobHandle inputDeps) {
    AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability> searchJob =
        new AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability>() {
            entities = chunk.GetNativeArray(this.entityType),
            allParameters = this.allParameters,
            allPaths = this.allPaths,
            allWaiting = this.allWaiting,
            allPathLists = this.allPathLists,
            reachability = reachability,
            neighborOffsets = this.neighborOffsets,
            gridWrapper = this.gridSystem.GridWrapper
        };

    // Just changed 64 to 2
    return searchJob.Schedule(chunk.Count, 2, inputDeps);
}

Once I did this, I could see that the A* searches were running in parallel:

Sweet parallel execution. This is 60 requests.
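
The batch arithmetic above can be sketched as a tiny helper. This is purely illustrative; the real scheduler is more involved, and BatchMath is a made-up name:

```csharp
// Rough model of how innerloopBatchCount splits the work across batches
public static class BatchMath {
    public static int BatchCount(int arrayLength, int innerloopBatchCount) {
        // Ceiling division: every index must land in some batch
        return (arrayLength + innerloopBatchCount - 1) / innerloopBatchCount;
    }
}

// BatchMath.BatchCount(512, 64) -> 8 batches, each can go to a different worker thread
// BatchMath.BatchCount(50, 64)  -> 1 batch: no parallelism at all
// BatchMath.BatchCount(50, 2)   -> 25 batches: plenty to distribute
```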

Benchmark

I did the same benchmark as my last article. I made a test environment where I could increase/decrease the number of requests until I hit more than 16ms. The last record was 160. This time I could make 300 requests before hitting past 16ms.

Threads are now almost filled.

Note that this is still a generic A*. It could be further optimized with A* variants like Jump Point Search or HPA*. For our use case, however, we probably won’t need to; 20 requests per frame would already be good for us.

Next Steps

I’m slowly working to apply this to our game, Academia. I’m almost there. This will be another part of the game that will be optimized using Unity’s DOTS. But that will be for another post.

How fast could Burst compiled A* be?

I’ve been excited about Unity’s Burst compiler since I saw it in action with my own code. Little by little, I’ve been porting some of our common code to ECS so we can easily leverage Burst. I’ve always wanted to write the most basic, generic A* algorithm in ECS and see how fast it could be when compiled with Burst. If it’s already fast enough, there would be no need to write more sophisticated A* variants like Jump Point Search. Around last month, I was able to do this, and I was very pleased with the results.

I made a simple environment where I could run A*. I used an 80×40 2D grid and added some controls where I could paint cells that are blockers. I used 80×40 because it’s the size at which this map of rooms was made. I could have made a procedural room generator, but I just wanted to test my A* code right away. (Maybe next time.) I also didn’t want to waste time making a scrollable or zoomable map. After all, I just want to see how much faster Burst compiled A* is.

Before performance testing, I first had to verify that my ECS A* code works. As you can see in the image above, it shows the resolved path from the blue tile to the red tile. I even added handling for destinations that can’t be reached, in which case the algorithm still tries to resolve the nearest position to the target.

Red tile is unreachable

Code

Here’s the A* search code as an IJob:

[BurstCompile]
public struct AStarSearch<HeuristicCalculator, ReachabilityType> : IJob
    where HeuristicCalculator : struct, HeuristicCostCalculator where ReachabilityType : struct, Reachability {
    public Entity owner;
    public int2 startPosition;
    public int2 goalPosition;

    [ReadOnly]
    public ReachabilityType reachability;

    // Specified by the client; determines whether diagonal neighbors are included
    [ReadOnly]
    public NativeArray<int2> neighborOffsets;

    public GridWrapper gridWrapper;

    public ComponentDataFromEntity<AStarPath> allPaths;

    [ReadOnly]
    public BufferFromEntity<Int2BufferElement> allPathLists;

    public ComponentDataFromEntity<Waiting> allWaiting;

    // This is the master container for all AStarNodes. The key is the hash code of the position.
    // This will be specified by client code
    public NativeList<AStarNode> allNodes;

    // Only used for existence of position in closed set
    // This will be specified by client code
    public NativeHashMap<int2, byte> closeSet;

    private HeuristicCalculator heuristicCalculator;

    public OpenSet openSet;

    public void Execute() {
        this.allNodes.Clear();
        this.openSet.Clear();
        this.closeSet.Clear();

        DoSearch();

        // Mark as done waiting for the agent to respond
        this.allWaiting[this.owner] = new Waiting {
            done = true
        };
    }

    private void DoSearch() {
        if (!this.reachability.IsReachable(this.goalPosition)) {
            // Goal is not reachable
            this.allPaths[this.owner] = new AStarPath(0, false);

            return;
        }

        float startNodeH = this.heuristicCalculator.ComputeCost(this.startPosition, this.goalPosition);
        AStarNode startNode = CreateNode(this.startPosition, -1, 0, startNodeH);
        this.openSet.Push(startNode);

        float minH = float.MaxValue;
        Maybe<AStarNode> minHPosition = Maybe<AStarNode>.Nothing;

        // Process while there are nodes in the open set
        while (this.openSet.HasItems) {
            AStarNode current = this.openSet.Pop();
            if (current.position.Equals(this.goalPosition)) {
                // Goal has been found
                int pathCount = ConstructPath(current);
                this.allPaths[this.owner] = new AStarPath(pathCount, true);

                return;
            }

            ProcessNode(current);

            this.closeSet.TryAdd(current.position, 0);

            // We save the node with the least H so we could still try to locate
            // the nearest position to the destination 
            if (current.H < minH) {
                minHPosition = new Maybe<AStarNode>(current);
                minH = current.H;
            }
        }

        // Open set has been exhausted. Path is unreachable.
        if (minHPosition.HasValue) {
            int pathCount = ConstructPath(minHPosition.Value);
            this.allPaths[this.owner] = new AStarPath(pathCount, false); // false for unreachable
        } else {
            this.allPaths[this.owner] = new AStarPath(0, false);
        }
    }

    private AStarNode GetNode(int index) {
        return this.allNodes[index];
    }

    private AStarNode CreateNode(int2 position, int parent, float g, float h) {
        int index = this.allNodes.Length;
        AStarNode node = new AStarNode(index, position, parent, g, h);
        this.allNodes.Add(node);

        return node;
    }

    // Returns the position count in the path
    private int ConstructPath(AStarNode destination) {
        // Note here that we no longer need to reverse the ordering of the path
        // We just add them as reversed in AStarPath
        // AStarPath then knows how to handle this
        DynamicBuffer<Int2BufferElement> pathList = this.allPathLists[this.owner];
        pathList.Clear();
        AStarNode current = GetNode(destination.index);
        while (current.parent >= 0) {
            pathList.Add(new Int2BufferElement(current.position));
            current = GetNode(current.parent);
        }

        return pathList.Length;
    }

    private void ProcessNode(in AStarNode current) {
        if (IsInCloseSet(current.position)) {
            // Already in closed set. We no longer process because the same node with lower F
            // might have already been processed before. Note that we don't fix the heap. We just
            // keep on pushing nodes with lower scores.
            return;
        }

        // Process neighbors
        for (int i = 0; i < this.neighborOffsets.Length; ++i) {
            int2 neighborPosition = current.position + this.neighborOffsets[i];

            if (current.position.Equals(neighborPosition)) {
                // No need to process if they are equal
                continue;
            }

            if (!this.gridWrapper.IsInside(neighborPosition)) {
                // No longer inside the map
                continue;
            }

            if (IsInCloseSet(neighborPosition)) {
                // Already in close set
                continue;
            }

            if (!this.reachability.IsReachable(current.position, neighborPosition)) {
                // Not reachable based from specified reachability
                continue;
            }

            float tentativeG = current.G + this.reachability.GetWeight(current.position, neighborPosition);

            float h = this.heuristicCalculator.ComputeCost(neighborPosition, this.goalPosition);

            if (this.openSet.TryGet(neighborPosition, out AStarNode existingNode)) {
                // This means that the node is already in the open set
                // We update the node if the current movement is better than the one in the open set
                if (tentativeG < existingNode.G) {
                    // Found a better path. Replace the values.
                    // Note that creation automatically replaces the node at that position
                    AStarNode betterNode = CreateNode(neighborPosition, current.index, tentativeG, h);

                    // Only add to open set if it's a better movement
                    // If we just push without checking, a node with the same g score will be pushed
                    // which causes infinite loop as every node will be pushed
                    this.openSet.Push(betterNode);
                }
            } else {
                AStarNode neighborNode = CreateNode(neighborPosition, current.index, tentativeG, h);
                this.openSet.Push(neighborNode);
            }
        }
    }

    private bool IsInCloseSet(int2 position) {
        return this.closeSet.TryGetValue(position, out _);
    }
}

The grid is represented as a linear array indexed by int2 positions (a 2 dimensional vector, but with integers). The mapping between positions and array indices is handled by the GridWrapper passed in upon job creation.

Neighbors, or adjacent nodes, are easily represented as an array of int2 offsets. This is the neighborOffsets variable in the code. To get the neighbors, the code just loops through this array and adds each offset to the currently processed position. The resulting int2 is the position of the neighbor.
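To make this concrete, here’s a sketch of how such an offset array could be built. The method name and the parameters are illustrative, not from the actual codebase:

```csharp
using Unity.Collections;
using Unity.Mathematics;

public static class NeighborOffsets {
    // Builds the offsets for the 4 cardinal neighbors, plus the 4 diagonals if asked.
    // The caller owns the returned array and must Dispose() it.
    public static NativeArray<int2> Create(bool includeDiagonals, Allocator allocator) {
        int count = includeDiagonals ? 8 : 4;
        NativeArray<int2> offsets = new NativeArray<int2>(count, allocator);
        offsets[0] = new int2(1, 0);
        offsets[1] = new int2(-1, 0);
        offsets[2] = new int2(0, 1);
        offsets[3] = new int2(0, -1);

        if (includeDiagonals) {
            offsets[4] = new int2(1, 1);
            offsets[5] = new int2(1, -1);
            offsets[6] = new int2(-1, 1);
            offsets[7] = new int2(-1, -1);
        }

        return offsets;
    }
}
```

The neighbor position is then simply `current.position + offsets[i]`, which is exactly what the loop in ProcessNode() does.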

The whole thing is just the baseline A* algorithm that everybody learns the first time around.

Performance Test

The performance test is simple. Every frame, I make X A* requests and the code executes those X requests within the same frame. A single request means picking a random start and a random end position from the map in the image above. I picked this kind of map because it approximately resembles our current game maps.

The machine used for the test has the following specs:

  • Intel i3-6100 @ 3.70GHz (4 CPUs)
  • 16GB RAM

I ran two tests: first non-Burst compiled, then Burst compiled. The profiler was run against a development build of the simulation.

Non Burst

The following is how the non-Burst compiled simulation ran with a single request (X = 1):

Non Burst with single request (larger image)

A single A* search runs around 1–8ms.

I kept increasing the number of requests until the simulation went past 16ms (the 60fps budget). That happens at just 15 requests:

Past 16ms is reached with just 15 requests (larger image)

Burst

Let’s see the performance of the Burst compiled simulation with just one request:

Burst with single request

The running time goes as low as 0.04ms and as high as 0.70ms. I never saw a running time reach even 1ms.

I increased the number of requests until the frame time reaches past 16ms. Guess how many requests it takes.

Burst compiled A* can run up to 120 requests until it reaches 16ms (larger image)

I reached 120 requests before hitting that 16ms mark. That’s 8 times more requests than non-Burst A* can handle! For me, that’s already kind of amazing. That’s more than enough for the kind of games we make (simulation games). We don’t need 120 requests per frame. Even just 20 per frame would be nice.

Final Words

While the simulation here is not really indicative of an actual game, it still shows that Burst compiled A* is significantly faster. I won’t be able to use this in our current game because of its heavy OOP usage, but I’m excited for our next game.

Run managed code in Unity’s Job System

I’ve always thought that only the HPC# (High Performance C#) subset can be executed in jobs, which run on Unity’s reserved worker threads. We do some multithreading in our game, but we run it on C#’s managed threads since the code isn’t HPC#. I feel like I’m wasting the worker threads when they’re not fully utilized. While we have ECS systems running on worker threads, they use very little running time.

ThreadNotUtilized
Look at the bottom 3 timelines. Those are the worker threads. Notice the huge Idle time. What a waste!

As much as I want to utilize the worker threads, it’s very hard to port a huge existing OOP codebase to HPC# just to use the job system. I would have to rewrite the game from scratch, which is stupid. That was until I read a thread on the Unity forums about how you can run managed C# code in jobs. I tried it on a simple test and was surprised by how easy it is.

Simple Example

Here’s an example to run managed code in a job:

// Common interface for executing something
interface Task {
    void Execute();
}

// Just an arbitrary heavy computation as an example
class SampleHeavyTask : Task {
    private Unity.Mathematics.Random random;

    public SampleHeavyTask() {
        this.random = new Unity.Mathematics.Random(1234);
    }

    public void Execute() {
        float total = 0;
        for (int i = 0; i < 50000; ++i) {
            float randomValue = this.random.NextFloat(1, 100);
            total += math.sqrt(randomValue);
        }
    }
}

public class ManagedCodeInJob : MonoBehaviour {
    private readonly List<GCHandle> gcHandles = new List<GCHandle>();
    private readonly List<JobHandle> jobHandles = new List<JobHandle>();

    private void Update() {
        // Schedule one task per frame
        ScheduleTask();
    }

    private void ScheduleTask() {
        Task task = new SampleHeavyTask();
        GCHandle gcHandle = GCHandle.Alloc(task);
        this.gcHandles.Add(gcHandle); // We remember this so we can free it later

        Job job = new Job() {
            handle = gcHandle
        };

        // We remember the JobHandle so we can complete it later
        this.jobHandles.Add(job.Schedule());
    }

    private void LateUpdate() {
        // Free and complete the scheduled jobs
        for (int i = 0; i < this.jobHandles.Count; ++i) {
            this.jobHandles[i].Complete();
            this.gcHandles[i].Free();
        }

        this.jobHandles.Clear();
        this.gcHandles.Clear();
    }

    private struct Job : IJob {
        public GCHandle handle;

        public void Execute() {
            Task task = (Task) handle.Target;
            task.Execute();
        }
    }
}

The key here is the Job struct at the bottom. You provide it with a GCHandle that targets an instance providing the execution. You just cast GCHandle.Target back to the interface and call the method that executes your task.

In Update(), we schedule one job, which tells Unity "Hey, please run this in a worker thread if you can" (sometimes Unity can't). Then in LateUpdate(), we force that job to complete, which means the main thread waits for the task executing in the worker thread to finish. We also call GCHandle.Free().

I admit that I didn't know what GCHandle does until the writing of this post. I'll explain as best as I can. GCHandle.Alloc() protects a managed object from being garbage collected. I would surmise that the worker threads are unmanaged/native environments which is outside the jurisdiction of the garbage collector. We have to make sure that the garbage collector does not collect the objects while the worker threads are still executing them. GCHandle.Alloc() kind of removes an object from the garbage collector's radar. After the thread execution completes, we have to call GCHandle.Free() to mark the object as garbage collectable again.

If you attach the script above to a GameObject and run the scene, you will see something like this in the profiler:

SingleManagedCode

You can see that the job that executes the sample heavy task is run in one of the worker threads.

My PC has 4 cores, so Unity automatically creates 3 worker threads (coreCount – 1). The remaining one will be the main thread. Let’s look at what happens when we call ScheduleTask() in Update() three times:

private void Update() {
    // Of course you can use a for loop here.
    ScheduleTask();
    ScheduleTask();
    ScheduleTask();
}

This is what the profiler looks like in my machine:

ThreeManagedCode
Three jobs are executed in parallel in worker threads.

We can see here that the three worker threads are each executing the heavy job in parallel. In the main thread, you can see that LateUpdate() only waits for 5.57ms instead of waiting the 16.45ms total execution time of the three worker threads. That’s multithreading at work.

Wait for execution to finish

The current example forces the jobs to finish before the frame ends by calling JobHandle.Complete() in LateUpdate(). However, there may be times when you want to wait for the execution to finish on its own instead of forcing it to complete. This implies that a job may span multiple frames, which is useful for tasks that take more than 16ms per execution.

To do this, we can simply change LateUpdate() to the following:

private void LateUpdate() {
    // Wait for jobs to complete before freeing and removing them
    // Iterate from the end so we can easily remove from lists
    for (int i = this.jobHandles.Count - 1; i >= 0; --i) {
        if (this.jobHandles[i].IsCompleted) {
            this.jobHandles[i].Complete();
            this.jobHandles.RemoveAt(i);
            
            this.gcHandles[i].Free();
            this.gcHandles.RemoveAt(i);
        }
    }
}

Be careful!

As easy as this is, it’s not without drawbacks. The normal job system using HPC# usually tells you what you’re doing wrong in your multithreaded code, such as race conditions.

With managed code, you’re on your own. Unity won’t tell you anything, so you have to make sure your multithreaded code is correct. Multithreading is inherently hard, so be careful. Use this only if you know what you’re doing.

That’s all for now. Hope it helps!

A small optimization for A*

Right now, my side project is writing A* code that works with Unity’s ECS (this is my main project). I’ve always wondered how many A* calls can be completed in a single frame by a Burst compiled implementation that runs in a job (multithreaded). This post is not about those results as I’m not there yet; my current implementation is still horrible. I did, however, find a little optimization while rewriting our in-house A* solution.

No need to fix the Open Set

Again, I’m not going to explain the whole A* algorithm here as there are better resources for that elsewhere. I’ll just assume that you’re familiar with it.

A* employs the concept of an open set, which is a collection of positions or nodes that are path candidates but haven’t been considered yet. Every node in the open set has an F score associated with it: the estimated total cost of a path through that node (the cost from the start plus the heuristic distance to the goal). Every time we consider a node from the open set, we take the one with the least F score, because a lesser F score means the node is more promising.

Most A* implementations use a heap data structure for the open set. A heap is a container that always knows which of its items has the least (or greatest) value, depending on the comparison function you pass to it. For A*, we need the node with the least F score, so a heap is a perfect fit for the purposes of the algorithm. The heap may be an array or a tree based implementation.
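To illustrate, here’s a minimal array-based binary min-heap keyed by F score. This is only a sketch of the data structure idea, not the actual OpenSet used in the job (the entry layout and names are made up for this example):

```csharp
using System.Collections.Generic;

public readonly struct HeapEntry {
    public readonly int nodeIndex;
    public readonly float f;

    public HeapEntry(int nodeIndex, float f) {
        this.nodeIndex = nodeIndex;
        this.f = f;
    }
}

public class MinHeap {
    private readonly List<HeapEntry> items = new List<HeapEntry>();

    public bool HasItems => this.items.Count > 0;

    public void Push(HeapEntry entry) {
        // Add at the end then bubble up until the parent is smaller
        this.items.Add(entry);
        int i = this.items.Count - 1;
        while (i > 0) {
            int parent = (i - 1) / 2;
            if (this.items[parent].f <= this.items[i].f) {
                break;
            }

            (this.items[parent], this.items[i]) = (this.items[i], this.items[parent]);
            i = parent;
        }
    }

    public HeapEntry Pop() {
        // Take the root, move the last element to the root, then sift it down
        HeapEntry top = this.items[0];
        int last = this.items.Count - 1;
        this.items[0] = this.items[last];
        this.items.RemoveAt(last);

        int i = 0;
        while (true) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            int smallest = i;
            if (left < this.items.Count && this.items[left].f < this.items[smallest].f) {
                smallest = left;
            }

            if (right < this.items.Count && this.items[right].f < this.items[smallest].f) {
                smallest = right;
            }

            if (smallest == i) {
                break;
            }

            (this.items[smallest], this.items[i]) = (this.items[i], this.items[smallest]);
            i = smallest;
        }

        return top;
    }
}
```

Push() and Pop() are both O(log n), which is why the heap is the usual choice for the open set.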

In A*, there’s a special case: when a neighbor node is already in the open set, we check whether it has a lesser F score than the one contained in the set. If it does, we update the F score of the node in the open set instead of pushing the neighbor node into the set. But when a node’s F score is updated, the internal state of the heap is disrupted; the ordering the heap maintains may now be wrong. We have to fix the heap so that its internal integrity stays intact. In an array based implementation, we have to move the node so that it sits in its right place. It’s the same with a tree based implementation: the node may need to bubble up since it now has a lesser value.

Then I had a thought: what if I avoid fixing the heap and just push the neighbor with the better F score into it? The heap orders itself anyway. The duplicate node with the least F score will always be popped first. When the old node with the higher F score is finally popped, I can just check whether its position is already in the close set, which will be the case since the node with the lesser F score was already popped and processed. The old node will always be skipped.

Not fixing the heap would also lead to a better implementation. We can now use immutable structs for the heap entries and A* nodes as we no longer need to mutate them.

So I did all this and the algorithm still checks out. I admit that this is a micro-optimization and the gains are very small, but hey, fixing the heap is still code that you don’t have to execute.

private void ProcessNode(in AStarNode current) {
    if (IsInCloseSet(current.position)) {
        // Already in closed set. We no longer process because the same node with lower F
        // might have already been processed before. Note that we don't fix the heap. We just
        // keep on pushing nodes with lower scores.
        return;
    }

    // Process neighbors
    for (int i = 0; i < this.neighborOffsets.Length; ++i) {
        int2 neighborPosition = current.position + this.neighborOffsets[i];
        
        if (current.position.Equals(neighborPosition)) {
            // No need to process if they are equal
            continue;
        }

        if (!this.gridWrapper.IsInside(neighborPosition)) {
            // No longer inside the map
            continue;
        }
    
        if (IsInCloseSet(neighborPosition)) {
            // Already in close set
            continue;
        }

        if (!this.reachability.IsReachable(current.position, neighborPosition)) {
            // Not reachable based from specified reachability
            continue;
        }
        
        float tentativeG = current.G + this.reachability.GetWeight(current.position, neighborPosition);
        float h = this.heuristicCalculator.ComputeCost(neighborPosition, this.goalPosition.Value);

        AStarNode neighborNode = GetNode(neighborPosition);
        if (neighborNode.hasValue) {
            // This means that the node is already in the open set
            // We update the node if the current movement is better than the one in the open set
            if (tentativeG < neighborNode.G) {
                // Found a better path.
                neighborNode = CreateNode(neighborPosition, tentativeG, h, new Maybe<int2>(current.position));

                // Just push the node with the lesser F. No need to fix the open set.
                this.openSet.Push(new HeapNode(neighborPosition, neighborNode.F));
            }
        } else {
            // Not yet in open set
            neighborNode = CreateNode(neighborPosition, tentativeG, h, new Maybe<int2>(current.position));
            this.openSet.Push(new HeapNode(neighborPosition, neighborNode.F));
        }
    }
}

I avoided Unity’s Camera.Render(). Our game runs faster.

One of my pet peeves with our game is that it’s a 2D game, yet its rendering is horrendous. Take a game scene that looks like this:

perfscene

This doesn’t cover the game’s whole tilemap yet but it does already have lots of objects in it. When I run this through the profiler, this is what it looks like:

slowperf
Click here for enlarged image.

The timeline for Camera.Render hovers around 20–30ms. That’s a lot! Remember that a frame must run in at most 16ms to achieve 60fps. The rendering alone takes more than that, leaving no CPU budget for game logic.

It’s not that I didn’t make the effort to optimize rendering and encourage batching. I did a lot; see here, here, here, and here. The game does perform well when the player zooms into the map, because the camera sees fewer objects and the majority are culled. However, in a simulation game like Academia, players usually play zoomed out because they want to see how the school is doing. Thus, Camera.Render usually processes more objects and performs slower.

I keep thinking about 3D games that render meshes with hundreds to thousands of polygons and still run at 60fps. My hypothesis was that Camera.Render takes too much time calculating culling and batching for quad meshes. That got me thinking that maybe I don’t need culling for this game at all; modern GPUs should be able to chew through thousands of polygons easily. I already have a tool that collects sprites into a single mesh and renders that one mesh instead. There should be no problem just dumping these monolithic meshes to the GPU for rendering.
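The core of the “one big mesh” idea can be sketched like this. This is only an illustration of baking many sprite quads into a single Mesh, not the actual sprite manager used in the game (the method name, placeholder UVs, and input type are all made up for the example):

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class LayerMeshBuilder {
    // Bakes all sprite quads of a layer into a single Mesh so the whole
    // layer can be drawn in one draw call instead of per-sprite.
    public static Mesh Build(List<Rect> spriteRects) {
        var vertices = new List<Vector3>();
        var uvs = new List<Vector2>();
        var triangles = new List<int>();

        foreach (Rect r in spriteRects) {
            int baseIndex = vertices.Count;

            // Four corners of the quad
            vertices.Add(new Vector3(r.xMin, r.yMin));
            vertices.Add(new Vector3(r.xMin, r.yMax));
            vertices.Add(new Vector3(r.xMax, r.yMax));
            vertices.Add(new Vector3(r.xMax, r.yMin));

            // Placeholder UVs; a real implementation would sample the sprite atlas
            uvs.Add(new Vector2(0, 0));
            uvs.Add(new Vector2(0, 1));
            uvs.Add(new Vector2(1, 1));
            uvs.Add(new Vector2(1, 0));

            // Two triangles per quad
            triangles.Add(baseIndex);
            triangles.Add(baseIndex + 1);
            triangles.Add(baseIndex + 2);
            triangles.Add(baseIndex);
            triangles.Add(baseIndex + 2);
            triangles.Add(baseIndex + 3);
        }

        Mesh mesh = new Mesh();
        // 16,380 sprites per layer exceeds the 65k 16-bit vertex limit
        mesh.indexFormat = UnityEngine.Rendering.IndexFormat.UInt32;
        mesh.SetVertices(vertices);
        mesh.SetUVs(0, uvs);
        mesh.SetTriangles(triangles, 0);
        return mesh;
    }
}
```

The baked mesh can then be submitted with something like Graphics.DrawMesh() each frame, skipping Unity’s per-object culling and batching entirely.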

So I made a separate project to test whether my hypothesis is correct. I improved my custom sprite manager to handle static sprites more efficiently, then simulated the hypothetical maximum number of sprites in our game: 5 layers of static sprites with 16,380 sprites per layer, for a total of 81,900 static sprites, plus one layer of non-static sprites with 4,500 units. My custom renderer can render all of this in 7–10ms.

testperf
The highlighted part (9.368ms) are my custom rendering systems. Click here for enlarged image.

Note that this is already the hypothetical maximum. The game scene I showed isn’t even half the maximum. My custom renderer should do much better than Camera.Render given the same scene.

I decided to integrate my custom renderer into our game. This required a lot of refactoring. I started the integration around late September 2018 and I’m not finished yet. I haven’t applied it to all objects in the game, but I have covered the majority, especially those that are frequently used. I profiled the game again while loading the same scene, and this is how it looks:

burstperf
The highlighted section (1.725ms) on the left side is where my custom renderer runs. Note that this also includes game specific systems (ECS). Click here for enlarged image.

You can see here that my custom renderer runs at 1.725ms. On the right, you can also see that I’ve reduced the running time of Camera.Render to 2.01ms. Most objects don’t use the standard Unity renderer like SpriteRenderer and MeshRenderer anymore so they’re no longer “processed” by the camera. Adding the running time of my custom renderer and Camera.Render, the total rendering time now hovers around 3-5ms. That’s a huge difference from the original 20-30ms.

Notes:

  • All profiling was done using a development build (not the editor)
  • My custom renderer uses Unity’s ECS with Burst compiler turned on

Let A* fail but still return a path

I wrote an article before about avoiding pathfinding when you can determine beforehand that the destination is unreachable. This can be done by labeling tiles or nodes with the same integer value if they are connected. Before querying for a path, you can tell whether a destination is reachable by checking whether its label matches that of the starting position.

ConnectionLabeling
2 can only reach 2. The same is true for 1, 3, 4, and 5. O are blocked tiles.
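The labeling pass itself can be a simple flood fill. Here’s a sketch, with illustrative names; the actual game code is more involved. Every connected region of walkable tiles gets the same integer label, so two tiles can reach each other only if their labels match:

```csharp
using System.Collections.Generic;

public static class RegionLabeler {
    // Assigns the same label to every tile in a connected walkable region.
    // 0 means blocked/unlabeled; labels start at 1.
    public static int[,] LabelRegions(bool[,] walkable) {
        int width = walkable.GetLength(0);
        int height = walkable.GetLength(1);
        int[,] labels = new int[width, height];
        int nextLabel = 1;

        for (int x = 0; x < width; ++x) {
            for (int y = 0; y < height; ++y) {
                if (!walkable[x, y] || labels[x, y] != 0) {
                    continue; // Blocked or already labeled
                }

                // Flood fill this region with nextLabel
                var stack = new Stack<(int x, int y)>();
                stack.Push((x, y));
                labels[x, y] = nextLabel;
                while (stack.Count > 0) {
                    var (cx, cy) = stack.Pop();
                    foreach (var (dx, dy) in new[] { (1, 0), (-1, 0), (0, 1), (0, -1) }) {
                        int nx = cx + dx;
                        int ny = cy + dy;
                        if (nx < 0 || ny < 0 || nx >= width || ny >= height) {
                            continue;
                        }

                        if (!walkable[nx, ny] || labels[nx, ny] != 0) {
                            continue;
                        }

                        labels[nx, ny] = nextLabel;
                        stack.Push((nx, ny));
                    }
                }

                ++nextLabel;
            }
        }

        return labels;
    }
}
```

Before running A*, the reachability check is then just a label comparison, e.g. `labels[start.x, start.y] == labels[goal.x, goal.y]`. The labels need to be recomputed (or incrementally updated) whenever blockers change.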

This method has helped our game tremendously. It prevents the long A* query wait when agents try to reach an unreachable destination. However, we still hadn’t fixed the UX issue where an agent indicates that it can’t reach its destination, but the player doesn’t know which tile it is. This is always reported to us as a bug, but upon investigation, the problem is just an unreachable destination, and it’s not clear to the player where it is or why.

The ideal fix is for the agent to still try to get as close as it can to the unreachable destination and then show the indicator that it can’t reach its target. This way, the player can at least deduce why agents show the indicator only on specific tiles.

The programming problem, then, is how to determine a path whose last position is the closest to the agent’s destination. The solution is actually so simple that I feel stupid for not thinking of it before. It can be implemented inside the A* algorithm itself.

Least H

I’m going to assume that you’re familiar with A*. Remember that in A*, each node stores an H value, the heuristic cost or distance from that node to the destination. While processing nodes from the open set, we can keep track of the unblocked node with the least H value, i.e., the closest one. When there are no more nodes in the open set (which implies that the path is unreachable), we construct a path using the node with the least H as the destination instead of ending the algorithm without a path. Note that you can always construct a path from any processed node, given that each one stores its parent.

Here’s the snippet of such code in our pathfinding framework:

float minH = float.MaxValue;
AStarNode<T> minHNode = null;

this.openSet.Add(startNode);

while (!this.openSet.IsEmpty()) {
    AStarNode<T> current = this.openSet.GetNodeWithLeastF(); // returns the AStar node with the lowest F value
    if (IsGoal(current)) {
        // Path to destination is found
        ConstructPath(current, this.executionPath);
        this.path.MarkAsReachable();
        return;
    }

    this.openSet.RemoveNodeWithLeastF();
    ProcessNode(current, this.goal, this.calculator, this.reachability);

    this.closeSet.Add(current.GraphNode);
    
    // We save the node with the least H so we could still try to locate
    // the nearest position to the destination 
    if (current.H < minH) {
        minHNode = current;
        minH = current.H;
    }
}

// It's unreachable
path.Reachable = false;

// We create a path from the node with the least H so that the agent can still
// have a path towards the destination even though it's not reachable
if (minHNode != null) {
    ConstructPath(minHNode, path);
}

Jump Point Search

The method I described might still take long to execute with a generic A* algorithm, which would need to process all nodes in the node space after all. Fortunately, we’re using the Jump Point Search (JPS) variant. In JPS, identifying an unreachable node is fairly fast since not all nodes are pushed to the open set.

In our code, we added an additional case for a node to be considered as a jump point (aka candidate for open set). It’s a very simple case. Any node that has all the following properties is a jump point:

    • The node has the same x or y position to the destination node.
    • The adjacent node going to the direction of the destination is a blocker.

For example in the following image, the circled tiles are jump points (say the chair is the destination).

JumpPoints

We needed these as jump points because we want nodes in the open set that would be approximately closest to the destination. Without these, the nodes in the open set will be the ones identified by the original JPS algorithm which are not always the closest positions.

The jump point consideration logic we used wouldn’t always identify the closest position but the approximation is good enough.
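The two-property rule above can be expressed as a small predicate. This is an illustrative sketch (the names and the isBlocked callback are made up), not the actual JPS code:

```csharp
using System;
using Unity.Mathematics;

public static class JumpPointRules {
    // A node qualifies as an extra jump point if it shares a row or column
    // with the goal AND the adjacent tile toward the goal is a blocker.
    public static bool IsNearGoalJumpPoint(int2 position, int2 goal, Func<int2, bool> isBlocked) {
        if (position.x != goal.x && position.y != goal.y) {
            return false; // Not on the same row or column as the goal
        }

        // Step one tile in the direction of the goal
        int2 step = new int2(Math.Sign(goal.x - position.x), Math.Sign(goal.y - position.y));
        return isBlocked(position + step);
    }
}
```

Nodes passing this check get pushed to the open set alongside the jump points identified by the original JPS rules, which is what keeps approximately-closest positions available when the goal turns out to be unreachable.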

InAction

Unity ECS: Basic Chunk Iteration

Unity’s ECS API is still in preview and thus in flux. There are different ways to query data, and some of them might be deprecated even before the stable API is released. Querying data by injection is the most used method, as it is easy to understand and requires less code. However, it’s not very flexible for complex component queries. The Unity devs in the ECS forum have said that injection will be deprecated together with structures like ComponentDataArray. To prepare for this eventuality, I refactored my ECS code away from injection and ComponentDataArray.

The baseline API for querying ECS data is called chunk iteration. All other APIs, including data injection, use it internally. Its main con is that it’s very verbose, but if you’re tinkering with ECS and want your code to be more future proof, this is the API to use. This post shows the basics of chunk iteration.

Inject Version

Let’s say we’re writing a system that moves a position by some velocity. Unity already has a Position component, so we only need a Velocity component:

public struct Velocity : IComponentData {
    public float3 value;
}

This is the system using injection:

public class MoveByVelocitySystem : ComponentSystem {
    // Data
    private struct Data {
        public readonly int Length;
        public ComponentDataArray<Position> Positions;
        public ComponentDataArray<Velocity> Velocities;
    }

    [Inject]
    private Data data;

    protected override void OnUpdate() {
        for (int i = 0; i < this.data.Length; ++i) {
            this.data.Positions[i] = new Position() {
                Value = this.data.Positions[i].Value + this.data.Velocities[i].value * Time.deltaTime
            };
        }
    }
}

This is very straightforward. We're getting the entities with Position and Velocity components. We update the Position by the Velocity value multiplied by delta time.

Chunk Iteration Version

Let’s rewrite MoveByVelocitySystem using chunk iteration. Here are the main steps:

  • Prepare a ComponentGroup
  • On each update:
    • Get an ArchetypeChunkComponentType for each component that you want to iterate
    • Get the array of ArchetypeChunk and process each chunk
    • Dispose the array of ArchetypeChunk

Here’s how it looks:

public class MoveByVelocitySystem : ComponentSystem {
    private ComponentGroup group;

    private ArchetypeChunkComponentType<Position> positionType;
    private ArchetypeChunkComponentType<Velocity> velocityType;

    protected override void OnCreateManager() {
        // Prepare group
        this.group = GetComponentGroup(typeof(Position), typeof(Velocity));
    }

    protected override void OnUpdate() {
        // Get each ArchetypeChunkComponentType
        this.positionType = GetArchetypeChunkComponentType<Position>();
        this.velocityType = GetArchetypeChunkComponentType<Velocity>();

        // Get the chunks and process each
        NativeArray<ArchetypeChunk> chunks = this.group.CreateArchetypeChunkArray(Allocator.TempJob);
        for (int i = 0; i < chunks.Length; ++i) {
            Process(chunks[i]);
        }
        
        // Don't forget to dispose the chunks
        chunks.Dispose();
    }

    private void Process(ArchetypeChunk chunk) {
        // Use the ArchetypeChunkComponentType to get an array of each component
        NativeArray<Position> positions = chunk.GetNativeArray(this.positionType);
        NativeArray<Velocity> velocities = chunk.GetNativeArray(this.velocityType);

        // Update the components here
        for (int i = 0; i < chunk.Count; ++i) {
            positions[i] = new Position() {
                Value = positions[i].Value + velocities[i].value * Time.deltaTime
            };
        }
    }
}

As you can see, there's a lot more code here. You can imagine that this is what's happening under the hood when you're using injection and other data query APIs.

Chunk Iteration In Jobs

Chunk iteration wouldn’t be very useful if it couldn’t be used inside jobs. Here’s MoveByVelocitySystem rewritten using an IJobParallelFor:

public class MoveByVelocitySystem : JobComponentSystem {
    private ComponentGroup group;
    
    protected override void OnCreateManager() {
        this.group = GetComponentGroup(typeof(Position), typeof(Velocity));
    }
    
    protected override JobHandle OnUpdate(JobHandle inputDeps) {
        NativeArray<ArchetypeChunk> chunks = this.group.CreateArchetypeChunkArray(Allocator.TempJob);
        
        Job job = new Job() {
            deltaTime = Time.deltaTime,
            positionType = GetArchetypeChunkComponentType<Position>(false),
            velocityType = GetArchetypeChunkComponentType<Velocity>(true),
            chunks = chunks
        };
        
        return job.Schedule(chunks.Length, 64, inputDeps);
    }

    private struct Job : IJobParallelFor {
        public float deltaTime;
        
        public ArchetypeChunkComponentType<Position> positionType;

        [ReadOnly]
        public ArchetypeChunkComponentType<Velocity> velocityType;

        [ReadOnly]
        [DeallocateOnJobCompletion]
        public NativeArray<ArchetypeChunk> chunks;
        
        public void Execute(int index) {
            Process(this.chunks[index]);
        }
        
        private void Process(ArchetypeChunk chunk) {
            NativeArray<Position> positions = chunk.GetNativeArray(this.positionType);
            NativeArray<Velocity> velocities = chunk.GetNativeArray(this.velocityType);

            for (int i = 0; i < chunk.Count; ++i) {
                positions[i] = new Position() {
                    Value = positions[i].Value + velocities[i].value * this.deltaTime
                };
            }
        }
    }
}

The attribute [DeallocateOnJobCompletion] is needed on the chunks array since you won't get a chance to dispose of it yourself once the job is scheduled. You can't dispose of the array right after scheduling, because it would already be gone by the time the job actually runs.
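If you'd rather not rely on [DeallocateOnJobCompletion], an alternative is to keep the array and the job handle around, and dispose of the array after completing the job on the next update. Here's a minimal sketch of that approach (the chunks and lastHandle fields are my own names, not part of the original system):

```csharp
// Sketch: manual disposal instead of [DeallocateOnJobCompletion].
// Assumes the same group/Job definitions as in the system above.
private NativeArray<ArchetypeChunk> chunks;
private JobHandle lastHandle;

protected override JobHandle OnUpdate(JobHandle inputDeps) {
    // Finish last frame's job before touching its array again
    this.lastHandle.Complete();
    if (this.chunks.IsCreated) {
        this.chunks.Dispose();
    }

    this.chunks = this.group.CreateArchetypeChunkArray(Allocator.TempJob);
    Job job = new Job() {
        // ... same member assignments as above, minus [DeallocateOnJobCompletion]
    };
    this.lastHandle = job.Schedule(this.chunks.Length, 64, inputDeps);
    return this.lastHandle;
}
```

This trades the convenience of the attribute for explicit control over the array's lifetime, which can be handy if you also want to read the chunks on the main thread after the job completes.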

That's it for now. Happy coding!

Some ECS Utility Scripts

I’ve been practicing Unity’s ECS for a while now. I even made my own 2D rendering system with it. I have written some utility scripts along the way during my trial-and-error studies. Here are a few useful ones.

Fast copy from NativeArray to managed array

Unity’s ECS is still in its infancy, so we’re not getting rid of the old ways yet. One of those old ways is dealing with the Mesh class, which has the interface for setting the vertices, normals, uv, colors, etc. However, these interfaces accept only managed arrays, while in ECS you have to use the native collections if you plan to use the Jobs system and the Burst compiler. In my case, I keep a NativeArray, copy its contents to a managed array, and assign that managed array to the mesh. We don’t copy each element with a for loop; that’s slow, especially when the arrays grow large. I use the following extension method instead.

public static unsafe void CopyToFast<T>(this NativeArray<T> nativeArray, T[] array) where T : struct {
    int byteLength = nativeArray.Length * UnsafeUtility.SizeOf(typeof(T));
    void* managedBuffer = UnsafeUtility.AddressOf(ref array[0]);
    void* nativeBuffer = nativeArray.GetUnsafePtr();
    UnsafeUtility.MemCpy(managedBuffer, nativeBuffer, byteLength);
}

// Usage
NativeArray<Vector3> nativeVertices = new NativeArray<Vector3>(vertexCount, Allocator.Persistent);
Vector3[] managedVertices = new Vector3[vertexCount];
nativeVertices.CopyToFast(managedVertices);
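To complete the motivating use case, the copied managed array is then handed to the Mesh as usual. A sketch, assuming a mesh variable already exists in the surrounding code:

```csharp
// Sketch: after the fast copy, assign the managed array to the mesh.
// 'mesh' is an existing UnityEngine.Mesh instance (assumed here).
nativeVertices.CopyToFast(managedVertices);
mesh.vertices = managedVertices; // Mesh accepts only managed arrays
mesh.RecalculateBounds(); // typical follow-up after changing vertices
```

Note that the managed array must be at least as long as the NativeArray, since MemCpy copies nativeArray.Length elements' worth of bytes.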

ByteBool

Unity’s ECS uses structs as components. One limitation is that the fields in them must be “blittable”. Blittable types are types that do not require conversion when they are passed between managed and unmanaged code. Under the hood, Unity’s ECS copies a lot of data around, and it requires blittable types to make this copying as fast as possible.

This means that you can’t have reference types like classes and arrays. Surprisingly, you also can’t use bool. Yep, bool is not a blittable type. But boolean values are so useful. You can get around this by using a byte (one or zero) as the boolean value and wrap it in a struct. Here’s my take on it:

public struct ByteBool : IEquatable<ByteBool> {
    private byte value;

    public ByteBool(bool value) {
        this.value = (byte)(value ? 1 : 0);
    }

    public bool Value {
        get {
            return this.value != 0;
        }

        set {
            this.value = (byte)(value ? 1 : 0);
        }
    }

    public bool Equals(ByteBool other) {
        return this.value == other.value;
    }

    public override bool Equals(object obj) {
        if (ReferenceEquals(null, obj)) {
            return false;
        }

        return obj is ByteBool && Equals((ByteBool) obj);
    }

    public override int GetHashCode() {
        return this.value.GetHashCode();
    }
    
    /// <summary>
    /// Converts a bool to a ByteBool
    /// </summary>
    public static implicit operator ByteBool(bool value) {
        return new ByteBool(value);
    }
    
    /// <summary>
    /// Converts a ByteBool to a bool
    /// </summary>
    public static implicit operator bool(ByteBool source) {
        return source.Value;
    }
}

You can use this struct just like you would use a bool variable.

struct MyComponent : IComponentData {
    public ByteBool isSomething;
}

MyComponent component = new MyComponent();
component.isSomething = true;

...

if(component.isSomething) {
    // Do something cool here
}

ComponentDataFromEntity for ISharedComponentData

ComponentDataFromEntity is like a Dictionary<Entity, T> where T can only be an IComponentData. There’s no such thing for ISharedComponentData. The solution I found is to collect them into a container inside a system. During my studies, I found that I collect ISharedComponentData instances like this in many different places, so I thought I could refactor it. Here’s what I made:

public abstract class CollectSharedComponentsSystem<T> : ComponentSystem where T : struct, ISharedComponentData {
    protected struct Collected : ISystemStateComponentData {
    }
    
    protected struct Data {
        public readonly int Length;

        [ReadOnly]
        public EntityArray Entity;
        
        [ReadOnly]
        public SharedComponentDataArray<T> Component;

        [ReadOnly]
        public SubtractiveComponent<Collected> Collected;
    }

    [Inject]
    protected Data data;
    
    private readonly Dictionary<Entity, T> map = new Dictionary<Entity, T>(1);
    
    protected override void OnUpdate() {
        for (int i = 0; i < this.data.Length; ++i) {
            Entity entity = this.data.Entity[i];
            this.map[entity] = this.data.Component[i];
            
            // Add this component so it will no longer be processed by this system
            this.PostUpdateCommands.AddComponent(entity, new Collected());
        }
    }
    
    public Maybe<T> Get(Entity entity) {
        if (this.map.TryGetValue(entity, out T value)) {
            return new Maybe<T>(value);
        }
        
        return Maybe<T>.Nothing;
    }
}

// Usage
// Say you have a MySharedComponent

// Create the concrete system
public class MySharedComponentCollectionSystem : CollectSharedComponentsSystem<MySharedComponent> {
}

// Inject in another system and use it
public class SomeSystem : ComponentSystem {
    [Inject]
    private MySharedComponentCollectionSystem mySharedComponents;

    protected override void OnUpdate() {
        for(...) {
            Entity someEntity = // get it somewhere
            Maybe<MySharedComponent> sharedComponent = this.mySharedComponents.Get(someEntity);
            if(sharedComponent.HasValue) {
            	// Use sharedComponent here
            }
        }
    }
}

Note here that I didn't handle removal because I have no use for it yet. You can also see the usage of Maybe<T>, a pattern that I have come to adopt lately. If you have a method that can return either a value or nothing (say, for non-existence), return a Maybe<T> instead. This way, your intention that the method may return nothing is explicit, and callers can see it and adjust their code accordingly. You can read more here.
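The Maybe<T> struct itself isn't shown in this post. Here's a minimal sketch of the shape implied by the usage above (HasValue, a value constructor, and a Nothing instance); the actual implementation may differ:

```csharp
using System;

// A minimal Maybe<T> sketch: a value that may or may not be present.
// This is an assumed shape, not necessarily the real implementation.
public struct Maybe<T> where T : struct {
    public static readonly Maybe<T> Nothing = new Maybe<T>();

    private readonly T value;
    private readonly bool hasValue;

    public Maybe(T value) {
        this.value = value;
        this.hasValue = true;
    }

    public bool HasValue {
        get { return this.hasValue; }
    }

    public T Value {
        get {
            if (!this.hasValue) {
                throw new InvalidOperationException("Maybe has no value");
            }
            return this.value;
        }
    }
}
```

Since this struct is only used as a return value and never stored in a component, the plain bool field is fine here; if you wanted to store it in an IComponentData, the bool would have to become a ByteBool as discussed earlier.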