Forcing to Handle Null

The concept of null has been called the billion-dollar mistake, and I very much agree. The defects it causes are even more apparent in games, where state changes happen a lot. In a complex simulation game like ours, we encounter a lot of NullPointerExceptions (NPEs) that we can’t trace. They’re not supposed to happen, but they do.

We have assertions in place so that when one fails, we know it’s a bug. We even enable them in the release build so they get logged in bug reports. Most of them assert that something is not null. However, when these assertions fail or NPEs happen, the game still continues to run in a broken state. This is when I have a mild panic attack. My usual thought is, “This may cause more bugs down the line, and then the player leaves a negative review.” There’s got to be a better way to deal with nulls.

How other languages deal with it

Every day, I try to spend 30 minutes to an hour studying or trying a new technology. Last month, I happened to dabble with Rust. I wanted to know what the fuss is about. Why is it such a beloved language? I happened to fall in love with it, too. I like statically typed languages for their constraints and strictness; they make code more maintainable. Rust takes this even further. It adds more constraints, especially on memory access, and checks them during compilation. I even wish that Unity used Rust for scripting instead. But that’s not going to happen in a million years.

I also found out how Rust handles null: it doesn’t have it. Instead, you use Option<T> to express the concept of none or nothing. Option<T> is an enum with two variants, Some and None. Enums in Rust can carry data with them. If a variable of type Option<T> has a value, you use the Some variant. You use None if it doesn’t have a value, as if you were assigning null to it.

// These are two variables in Rust
let some_string: Option<String> = Some(String::from("hello string"));
let null_string: Option<String> = None; // This is like assigning null

The difference in Rust is how you access the value stored in an Option<T> variable. There’s no Value property or get_value() method. You can only access the value using pattern matching.

// There's no such thing in Rust
let string_from_option: String = some_string.get_value();

// To get the string value, you have to use pattern matching
match some_string {
    Some(value) => {
        // Do something with the String value here
    },
    None => {
        // Do something when it has no value
    }
}

Why is this better? Because you’re forced to handle the non-existence of the value whenever you deal with an Option<T> variable. There’s no way around it. The intent is very clear: “This variable may not have a value. You should handle that.” This helps reduce NPEs. So I thought, “How can I port this to C#?”

Implementation

It turns out that there are already C# libraries for this, like this one. I didn’t like it, though, because it uses anonymous delegates, which means it generates garbage whenever it’s invoked. That’s not good for games. So I tried to come up with my own.

public struct Option<T> where T : class {
    public static readonly Option<T> NONE = new Option<T>();

    public static Option<T> Some(T value) {
        return new Option<T>(value);
    }
    
    private readonly bool hasValue;
    private readonly T value;

    private Option(T value) {
        this.value = value;
        Assertion.AssertNotNull(this.value); // Can't be null
        
        this.hasValue = true;
    }

    public bool IsSome {
        get {
            return this.hasValue;
        }
    }

    public bool IsNone {
        get {
            return !this.hasValue;
        }
    }

    public void Match(IOptionMatcher<T> matcher) {
        if (this.hasValue) {
            matcher.OnSome(this.value);
        } else {
            matcher.OnNone();
        }
    }

    public TReturnType Match<TReturnType>(IFuncOptionMatcher<T, TReturnType> matcher) {
        return this.hasValue ? matcher.OnSome(this.value) : matcher.OnNone();
    }
}

public interface IOptionMatcher<in T> where T : class {
    void OnSome(T value);
    void OnNone();
}

public interface IFuncOptionMatcher<in TWrappedType, out TReturnType> where TWrappedType : class {
    TReturnType OnSome(TWrappedType value);
    TReturnType OnNone();
}

These are the main components of my Option framework. Option is basically a struct wrapper around a class reference. The static NONE value represents the none or nothing state. The static method Some() creates an Option that has a value.

Then it has a Match() method that accepts an interface. The matcher’s OnSome() method is the only way the value inside an Option can be accessed. Note that there’s no Value property or GetValue() in the Option struct.

Why an interface? Because I can then create a common implementation like this:

public class DelegateOptionMatcher<T> : IOptionMatcher<T> where T : class {
    public delegate void SomeProcessor(T value);

    public delegate void NoneProcessor();

    private readonly SomeProcessor someProcessor;
    private readonly NoneProcessor noneProcessor;

    public DelegateOptionMatcher(SomeProcessor someProcessor, NoneProcessor noneProcessor) {
        this.someProcessor = someProcessor;
        this.noneProcessor = noneProcessor;
    }

    public DelegateOptionMatcher(SomeProcessor someProcessor) : this(someProcessor, null) {
    }

    public void OnSome(T value) {
        this.someProcessor(value);
    }

    public void OnNone() {
        this.noneProcessor?.Invoke();
    }
}

This is how it’s used:

// Say we have this as a member variable
private IOptionMatcher<Character> dealDamageMatcher;

public void ApplyDamage(Option<Character> target) {
    if(this.dealDamageMatcher == null) {
        this.dealDamageMatcher = new DelegateOptionMatcher<Character>(delegate(Character character) {
            // This is only executed if there is indeed a target
            character.DeductHp(this.damageAmount);
        }, delegate() {
            // Code here when target is none
        });
    }

    target.Match(this.dealDamageMatcher);
}

This technique allows me to write the code I want executed when the value is not null without having to create a new class type. It also avoids garbage because the delegates are stored in an instance that is reused in subsequent calls.

A common implementation of IFuncOptionMatcher<T, R> is written the same way, but with a return type.
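For reference, a sketch of what that implementation could look like (DelegateFuncOptionMatcher is a name I'm making up here; the original isn't shown):

```csharp
public class DelegateFuncOptionMatcher<T, TReturnType> : IFuncOptionMatcher<T, TReturnType> where T : class {
    public delegate TReturnType SomeProcessor(T value);

    public delegate TReturnType NoneProcessor();

    private readonly SomeProcessor someProcessor;
    private readonly NoneProcessor noneProcessor;

    public DelegateFuncOptionMatcher(SomeProcessor someProcessor, NoneProcessor noneProcessor) {
        this.someProcessor = someProcessor;
        this.noneProcessor = noneProcessor;
    }

    public TReturnType OnSome(T value) {
        return this.someProcessor(value);
    }

    public TReturnType OnNone() {
        return this.noneProcessor();
    }
}
```

Both delegates are required here since a value must be returned in either case.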

Issues

One glaring issue with this is verbosity. Generally, I don’t have a problem with that as long as the intent is clear. But it can be frustrating to have to create an IOptionMatcher<T> instance just to handle the value of an Option. Basically, I’m paying for safety with verbosity.

Another issue is dealing with outside variables. A solution like DelegateOptionMatcher is not enough when you have external variables that you need to use inside the handler. Consider the following:

private Option<Weapon> currentWeapon;

private IOptionMatcher<Weapon> setMultiplierMatcher;

public void SetDamageMultiplier(float amount) {
    if(this.setMultiplierMatcher == null) {
        this.setMultiplierMatcher = new DelegateOptionMatcher<Weapon>(delegate(Weapon weapon) {
            weapon.DamageMultiplier = amount;
        });
    }

    this.currentWeapon.Match(this.setMultiplierMatcher);
}

Can you find the bug? The variable amount here is external to the matcher. When the DelegateOptionMatcher is first instantiated, its delegate captures the amount parameter of that first call only. When SetDamageMultiplier() is invoked again, the newly passed parameter will not be the one used in the matcher; it keeps using the value from the call that instantiated it.

For example, say the first call is SetDamageMultiplier(1.1f). The value 1.1 is captured by the instantiated matcher. On the next call, say SetDamageMultiplier(2.0f), the amount used in the matcher would still be 1.1. Bummer, right?

To fix this, I have to write an inner class like this (which is exactly what I was trying to avoid):

private class SetDamageMultiplierMatcher : IOptionMatcher<Weapon> {
    private float amount;

    public void Init(float amount) {
        this.amount = amount;
    }

    public void OnSome(Weapon weapon) {
        weapon.DamageMultiplier = this.amount;
    }

    public void OnNone() {
        // Nothing to do here
    }
}

private readonly SetDamageMultiplierMatcher setMultiplierMatcher = new SetDamageMultiplierMatcher();

public void SetDamageMultiplier(float amount) {
    this.setMultiplierMatcher.Init(amount);
    this.currentWeapon.Match(this.setMultiplierMatcher);
}

This is just ugly.

Conclusion

I’m using this concept in some of our critical parts, the ones where we get the most NPEs. It’s not the best solution, but it’s the best I’ve got for now. If you know something better, or how to make this better, do tell. I’m all ears.

How I wish C# wouldn’t allow direct access to a value when the variable can be null. The programmer should be forced to handle it via a code structure like pattern matching. Any approach that enforces this works for me.

I also know that in C# 8, there’s an option you can set that makes references non-nullable by default. But direct access to nullable variables is still allowed. Either way, C# 8 for Unity will only be available by 2020. By then, more NPEs will have propagated.
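For context, this is roughly what the C# 8 feature looks like; the compiler warns on a possibly-null dereference, but it does not stop you:

```csharp
#nullable enable

public class Example {
    public int GetLength(string? maybeName) {
        // Warning CS8602: dereference of a possibly null reference.
        // It still compiles and can still throw at runtime.
        int length = maybeName.Length;

        // The null-forgiving operator suppresses the warning entirely.
        int length2 = maybeName!.Length;

        return length + length2;
    }
}
```

So it nudges you toward handling null, but unlike Rust’s Option<T>, it doesn’t force you.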

How we did our localization in Academia

Despite its simplicity, the localization process in our game, Academia, has enabled it to be translated into 10 languages, all made by modders. Some languages even have multiple mods. (Big thanks to our players who loved the game so much that they made language mods for it.)

Obstacles

Sometime around the second quarter of 2018, some of our players were clamoring for localization. As a game in early access, we had to listen. However, there were things to consider before we could proceed. First, our game wasn’t done yet, which meant we’d be updating translations frequently with each update. Second, we didn’t have the mental bandwidth to manage multiple translators.

Our solution was to support language modding. It’s a compromise between fully supporting localization and simply letting non-English speakers play the game. There are cons, of course. What if no one does the translation? Players may abandon their mods. They could also create incomplete translations. Let’s just say these are cons we could live with. The bottom line was that we needed some sort of localization.

Execution

The initial plan was to look for a simple open text format that would act as a database of terms, one with FOSS software that we could tell players to download if they wished to make translations. Then we’d use that format in the game.

For modders, I imagined they would use the software to load our English master file, enter their translations, then export a file containing them. They could then upload the exported file to the Steam Workshop.

It turns out there’s no such thing. The tools I’ve seen are too complicated, with lots of unnecessary features. Most are web apps, which would require me to set up a server. I don’t have time for that! I just want a really simple, downloadable program for managing a database of terms and translations. It has to be dead simple because non-developers are going to use it. That such software doesn’t exist really surprised me. (Tell me if you know of one.)

The only way to move forward was… to make our own format and software. Here’s the awkward part… we made the app in Unity. That’s right! As embarrassing as that is, it’s what we know, and we could develop it fast.

This is our term management app that we distribute to modders

Our terms format is simple. It’s just XML with a few self-explanatory elements.
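The actual schema isn't reproduced here, but a minimal sketch of such a terms file could look like this (element and attribute names are illustrative, not necessarily the ones we used):

```xml
<!-- A hypothetical terms file: one entry per translatable string -->
<Terms language="English">
    <Term id="Building.Classroom" value="Classroom" />
    <Term id="Ui.MainMenu.NewGame" value="New Game" />
</Terms>
```

The important property is that each term has a stable id, so a translated file can be matched against the English master even as terms are added between updates.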

Make it accessible

With the term management app done, all we had to do next was provide simple step-by-step documentation on how to develop a language mod. On the game side, we provided a way for modders to test their translations even while they’re still incomplete. There are cases where some characters aren’t displayed properly; modders can then report them to us for fixing.

We also provided a separate section in the game’s mods panel UI specifically for language mods. This was easily done using tags (our documentation tells modders which tag to use).

Support

With the term management app and documentation in place, all we had to do was announce them and wait. It really didn’t take long for players to start tinkering. In June 2018 alone, we had four modders working on different languages right away.

Needless to say, we provided support. We responded right away whenever a language modder encountered an issue. We even released hotfixes just to support a new language; characters not displaying properly came up a lot. We also updated the term management app based on modders’ suggestions.

And that’s it

We made the bold decision to create our own term management format and wrote the software in a not-so-appropriate technology. It still worked for us in the end.

Replicating Polymorphism in ECS

Polymorphism is one of those OOP foundations that’s hard to shake off. It’s easy to do and very intuitive. We use it to refactor code to make it more organized, to manage different behaviors behind a single interface, and, combined with reflection, to build full-blown authoring editors that affect runtime behavior.

However, it can’t be done in an environment where reference types are not allowed, like Unity’s HPC#. It can be replicated in another way in ECS, and that’s what this post is about.

OOP Version

Let’s say we have a projectile framework, and our game’s world is steampunk with magic. We want to support both bullets and magic projectiles like fireballs. Our OOP code might look like this:

public abstract class Projectile {
    // Common projectile properties
    protected Vector2 position;
    private int damage;

    // Each projectile type may implement its own movement
    public abstract void Move();

    // Each projectile might have different effects on impact
    public abstract void OnImpact();
    
    public int Damage {
        get {
            return this.damage;
        }
    }
}

public class Bullet : Projectile {
    private readonly Vector2 direction;
    private readonly float speed;

    public Bullet(Vector2 direction, float speed) {
        this.direction = direction.normalized;
        this.speed = speed;
    }

    public override void Move() {
        // Move by speed in straight line
        this.position += this.speed * Time.deltaTime * this.direction;
    }

    public override void OnImpact() {
        // Maybe just destroy the bullet here
    }
}

public class Fireball : Projectile {
    private readonly float initialVelocity;
    private readonly float angle;
    private readonly float gravity;

    private readonly float vX;
    private readonly float vYPart;
    
    private float polledTime;

    public Fireball(float initialVelocity, float angle, float gravity) {
        this.initialVelocity = initialVelocity;
        this.angle = angle;
        this.gravity = gravity;
        
        // Cache
        this.vX = this.initialVelocity * Mathf.Cos(this.angle);
        this.vYPart = this.initialVelocity * Mathf.Sin(this.angle);
    }

    public override void Move() {
        // Move by projectile motion
        // There are better ways to do this but just bear with me
        this.polledTime += Time.deltaTime;
        
        // Update X
        this.position.x += this.vX * Time.deltaTime;

        // Update Y
        float vY = this.vYPart - this.gravity * this.polledTime;
        this.position.y += vY * Time.deltaTime;
    }

    public override void OnImpact() {
        // Destroy the projectile then send a request to show a fireball impact particle effect
        // at the current position
    }
}

Then we may implement the class that handles projectiles like this:

public class ProjectileManager {
    private readonly List<Projectile> projectiles = new List<Projectile>();

    // Assume this is populated by some collision detection routine
    private readonly List<Collision> collisions = new List<Collision>();

    public void Add(Projectile projectile) {
        this.projectiles.Add(projectile);
    }

    public void Update() {
        for (int i = 0; i < this.projectiles.Count; ++i) {
            this.projectiles[i].Move();
        }
        
        CheckForCollisions();
    }

    private void CheckForCollisions() {
        // Let's just say a list of collisions exists
        foreach(Collision c in this.collisions) {
            // Apply damage if health component exists
            if(c.Health != null) {
                c.Health.Value -= c.Projectile.Damage;
            }

            // Execute custom on impact routines
            c.Projectile.OnImpact();
        }
    }
}

This contrived projectile system should be easy to follow. There are bullets that move in a straight line in a certain direction, and fireballs that move in projectile motion. Both can be handled by ProjectileManager since they inherit from the Projectile base class.

ECS Version

In Unity’s pure ECS, classes can’t be used, and inheritance is certainly not available either. But I consider that a good thing, as ECS needs a different mindset. We must forget OOP when modelling game elements in ECS.

Let’s start with our projectile component:

public struct Projectile : IComponentData {
    public float2 position;
    public readonly int damage;

    public Projectile(float2 position, int damage) {
        this.position = position;
        this.damage = damage;
    }
}

Our intent with this component is that any entity that has it is considered a projectile. Systems can use this component to filter entities and execute general logic common to all projectiles. Think of it as the code found in the base class.
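For instance, here’s a hypothetical system (not from our game; the world bound is made up) that applies to bullets and fireballs alike because it only queries the Projectile component, much like a method on the base class would:

```csharp
// Destroys any projectile that strays too far from the origin,
// regardless of which "subclass" components it carries
public class ProjectileBoundsSystem : ComponentSystem {
    private const float WORLD_LIMIT = 1000f; // Made-up bound for illustration

    protected override void OnUpdate() {
        this.Entities.ForEach(delegate(Entity entity, ref Projectile projectile) {
            if (math.abs(projectile.position.x) > WORLD_LIMIT ||
                math.abs(projectile.position.y) > WORLD_LIMIT) {
                // Defer destruction until after iteration
                this.PostUpdateCommands.DestroyEntity(entity);
            }
        });
    }
}
```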

Next are the components that represent the subclasses:

public struct Bullet : IComponentData {
    public readonly float2 direction;
    public readonly float speed;

    public Bullet(float2 direction, float speed) {
        this.direction = math.normalize(direction);
        this.speed = speed;
    }
}

public struct Fireball : IComponentData {
    public readonly float initialVelocity;
    public readonly float angle;
    public readonly float gravity;

    public readonly float vX;
    public readonly float vYPart;
    
    public float polledTime;
    
    public Fireball(float initialVelocity, float angle, float gravity) {
        this.initialVelocity = initialVelocity;
        this.angle = angle;
        this.gravity = gravity;
        
        // Cache
        this.vX = this.initialVelocity * math.cos(this.angle);
        this.vYPart = this.initialVelocity * math.sin(this.angle);

        this.polledTime = 0;
    }
}

We just moved their data into their own components. To model a bullet projectile, we create an entity with the Projectile and Bullet components. The same goes for a fireball projectile.

// Create a bullet
Entity bullet = entityManager.CreateEntity(typeof(Projectile), typeof(Bullet));

// Create a fireball
Entity fireball = entityManager.CreateEntity(typeof(Projectile), typeof(Fireball));

From here, we can then define the different movement systems:

// Using ComponentSystem here instead of JobComponentSystem so that it's
// easier to understand
public class BulletMoveSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Projectile), typeof(Bullet));
    }

    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref Projectile projectile, ref Bullet bullet) {
            projectile.position = projectile.position + (bullet.speed * Time.deltaTime * bullet.direction);
        });
    }
}

public class FireballMoveSystem : ComponentSystem {
    private EntityQuery query;
    
    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Projectile), typeof(Fireball));
    }
        
    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref Projectile projectile, ref Fireball fireball) {
            // Move by projectile motion
            fireball.polledTime += Time.deltaTime;

            float2 newPosition = projectile.position;
            newPosition.x += fireball.vX * Time.deltaTime;
            
            float vY = fireball.vYPart - fireball.gravity * fireball.polledTime;
            newPosition.y += vY * Time.deltaTime;

            projectile.position = newPosition;
        });
    }
}

To handle the different “on impact” logic, a separate system could run the collision detection and add a Collided tag component to entities that have collided. Separate systems would then handle damage dealing and the “on impact” routines. Here’s the system that handles collision detection:

// Component that is added to entities that have collided
public struct Collided : IComponentData {
    public readonly Entity other; // The other entity that we collided with
}

public class ProjectileCollisionDetectionSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Projectile), ComponentType.Exclude<Collided>());
    }
    
    protected override void OnUpdate() {
        // Adds Collided component to entities that have collided
    }
}

The system that applies damage could look like this:

// Component representing health
public struct Health : IComponentData {
    public int amount;
}

public class ProjectileDamageSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Projectile), typeof(Collided));
    }

    protected override void OnUpdate() {
        ComponentDataFromEntity<Health> allHealth = GetComponentDataFromEntity<Health>();
        
        this.Entities.With(this.query).ForEach(delegate(ref Projectile projectile, ref Collided collided) {
            // Apply damage
            Health health = allHealth[collided.other];
            health.amount -= projectile.damage;
            allHealth[collided.other] = health; // Modify
        });
    }
}

The “on impact” routines can also be implemented in their own systems:

public class BulletOnImpactSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Projectile), typeof(Collided), typeof(Bullet));
    }

    protected override void OnUpdate() {
        // Just destroy them
        this.EntityManager.DestroyEntity(this.query);
    }
}

public class FireballOnImpactSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Projectile), typeof(Collided), typeof(Fireball));
    }

    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref Projectile projectile) {
            // Request fireball particle effect at projectile.position
        });
        
        // Then destroy them
        this.EntityManager.DestroyEntity(this.query);
    }
}

At this point, we have replicated the logic from OOP to its ECS version.

The more ideal ECS solution

From our initial refactoring, we can make changes to some components to make them more reusable.

One area we can improve is movement. Instead of using the Bullet component for straight-direction movement, why not define it in its own component, say StraightDirectionMovement? This way, we can reuse it for other elements in the game that require this kind of movement. Before we can do this, we also need to remove the position property from Projectile and put it in a separate component. This is what the new movement system will look like:

// Holds the projectile's position
public struct Position : IComponentData {
    public float2 value;
}

public struct StraightDirectionMovement : IComponentData {
    public readonly float2 direction;
    public readonly float speed;

    public StraightDirectionMovement(float2 direction, float speed) {
        this.direction = math.normalize(direction);
        this.speed = speed;
    }
}

public class StraightDirectionMovementSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Position), typeof(StraightDirectionMovement));
    }

    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref Position position, ref StraightDirectionMovement movement) {
            position.value = position.value + (movement.speed * Time.deltaTime * movement.direction);
        });
    }
}

In the same way, Fireball’s projectile motion movement could also be turned into its own component. Say we call it ProjectileMotionMovement:

public struct ProjectileMotionMovement : IComponentData {
    public readonly float initialVelocity;
    public readonly float angle;
    public readonly float gravity;

    public readonly float vX;
    public readonly float vYPart;

    public float polledTime;

    public ProjectileMotionMovement(float initialVelocity, float angle, float gravity) {
        this.initialVelocity = initialVelocity;
        this.angle = angle;
        this.gravity = gravity;
    
        // Cache
        this.vX = this.initialVelocity * math.cos(this.angle);
        this.vYPart = this.initialVelocity * math.sin(this.angle);

        this.polledTime = 0;
    }
}

public class ProjectileMotionMovementSystem : ComponentSystem {
    private EntityQuery query;
    
    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Position), typeof(ProjectileMotionMovement));
    }
        
    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref Position position, ref ProjectileMotionMovement movement) {
            // Move by projectile motion
            movement.polledTime += Time.deltaTime;

            float2 newPosition = position.value;
            newPosition.x += movement.vX * Time.deltaTime;
            
            float vY = movement.vYPart - movement.gravity * movement.polledTime;
            newPosition.y += vY * Time.deltaTime;

            position.value = newPosition;
        });
    }
}

Another dimension we can improve is the on-impact routines. Instead of having BulletOnImpactSystem, which only works on entities with the Bullet component, why not make it more reusable? Let’s use a component named DestroyOnCollision instead:

// A tag component that identifies an entity to be removed on collision
public struct DestroyOnCollision : IComponentData {
}

public class DestroyOnCollisionSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Collided), typeof(DestroyOnCollision));
    }

    protected override void OnUpdate() {
        // Just destroy them
        this.EntityManager.DestroyEntity(this.query);
    }
}

Requesting a particle effect, like what happens on a fireball’s impact, could also be its own component and system. Let’s say we have a component named RequestParticleEffectOnCollision:

public struct RequestParticleEffectOnCollision : IComponentData {
    // Used to identify what particle effect to deploy
    public readonly int effectId;

    public RequestParticleEffectOnCollision(int effectId) {
        this.effectId = effectId;
    }
}

[UpdateBefore(typeof(DestroyOnCollisionSystem))]
public class RequestParticleEffectOnCollisionSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Position), typeof(Collided), typeof(RequestParticleEffectOnCollision));
    }

    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref Position position, ref RequestParticleEffectOnCollision effectRequest) {
            // Request the particle effect at position.value
        });
        
        // Destruction of the entity will now be handled by DestroyOnCollisionSystem
    }
}

With the systems above in place, modelling a bullet object in the game will now look like this:

Entity bullet = entityManager.CreateEntity(
    typeof(Position),
    typeof(Projectile),
    typeof(StraightDirectionMovement),
    typeof(DestroyOnCollision));

Notice how we have completely removed the concept of “Bullet”. A bullet is now composed of components that make up its behavior. It’s also the same for Fireball:

Entity fireball = entityManager.CreateEntity(
    typeof(Position),
    typeof(Projectile),
    typeof(ProjectileMotionMovement),
    typeof(RequestParticleEffectOnCollision),
    typeof(DestroyOnCollision));

Last thoughts

It’s obvious that turning OOP into ECS requires more code. All I can say to that is… it is what it is. We’re just using C# constructs to model ECS; there’s no ECS-aware programming language (yet) that would dramatically reduce all this code. I consider it a trade-off: I get highly modular and efficient code at the expense of verbosity.

Honestly, verbosity is not a steep price to pay. I get code that could be as fast as it can be without switching to a more complex language like C++, which is verbose by itself.

Signals Implemented in ECS

We already have our own Signals framework, as described in this post. It’s basically a messaging system where the parties, dispatchers and handlers, don’t have references to each other. Think of it as an elaborate Observer pattern.

This current system, however, is implemented in OOP. As we slowly inch toward Unity’s pure ECS, where reference types can’t be used, I needed a Signals framework that works in that environment. In other words, I needed a Signals system implemented in ECS.

In this post, I’ll show how I did it, albeit a bit simplified. I’ll only show the non-jobified version so it’s easier to understand.

Dispatching Signals

We can model a signal as an entity with a Signal component plus another component that acts as the signal’s parameters. This second component also acts as the filter for the system that will handle the signal. We can define it in code like this:

// Signal is just a tag component
public struct Signal : IComponentData {
    public static void Dispatch<T>(EntityManager entityManager, T signalComponent) where T : struct, IComponentData {
        Entity entity = entityManager.CreateEntity();
        entityManager.AddComponentData(entity, new Signal());
        entityManager.AddComponentData(entity, signalComponent);
    }

    // An EntityCommandBuffer could also be used
    public static void Dispatch<T>(EntityCommandBuffer buffer, T signalComponent) where T : struct, IComponentData {
        Entity entity = buffer.CreateEntity();
        buffer.AddComponent(entity, new Signal());
        buffer.AddComponent(entity, signalComponent);
    }
}

Handling Signals

Signal handlers or responders are implemented as a system. Here’s an example:

// Say this is the signal parameter
struct TakeDamage : IComponentData {
    public Entity target;
    public int damageAmount;
}

// This is the signal handler system
class TakeDamageSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Signal), typeof(TakeDamage));
    }

    protected override void OnUpdate() {
        this.Entities.With(this.query).ForEach(delegate(ref TakeDamage parameter) {
            // Deal the damage to the target here
        });
    }
}

Making the Framework

From this modelling, we can define some common code to reuse every time we deal with signals. One of the utility classes I’ve made is the following, which I use inside systems that need to handle signals:

public class SignalHandler<T> where T : struct, IComponentData {
    private readonly EntityQuery query;
    private ArchetypeChunkEntityType entityType;
    private ArchetypeChunkComponentType<T> componentType;

    private readonly ComponentSystemBase system;

    public delegate void Listener(Entity entity, T component);
    
    private readonly List<Listener> listeners = new List<Listener>(1);  

    public SignalHandler(ComponentSystemBase system, EntityQuery query) {
        this.system = system;
        this.query = query;
    }

    public void AddListener(Listener listener) {
        this.listeners.Add(listener);
    }

    public void Update() {
        this.entityType = this.system.GetArchetypeChunkEntityType();
        this.componentType = this.system.GetArchetypeChunkComponentType<T>();

        NativeArray<ArchetypeChunk> chunks = this.query.CreateArchetypeChunkArray(Allocator.TempJob);
        for (int i = 0; i < chunks.Length; ++i) {
            Process(chunks[i]);
        }
        
        chunks.Dispose();
    }

    private void Process(ArchetypeChunk chunk) {
        NativeArray<Entity> entities = chunk.GetNativeArray(this.entityType);
        NativeArray<T> components = chunk.GetNativeArray(this.componentType);

        int count = chunk.Count;
        for (int i = 0; i < count; ++i) {
            Publish(entities[i], components[i]);
        }
    }

    private void Publish(Entity entity, T component) {
        for (int i = 0; i < this.listeners.Count; ++i) {
            this.listeners[i].Invoke(entity, component);
        }
    }
}

It uses delegates as signal handlers. They are maintained in a list so multiple handlers can be registered. Here is how it can be used:

class TakeDamageSystem : ComponentSystem {
    private EntityQuery signalQuery;
    private SignalHandler<TakeDamage> signalHandler;

    protected override void OnCreate() {
        this.signalQuery = GetEntityQuery(typeof(Signal), typeof(TakeDamage));
        this.signalHandler = new SignalHandler<TakeDamage>(this, this.signalQuery);
        this.signalHandler.AddListener(HandleSignal);
    }

    private void HandleSignal(Entity entity, TakeDamage parameter) {
        // Deal the damage to the target here
    }

    protected override void OnUpdate() {
        this.signalHandler.Update();
    }
}

Template Signal Handler System

After writing so many signal handler systems, I realized I could use the Template Method pattern to simplify the creation of these systems. Ideally, only one signal should be handled per signal handler system anyway. So I made this template system class.

public abstract class SignalHandlerComponentSystem<T> : ComponentSystem where T : struct, IComponentData {
    private EntityQuery signalQuery;
    private SignalHandler<T> signalHandler;

    protected override void OnCreate() {
        this.signalQuery = GetEntityQuery(typeof(Signal), typeof(T));
        this.signalHandler = new SignalHandler<T>(this, this.signalQuery);
        this.signalHandler.AddListener(OnDispatch);
    }

    protected abstract void OnDispatch(Entity entity, T signalComponent);

    protected override void OnUpdate() {
        this.signalHandler.Update();
    }
}

TakeDamageSystem can now be simplified to:

class TakeDamageSystem : SignalHandlerComponentSystem<TakeDamage> {
    protected override void OnDispatch(Entity entity, TakeDamage signalComponent) {
        // Deal the damage to the target here
    }
}

Signal Entity Destruction

Ideally, signal handler systems should not need to destroy the signal entities themselves. Destroying signal entities inside a handler is not a good idea anyway, because there may be multiple handlers for a certain signal.

What we can do is create another system whose only purpose is to destroy signals. This can be easily made as:

public class DestroySignalsSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Signal));
    }

    protected override void OnUpdate() {
        this.EntityManager.DestroyEntity(this.query);
    }
}

Then our template class should have this attribute so it will run before DestroySignalsSystem:

[UpdateBefore(typeof(DestroySignalsSystem))]
public abstract class SignalHandlerComponentSystem<T> : ComponentSystem where T : struct, IComponentData {
    ...
}

Handler Runs Before Dispatcher

There’s a case that we need to handle. Let’s say our project got bigger, with lots of systems. At some point, a system may dispatch a signal whose handler system was already executed earlier in the frame. The signal would then be destroyed without ever being handled. That’s a potential bug that could be hard to trace.

// Say this is the execution order
SystemA
SystemB
DestroySignalsSystem

// If SystemB dispatches a signal that is handled by SystemA, 
// it may no longer be handled since it will be destroyed 
// before the next frame begins.

What we need to do is not destroy signals on the same frame they were dispatched. We should let a signal live for one more frame. We can do this by adding another tag component that lets DestroySignalsSystem know that the signal has already survived a frame; we only destroy signals that have this tag. We need a new component and a system that adds it:

// A tag component that marks a signal entity as having survived one frame
public struct SignalFramePassed : IComponentData {
}

// Adds the SignalFramePassed component to signal entities 
// that don't have it yet
[UpdateAfter(typeof(DestroySignalsSystem))]
public class SignalFramePassedSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Signal), ComponentType.Exclude<SignalFramePassed>());
    }

    protected override void OnUpdate() {
        this.PostUpdateCommands.AddComponent(this.query, typeof(SignalFramePassed));
    }
}

We then edit the query of DestroySignalsSystem to include only entities that have the SignalFramePassed component:

public class DestroySignalsSystem : ComponentSystem {
    private EntityQuery query;

    protected override void OnCreate() {
        this.query = GetEntityQuery(typeof(Signal), typeof(SignalFramePassed));
    }

    protected override void OnUpdate() {
        this.EntityManager.DestroyEntity(this.query);
    }
}

In doing this, we’re introducing another problem. Since signals now live for two frames, they may be handled by signal handlers more than once. That’s another cause of bugs.

This can be solved in the template class. We can add another tag component that marks a signal as already handled by the signal handler system.

[UpdateBefore(typeof(DestroySignalsSystem))]
public abstract class SignalHandlerComponentSystem<T> : ComponentSystem where T : struct, IComponentData {
    private EntityQuery signalQuery;
    private SignalHandler<T> signalHandler;

    // Tag that marks a signal entity as already processed by this system
    public struct ProcessedBySystem : IComponentData {
    }

    protected override void OnCreate() {
        this.signalQuery = GetEntityQuery(typeof(Signal), typeof(T), ComponentType.Exclude<ProcessedBySystem>());
        this.signalHandler = new SignalHandler<T>(this, this.signalQuery);
        this.signalHandler.AddListener(OnDispatch);
    }

    protected abstract void OnDispatch(Entity entity, T signalComponent);

    protected override void OnUpdate() {
        this.signalHandler.Update();
        this.PostUpdateCommands.AddComponent(this.signalQuery, typeof(ProcessedBySystem));
    }
}

Now, we only process signals that don’t have the ProcessedBySystem component yet. After executing the handler, we add ProcessedBySystem so the signal will no longer be picked up by the same handler system. This solves the problem of a handler system processing a signal more than once.
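As a sanity check, the whole two-frame scheme can be modeled outside Unity. Here’s a small Python simulation (the dict-of-tags entities and system functions are invented stand-ins that mirror the components and systems above) showing that a signal dispatched after its handler already ran is still handled, exactly once:

```python
handled = []

def handler_system(signals):
    # Mirrors SignalHandlerComponentSystem: handle untagged signals, then tag them
    for s in signals:
        if "ProcessedBySystem" not in s["tags"]:
            handled.append(s["id"])
            s["tags"].add("ProcessedBySystem")

def destroy_signals_system(signals):
    # Mirrors DestroySignalsSystem: destroy only signals that survived a frame
    return [s for s in signals if "SignalFramePassed" not in s["tags"]]

def signal_frame_passed_system(signals):
    # Mirrors SignalFramePassedSystem: tag the signals that lived through this frame
    for s in signals:
        s["tags"].add("SignalFramePassed")

def run_frame(signals, dispatched_late=None):
    handler_system(signals)
    if dispatched_late is not None:
        # A system that runs after the handler dispatches a signal
        signals.append({"id": dispatched_late, "tags": set()})
    signals = destroy_signals_system(signals)
    signal_frame_passed_system(signals)
    return signals

signals = []
signals = run_frame(signals, dispatched_late="hit")  # frame 1: missed by the handler
signals = run_frame(signals)                         # frame 2: handled, then destroyed

assert handled == ["hit"]  # handled exactly once
assert signals == []       # destroyed afterwards
```

The model only captures ordering and tags, but it demonstrates the invariant we care about: no lost signals, no double handling.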

That’s it! I hope that was simple enough and useful.

Some Unity ECS Utilities

Not much has happened since last month in terms of programming breakthroughs. I guess we’re just busy adding features to our game. As a fallback article, I’ll share some utility code that we use instead. These are mainly used in ECS.

BufferElement<T>

In Unity’s ECS, collections inside components are not allowed (yet). The solution for now is to use DynamicBuffer. It’s like a component that you can add to your entities, but it’s a list. However, it only accepts value types that implement the IBufferElementData interface. There are times when I just want a list of entities or vectors, but Entity and float3 don’t implement IBufferElementData, and it’s a pain to create a wrapper struct for each type. So I created BufferElement<T>:

[InternalBufferCapacity(10)]
public readonly struct BufferElement<T> : IBufferElementData where T : struct {
    public readonly T value;

    public BufferElement(T value) {
        this.value = value;
    }
}

// Usage
// Say we need a list of Entity targets
Entity entity = this.EntityManager.CreateEntity(typeof(BufferElement<Entity>));
DynamicBuffer<BufferElement<Entity>> targets = this.EntityManager.GetBuffer<BufferElement<Entity>>(entity);

Entity[] targetEntities = GetTargets();
for (int i = 0; i < targetEntities.Length; ++i) {
    targets.Add(new BufferElement<Entity>(targetEntities[i]));
}

CollectedCommandsSystem

Unity’s ECS was released when our game Academia was already too big, with heavy OOP usage. It’s unreasonable to rewrite it in pure ECS, so we’re using a lot of the hybrid interface.

In the MonoBehaviour world, you can get access to an EntityManager and create entities with it. One problem we encountered is that heavy usage of EntityManager is slow. It has something to do with changing ECS chunk data on each entity/component alteration call like AddComponentData() or SetComponentData(). The recommendation is to use an EntityCommandBuffer to temporarily pool the commands, then execute them in one go so that chunk changes are minimized.
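The buffering idea itself is simple enough to model outside Unity. Here’s a Python sketch (this `CommandBuffer` is an invented stand-in, not Unity’s EntityCommandBuffer): commands are recorded instead of executed immediately, then played back in one batch.

```python
class CommandBuffer:
    """Records commands as (function, args) pairs; playback() runs them in one go."""

    def __init__(self):
        self.commands = []

    def add(self, fn, *args):
        # Record the command; nothing is executed yet
        self.commands.append((fn, args))

    def playback(self):
        # Execute all pooled commands in one batch, then clear the buffer
        for fn, args in self.commands:
            fn(*args)
        self.commands.clear()

world = []
buffer = CommandBuffer()
buffer.add(world.append, "entity-1")
buffer.add(world.append, "entity-2")

assert world == []  # nothing has been executed yet
buffer.playback()
assert world == ["entity-1", "entity-2"]
```

The batching is the whole point: the expensive structural work happens once per playback instead of once per call.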

I wanted something that I could use in any of my MonoBehaviours then automatically flush the commands whenever ECS systems kick in. I made CollectedCommandsSystem:

public class CollectedCommandsSystem : ComponentSystem {
    private EntityCommandBuffer? pendingBuffer;

    protected override void OnUpdate() {
        if (this.pendingBuffer != null) {
            this.pendingBuffer.Value.Playback(this.EntityManager);
            this.pendingBuffer.Value.Dispose();
            this.pendingBuffer = null;
        }
    }
    
    public EntityCommandBuffer Buffer {
        get {
            if (this.pendingBuffer == null) {
                this.pendingBuffer = new EntityCommandBuffer(Allocator.TempJob);
            }

            return this.pendingBuffer.Value;
        }
    }
}

// Usage
// Say in a MonoBehaviour
private CollectedCommandsSystem collectedCommands;

private void Awake() {
    this.collectedCommands = World.Active.GetOrCreateSystem<CollectedCommandsSystem>();
}

private void Update() {
    // Then somewhere along the way, use the buffer to alter ECS data
    EntityCommandBuffer buffer = this.collectedCommands.Buffer;
    Entity entity = buffer.CreateEntity();
    buffer.AddComponentData(entity, new Whatever());
}

SharedComponentQuery<T>

When I was creating our custom rendering system, one of the things I constantly had to deal with was shared components. While working with them, I found I could write something so that I don’t have to repeat the same code every time I handle them. This is what I came up with:

public class SharedComponentQuery<T> where T : struct, ISharedComponentData {
    private readonly ComponentSystemBase system;
    private readonly EntityManager entityManager;

    [ReadOnly]
    private ArchetypeChunkSharedComponentType<T> sharedComponentType;

    private readonly List<T> sharedComponents = new List<T>();
    private readonly List<int> indices = new List<int>();

    public SharedComponentQuery(ComponentSystemBase system, EntityManager entityManager) {
        this.system = system;
        this.entityManager = entityManager;
    }

    public void Update() {
        this.sharedComponentType = this.system.GetArchetypeChunkSharedComponentType<T>();
        
        this.sharedComponents.Clear();
        this.indices.Clear();
        this.entityManager.GetAllUniqueSharedComponentData(this.sharedComponents, this.indices);
    }

    public T GetSharedComponent(ref ArchetypeChunk chunk) {
        int sharedComponentIndex = chunk.GetSharedComponentIndex(this.sharedComponentType);
        int uniqueIndex = this.indices.IndexOf(sharedComponentIndex);
        Assertion.Assert(uniqueIndex >= 0);
        return this.sharedComponents[uniqueIndex];
    }
    
    public IReadOnlyList<T> SharedComponents {
        get {
            return this.sharedComponents;
        }
    }

    public IReadOnlyList<int> Indices {
        get {
            return this.indices;
        }
    }
}

It’s a utility class where all of my common code regarding shared components is placed. Here’s a sample usage:

// Say you have a SharedComponent named RenderConfig
class RenderConfigSystem : ComponentSystem {
    private SharedComponentQuery<RenderConfig> query;

    protected override void OnCreate() {
        // Prepare the query
        query = new SharedComponentQuery<RenderConfig>(this, this.EntityManager);
    }

    protected override void OnUpdate() {
        this.query.Update();

        // Traverse through all RenderConfig
        IReadOnlyList<RenderConfig> configs = this.query.SharedComponents;

        // Start from 1 because the element at 0 is an auto-generated default instance
        for(int i = 1; i < configs.Count; ++i) {
            RenderConfig config = configs[i];
            // Do something with config here
        }
    }

    // Another usage is if you want to get the shared component of a chunk (there's only one per chunk)
    private void Process(ArchetypeChunk chunk) {
        RenderConfig config = this.query.GetSharedComponent(ref chunk);
    }
}

That’s all for now. 🙂

Replacing our OOP pathfinding to pure ECS

I’ve been writing about implementing A* in ECS to see if it’s faster. See here and here. It is indeed faster, way faster! I’m too ashamed to tell you that our old OOP A* is so slow: it runs in the range of 2ms to 50ms, even though it already uses Jump Point Search and is multithreaded. It badly needed optimization. I wanted to use the ECS implementation; however, it’s not that easy to port OOP code to pure ECS since it uses the HPC# (High Performance C#) subset.

Think of HPC# as a viral change requirement. Since it can’t handle managed types, classes can’t be used. The obvious thing to do is to turn a class into a struct, but that’s not enough. Every member variable that is a managed type must be converted to something that HPC# allows. That means each class referenced inside the converted class must be turned into a struct (or something else), and then you apply this recursively to the member variables of those classes, and so on.

Most of the time, it’s impossible. If you use any classes from .NET or from third-party code, you’re stuck; you can’t edit those, or it’s not advisable to do so. Just think how many classes or MonoBehaviours you have that use List or Dictionary. That code will be tough to convert, if it can be converted at all.

Epiphany

While I was sharing this post on Reddit, somebody suggested copying the Tile information into a separate struct and using that in my ECS A*. I thought this would be costly, since I’d have to copy all of the grid’s Tile information to struct copies every frame. I kept thinking about it, then realized I don’t need to make copies every frame. I just need to keep and maintain the struct copies.

This means I should have a separate copy of the grid of tiles implemented as structs (HPC#). They only get updated whenever the original Tile changes. I also don’t need to copy all of a Tile’s information, only what’s needed for pathfinding. The copying doesn’t need to be tightly synchronized either, which saves CPU; I can defer it across multiple frames. Tile changes are not very frequent in the game anyway.

I may have found an effective strategy to somehow port OOP code to pure ECS without a major overhaul. With this in mind, I set out to execute this major change.
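Here’s the deferred-copy strategy in a nutshell, sketched in Python outside of Unity (`source`, `mirror`, `dirty_queue`, and `COPIES_PER_FRAME` are invented names for this sketch): changes only enqueue positions, and each frame copies a bounded number of entries, so the struct mirror becomes consistent after a few frames without a big per-frame spike.

```python
from collections import deque

COPIES_PER_FRAME = 50  # throttle: bound the copy work done per frame

source = {}            # the OOP-side tiles (position -> tile data)
mirror = {}            # the HPC#-friendly copies used by pathfinding
dirty_queue = deque()  # positions that still need recopying

def on_tile_changed(position, new_data):
    # A tile changed on the OOP side: record it, defer the actual copy
    source[position] = new_data
    dirty_queue.append(position)

def update():
    # Called once per frame: copy at most COPIES_PER_FRAME dirty tiles
    for _ in range(min(COPIES_PER_FRAME, len(dirty_queue))):
        position = dirty_queue.popleft()
        mirror[position] = source[position]

# 120 changes take 3 frames to fully propagate at 50 copies per frame
for i in range(120):
    on_tile_changed(i, {"blocked": i % 2 == 0})

frames = 0
while dirty_queue:
    update()
    frames += 1

assert frames == 3
assert mirror == source
```

The mirror briefly lags behind the source, which is acceptable here because pathfinding can tolerate a few frames of stale tile data.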

Refactoring

This is a partial view of our Tile class:

public sealed class Tile {
    private readonly TileGrid grid; // The owner grid
    private readonly GridCell cell; // Contains position, size, etc

    private NodeHandle<Tile> handle;

    private readonly Dictionary<Layer, TileItem> itemMap = new Dictionary<Layer, TileItem>();
    private readonly Dictionary<Layer, TileItem> taskItemMap = new Dictionary<Layer, TileItem>();

    private readonly IntVector2 position;

    private int itemFlags;

    private Zone zone;

    ... 
}

This class is close to 400 lines of code. It handles many gameplay rules regarding the tiles in the game. It wouldn’t be a good idea to turn this into HPC#; there’s a lot to convert here. At first glance, the usage of Dictionary alone would make the conversion difficult, as there’s no HPC# equivalent of it yet. TileItem is a huge data class that contains information about what’s placed in a tile at a certain layer; it has many member variables that are classes of their own. It’s the same with Zone. Refactoring Tile for HPC# this way would cascade into a lot more refactoring in other systems. It’s too risky.

The Tile copy that is implemented as a struct looks like this:

public struct TileAsStruct : IComponentData {
    public int2 position;
    
    public int zoneGenderIndex;

    public byte flags;

    public bool Contains(byte flag) {
        return (this.flags & flag) > 0;
    }
    
    // The flags
    public const byte HAS_PHYSICAL_BLOCKER = 1;
    public const byte HAS_STUDENT_BLOCKER = 1 << 1;
    public const byte STAFF_ONLY = 1 << 2;
}

Simple, isn’t it? This is all I need for the purposes of pathfinding. It’s implemented as a component so I can easily query it in system classes (the S in ECS). For the actual A* implementation, I used the same one shown in this post. I implemented the custom HeuristicCostCalculator and Reachability that are required for the game’s rules.
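The flags byte packs several booleans into a single value using bit masks. Here’s the same trick in Python for illustration (the constants mirror the ones in TileAsStruct):

```python
# The flags, each one occupying a distinct bit
HAS_PHYSICAL_BLOCKER = 1
HAS_STUDENT_BLOCKER = 1 << 1
STAFF_ONLY = 1 << 2

def contains(flags, flag):
    # Same test as TileAsStruct.Contains(): non-zero means the bit is set
    return (flags & flag) > 0

# Compose a tile that blocks students and is staff only
flags = 0
flags |= HAS_STUDENT_BLOCKER
flags |= STAFF_ONLY

assert not contains(flags, HAS_PHYSICAL_BLOCKER)
assert contains(flags, HAS_STUDENT_BLOCKER)
assert contains(flags, STAFF_ONLY)
```

Packing the booleans this way keeps TileAsStruct small, which matters when thousands of copies live in chunk memory.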

As for the system that handles copying the information from the original tiles, I used a separate MonoBehaviour which listens for signals that cause Tile changes. On dispatch of such signals, I enqueue the positions of the tiles that need to be recopied. Then per Update(), I take up to X tiles from the queue and copy them to their struct copies.

private const int COPIES_PER_FRAME = 50;

private void Update() {
    int copiedCount = 0;
    while (this.dirtyTilesQueue.Count > 0 && copiedCount < COPIES_PER_FRAME) {
        CreateUpdateEntity();
        ++copiedCount;
    }
}

private void CreateUpdateEntity() {
    int2 position = this.dirtyTilesQueue.Dequeue();
    Tile tile = this.grid.GetTile(position.x, position.y);

    byte flags = 0;
    if (tile.HasPhysicalBlocker) {
        flags |= TileAsStruct.HAS_PHYSICAL_BLOCKER;
    }

    if (tile.HasStudentBlocker) {
        flags |= TileAsStruct.HAS_STUDENT_BLOCKER;
    }

    // Check if staff only
    if (tile.Zone?.StaffOnly ?? false) {
        flags |= TileAsStruct.STAFF_ONLY;
    }

    TileAsStruct tileValue = new TileAsStruct() {
        position = position,
        zoneGenderIndex = tile.Zone?.GenderIndex ?? Gender.ANY.Index, // Use ANY if tile has no zone
        flags = flags
    };

    // Create the entity that will mark to update the TileAsStruct in the grid
    Entity updateEntity = this.entityManager.CreateEntity(typeof(UpdateTileAsStruct));
    this.entityManager.SetComponentData(updateEntity, new UpdateTileAsStruct(position, tileValue));
}

Huge Improvements

Our old implementation was so slow that I only allowed up to 4 A* requests per frame. Now, with the Burst-compiled ECS implementation, I was able to increase it to 40 requests. That’s a 10x improvement! There’s even room for more (it takes around 300 requests to reach 16ms). But pushing the number up isn’t necessary; the simulation already looks good. Besides, I need to reserve CPU cycles for future features.

That’s it for now. 🙂

From 160 to 300 A* requests per frame

In my last post, I talked about achieving 160 requests per frame using Burst-compiled code. After more studies and iterations, I found that I could do better.

While watching the profiler of my old implementation, I realized that each A* search was being executed one after the other. They were distributed across multiple threads, but they weren’t running in parallel. Unused thread execution time is wasteful, so I needed to write code that can execute A* searches in parallel. It wasn’t as easy as I initially thought; I had to work through some Unity ECS gotchas.

A* searches are distributed in all threads but are not running in parallel.

Obstacles

My first instinct was to use IJobParallelFor. I can schedule an IJobParallelFor for each chunk of requests, which means I execute one A* search per index:

struct AStarSearchParallelFor : IJobParallelFor {
    [ReadOnly]
    public NativeArray<Entity> entities; // The request entities

    public void Execute(int index) {
        // Perform a single A* search here
    }
}

This also implies that I have to instantiate local containers: a list for nodes, an open set that uses a heap, and a hash map that acts as a hash set for the close set. I couldn’t get this to work with the Burst compiler. I always got errors when disposing a container within the job struct; somehow, this is not supported by Burst.

While rummaging through the ECS forums, I found a thread that answered my problem. It said that you don’t need to dispose containers allocated inside a job with Allocator.Temp. I don’t have concrete evidence that this is true, except that I tried it in a test system and watched whether memory usage grows (due to undisposed containers). It doesn’t grow at all. So I’m going to have to trust the programmer who replied in that thread.

struct AStarSearchParallelFor : IJobParallelFor {
    [ReadOnly]
    public NativeArray<Entity> entities; // The request entities

    public void Execute(int index) {
        Search search = new Search() {
            // Set parameters here
        };
        search.Execute();
    }

    // The whole A* algorithm is implemented here
    // I did it this way so I don't have to pass around too many
    // parameters if it were made by only using functions
    private struct Search {
        private NativeList<AStarNode> allNodes;
        private OpenSet openSet;
        private NativeHashMap<int2, byte> closeSet;

        public void Execute() {
            // It's ok not to dispose these (I hope)
            this.allNodes = new NativeList<AStarNode>(10, Allocator.Temp);
            GrowingHeap heap = new GrowingHeap(new NativeList<HeapNode>(10, Allocator.Temp));
            this.openSet = new OpenSet(heap, new NativeHashMap<int2, AStarNode>(10, Allocator.Temp));
            this.closeSet = new NativeHashMap<int2, byte>(10, Allocator.Temp);

            ... // Rest of the algorithm
        }
    }
}

From here, the next step is to pass the required components from chunks. My A* job requires four components:

  • AStarSearchParameters – Stores the start and goal positions
  • AStarPath – Stores whether the path is reachable or not. Also stores the number of positions to step through to get to the goal. Basically, this component can be used to traverse through the path.
  • DynamicBuffer<Int2BufferElement> – The list of positions from A* search will be stored here
  • Waiting – This is a common component that I use that has a “done” boolean. This will be used by agents to see if their A* request is already done so they can proceed to their next action.

As I’m working with chunks, it’s natural to use NativeArray for the components and a BufferAccessor for the list of positions. But there’s a problem here: BufferAccessor can’t be used in this case. I made a test for this:

private struct JobParallelFor : IJobParallelFor {
    [NativeDisableContainerSafetyRestriction]
    public BufferAccessor<Int2BufferElement> accessor;

    public void Execute(int index) {
        DynamicBuffer<Int2BufferElement> buffer = this.accessor[index];
        buffer.Clear();

        for (int i = 0; i < 1000; ++i) {
            buffer.Add(new Int2BufferElement() {
                value = new int2(i, i)
            });
        }
    }
}

The job above is test code I made to see if I can use BufferAccessor in an IJobParallelFor. It works for only one chunk. If there’s more than one chunk to process, it throws errors, something about a NativeArray that can’t be accessed because a previous job is still writing to it. Intuitively, it should work, since I’m accessing one and only one DynamicBuffer per Execute(). But I’m wrong. Still totally an ECS noob.

I needed to find another way. Then I remembered my old implementation, where I used BufferFromEntity instead. This can still be applied to my parallel solution since I can retrieve the entities from the chunks. I made a simple test to see if it would work in parallel:

private struct JobWithBufferFromEntity : IJobParallelFor {
    [ReadOnly]
    public NativeArray<Entity> entities;

    [NativeDisableContainerSafetyRestriction]
    public BufferFromEntity<Int2BufferElement> allBuffers;

    public void Execute(int index) {
        Entity entity = this.entities[index];
        DynamicBuffer<Int2BufferElement> buffer = this.allBuffers[entity];
        
        buffer.Clear();

        for (int i = 0; i < 1000; ++i) {
            buffer.Add(new Int2BufferElement() {
                value = new int2(i, i)
            });
        }
    }
}

Surprisingly, this works even with multiple chunks. With this, I wrote my A* job that runs in parallel:

[BurstCompile]
public struct AStarSearchParallelFor<HeuristicCalculator, ReachabilityType> : IJobParallelFor
    where HeuristicCalculator : struct, HeuristicCostCalculator where ReachabilityType : struct, Reachability {
    [ReadOnly]
    public NativeArray<Entity> entities; // The request entities

    [NativeDisableContainerSafetyRestriction, ReadOnly]
    public ComponentDataFromEntity<AStarSearchParameters> allParameters;

    [NativeDisableContainerSafetyRestriction, WriteOnly]
    public ComponentDataFromEntity<AStarPath> allPaths;

    [NativeDisableContainerSafetyRestriction, WriteOnly]
    public ComponentDataFromEntity<Waiting> allWaiting;

    [NativeDisableContainerSafetyRestriction, WriteOnly]
    public BufferFromEntity<Int2BufferElement> allPathLists;

    [ReadOnly]
    public ReachabilityType reachability;

    private HeuristicCalculator heuristicCalculator;

    // This will be specified by client on whether it wants to include diagonal neighbors
    [ReadOnly]
    public NativeArray<int2> neighborOffsets;

    [ReadOnly]
    public GridWrapper gridWrapper;

    // Execute search per entity in entities
    public void Execute(int index) {
        Search search = new Search() {
            entity = this.entities[index],
            allParameters = this.allParameters,
            allPaths = this.allPaths,
            allPathLists = this.allPathLists,
            reachability = this.reachability,
            heuristicCalculator = this.heuristicCalculator,
            neighborOffsets = this.neighborOffsets,
            gridWrapper = this.gridWrapper
        };
        search.Execute();

        // Update waiting
        this.allWaiting[this.entities[index]] = new Waiting() {
            done = true
        };
    }

    private struct Search {
        public Entity entity;
        public ComponentDataFromEntity<AStarSearchParameters> allParameters;
        public ComponentDataFromEntity<AStarPath> allPaths;
        public BufferFromEntity<Int2BufferElement> allPathLists;
        public ReachabilityType reachability;
        public HeuristicCalculator heuristicCalculator;
        public NativeArray<int2> neighborOffsets;
        public GridWrapper gridWrapper;
        private NativeList<AStarNode> allNodes;
        private OpenSet openSet;
        private NativeHashMap<int2, byte> closeSet;

        private int2 goalPosition;

        public void Execute() {
            // Instantiate containers
            this.allNodes = new NativeList<AStarNode>(10, Allocator.Temp);
            GrowingHeap heap = new GrowingHeap(new NativeList<HeapNode>(10, Allocator.Temp));
            this.openSet = new OpenSet(heap, new NativeHashMap<int2, AStarNode>(10, Allocator.Temp));
            this.closeSet = new NativeHashMap<int2, byte>(10, Allocator.Temp);

            AStarSearchParameters parameters = this.allParameters[this.entity];
            this.goalPosition = parameters.goal;

            float startNodeH = this.heuristicCalculator.ComputeCost(parameters.start, this.goalPosition);
            AStarNode startNode = CreateNode(parameters.start, -1, 0, startNodeH);
            this.openSet.Push(startNode);

            float minH = float.MaxValue;
            Maybe<AStarNode> minHPosition = Maybe<AStarNode>.Nothing;

            // Process while there are nodes in the open set
            while (this.openSet.HasItems) {
                AStarNode current = this.openSet.Pop();
                if (current.position.Equals(this.goalPosition)) {
                    // Goal has been found
                    int pathCount = ConstructPath(current);
                    this.allPaths[this.entity] = new AStarPath(pathCount, true);

                    return;
                }

                ProcessNode(current);

                this.closeSet.TryAdd(current.position, 0);

                // We save the node with the least H so we could still try to locate
                // the nearest position to the destination 
                if (current.H < minH) {
                    minHPosition = new Maybe<AStarNode>(current);
                    minH = current.H;
                }
            }

            // Open set has been exhausted. Path is unreachable.
            if (minHPosition.HasValue) {
                int pathCount = ConstructPath(minHPosition.Value);
                this.allPaths[this.entity] = new AStarPath(pathCount, false); // false for unreachable
            } else {
                this.allPaths[this.entity] = new AStarPath(0, false);
            }
        }

        private AStarNode CreateNode(int2 position, int parent, float g, float h) {
            int index = this.allNodes.Length;
            AStarNode node = new AStarNode(index, position, parent, g, h);
            this.allNodes.Add(node);

            return node;
        }

        private void ProcessNode(in AStarNode current) {
            if (IsInCloseSet(current.position)) {
                // Already in closed set. We no longer process because the same node with lower F
                // might have already been processed before. Note that we don't fix the heap. We just
                // keep on pushing nodes with lower scores.
                return;
            }

            // Process neighbors
            for (int i = 0; i < this.neighborOffsets.Length; ++i) {
                int2 neighborPosition = current.position + this.neighborOffsets[i];

                if (current.position.Equals(neighborPosition)) {
                    // No need to process if they are equal
                    continue;
                }

                if (!this.gridWrapper.IsInside(neighborPosition)) {
                    // No longer inside the map
                    continue;
                }

                if (IsInCloseSet(neighborPosition)) {
                    // Already in close set
                    continue;
                }

                if (!this.reachability.IsReachable(current.position, neighborPosition)) {
                    // Not reachable based from specified reachability
                    continue;
                }

                float tentativeG = current.G + this.reachability.GetWeight(current.position, neighborPosition);

                float h = this.heuristicCalculator.ComputeCost(neighborPosition, this.goalPosition);

                if (this.openSet.TryGet(neighborPosition, out AStarNode existingNode)) {
                    // This means that the node is already in the open set
                    // We update the node if the current movement is better than the one in the open set
                    if (tentativeG < existingNode.G) {
                        // Found a better path. Replace the values.
                        // Note that creation automatically replaces the node at that position
                        AStarNode betterNode = CreateNode(neighborPosition, current.index, tentativeG, h);

                        // Only add to open set if it's a better movement
                        // If we just push without checking, a node with the same g score will be pushed
                        // which causes infinite loop as every node will be pushed
                        this.openSet.Push(betterNode);
                    }
                } else {
                    AStarNode neighborNode = CreateNode(neighborPosition, current.index, tentativeG, h);
                    this.openSet.Push(neighborNode);
                }
            }
        }

        // Returns the position count in the path
        private int ConstructPath(AStarNode destination) {
            // Note here that we no longer need to reverse the ordering of the path
            // We just add them as reversed in AStarPath
            // AStarPath then knows how to handle this
            DynamicBuffer<Int2BufferElement> pathList = this.allPathLists[this.entity];
            pathList.Clear();
            AStarNode current = this.allNodes[destination.index];
            while (current.parent >= 0) {
                pathList.Add(new Int2BufferElement(current.position));
                current = this.allNodes[current.parent];
            }

            return pathList.Length;
        }

        public bool IsInCloseSet(int2 position) {
            return this.closeSet.TryGetValue(position, out _);
        }
    }
}

Job System

Using AStarSearchParallelFor in a job system should be trivial, or so I thought:

private JobHandle Process(in ArchetypeChunk chunk, in SimpleReachability reachability,
    JobHandle inputDeps) {
    AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability> searchJob =
        new AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability>() {
            entities = chunk.GetNativeArray(this.entityType),
            allParameters = this.allParameters,
            allPaths = this.allPaths,
            allWaiting = this.allWaiting,
            allPathLists = this.allPathLists,
            reachability = reachability,
            neighborOffsets = this.neighborOffsets,
            gridWrapper = this.gridSystem.GridWrapper
        };

    return searchJob.Schedule(chunk.Count, 64, inputDeps);
}

I made some request entities with random start and goal positions so that the system that schedules the job would run. I tried 1 request and saw that it works. I tried 2. It worked, but not in parallel. So I increased it to 10. Still not in parallel. Tried 20, not in parallel. Then 50. Still not in parallel. I was quite disappointed.

So I hung out on the forums as a break. Luckily, I found a thread where a programmer was trying to do something similar. He wanted to run parallel jobs for each entity because his processing is heavy. Somebody replied that he ought to use IJobParallelFor with a low innerloopBatchCount, the second parameter in Schedule() (64 in my code). At that moment, I finally realized what that parameter is for. (I didn’t know. I just follow example code. Don’t judge me.)

Passing in 64 means that execution will only be distributed to other threads in batches of 64 iterations. It acts like a divisor that determines how many execution batches to create. Say chunk.Count is equal to 512. This means that 8 execution batches would be created because 512 / 64 = 8. These 8 batches would then be distributed to the worker threads in parallel. At least that’s how I understood it.
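My understanding of the batching math can be sketched in a few lines. This is just an illustration (Python, with a hypothetical batch_count function); Unity's actual scheduler may split work differently:

```python
import math

def batch_count(total_iterations: int, innerloop_batch_count: int) -> int:
    """Roughly how many batches get handed out to worker threads."""
    if total_iterations <= 0:
        return 0
    return math.ceil(total_iterations / innerloop_batch_count)

# 512 iterations with innerloopBatchCount = 64 gives 8 batches to distribute.
print(batch_count(512, 64))  # 8

# With only 50 requests and a batch count of 64, everything lands in a
# single batch, which is why my jobs never ran in parallel.
print(batch_count(50, 64))   # 1

# A batch count of 2 still splits a small request count across threads.
print(batch_count(50, 2))    # 25
```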

In my case, there would always be only one execution batch since my request count never reached 64. I should pick a number such that there’s a higher chance that execution batches would be distributed evenly (not all A* requests are computed equally). Setting it to 1 means that there would be one execution batch per request. This is not without cost, though, I would surmise. There should be a cost associated with creating an execution batch and assigning a thread to it. So I picked 2 instead. It’s small enough that batches are distributed evenly, while not so many that it pressures the scheduler.

private JobHandle Process(in ArchetypeChunk chunk, in SimpleReachability reachability,
    JobHandle inputDeps) {
    AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability> searchJob =
        new AStarSearchParallelFor<SimpleHeuristicCalculator, SimpleReachability>() {
            entities = chunk.GetNativeArray(this.entityType),
            allParameters = this.allParameters,
            allPaths = this.allPaths,
            allWaiting = this.allWaiting,
            allPathLists = this.allPathLists,
            reachability = reachability,
            neighborOffsets = this.neighborOffsets,
            gridWrapper = this.gridSystem.GridWrapper
        };

    // Just changed 64 to 2
    return searchJob.Schedule(chunk.Count, 2, inputDeps);
}

Once I did this, I could now see that the A* search executions are run in parallel:

Sweet parallel execution. This is 60 requests.

Benchmark

I did the same benchmark as in my last article. I made a test environment where I could increase or decrease the number of requests until the frame time went past 16ms. The last record was 160 requests. This time, I could make 300 requests before going past 16ms.

Threads are now almost filled.

Note that this is still a generic A*. It could be sped up further with A* algorithm optimizations like Jump Point Search or HPA*. For our use case, however, we probably won’t need to; 20 requests per frame would already be good for us.

Next Steps

I’m slowly working to apply this to our game, Academia. I’m almost there. This will be another part of the game that will be optimized using Unity’s DOTS. But that will be for another post.

How fast could Burst compiled A* be?

I’ve been excited about Unity’s Burst compiler ever since I saw it in action on my own code. Little by little, I’ve been porting some of our common code to ECS so we can easily leverage Burst. I’ve always wanted to write the most basic, generic A* algorithm in ECS and see how fast it could be when compiled with Burst. If it’s already fast enough, there would be no need to write more sophisticated A* variants like Jump Point Search. Around last month, I was able to do this, and I was very pleased with the results.

I made a simple environment to run A* on: an 80×40 2D grid with controls for painting cells that act as blockers. I used 80×40 because that’s the size at which this map of rooms was made. I could have made a procedural room generator, but I just wanted to test my A* code right away. (Maybe next time.) I also didn’t want to waste time making a scrollable or zoomable map. After all, I just want to see how much faster Burst compiled A* is.

Before performance testing, I first had to verify that my ECS A* code works. As you can see in the image above, it shows the resolved path from the blue tile to the red tile. I even added handling for destinations that can’t be reached, in which case the algorithm still tries to resolve the nearest position to the target.

Red tile is unreachable

Code

Here’s the A* search code as an IJob:

[BurstCompile]
public struct AStarSearch<HeuristicCalculator, ReachabilityType> : IJob
    where HeuristicCalculator : struct, HeuristicCostCalculator where ReachabilityType : struct, Reachability {
    public Entity owner;
    public int2 startPosition;
    public int2 goalPosition;

    [ReadOnly]
    public ReachabilityType reachability;

    // This will be specified by client on whether it wants to include diagonal neighbors
    [ReadOnly]
    public NativeArray<int2> neighborOffsets;

    public GridWrapper gridWrapper;

    public ComponentDataFromEntity<AStarPath> allPaths;

    [ReadOnly]
    public BufferFromEntity<Int2BufferElement> allPathLists;

    public ComponentDataFromEntity<Waiting> allWaiting;

    // This is the master container for all AStarNodes. The key is the hash code of the position.
    // This will be specified by client code
    public NativeList<AStarNode> allNodes;

    // Only used for existence of position in closed set
    // This will be specified by client code
    public NativeHashMap<int2, byte> closeSet;

    private HeuristicCalculator heuristicCalculator;

    public OpenSet openSet;

    public void Execute() {
        this.allNodes.Clear();
        this.openSet.Clear();
        this.closeSet.Clear();

        DoSearch();

        // Mark as done waiting for the agent to respond
        this.allWaiting[this.owner] = new Waiting {
            done = true
        };
    }

    private void DoSearch() {
        if (!this.reachability.IsReachable(this.goalPosition)) {
            // Goal is not reachable
            this.allPaths[this.owner] = new AStarPath(0, false);

            return;
        }

        float startNodeH = this.heuristicCalculator.ComputeCost(this.startPosition, this.goalPosition);
        AStarNode startNode = CreateNode(this.startPosition, -1, 0, startNodeH);
        this.openSet.Push(startNode);

        float minH = float.MaxValue;
        Maybe<AStarNode> minHPosition = Maybe<AStarNode>.Nothing;

        // Process while there are nodes in the open set
        while (this.openSet.HasItems) {
            AStarNode current = this.openSet.Pop();
            if (current.position.Equals(this.goalPosition)) {
                // Goal has been found
                int pathCount = ConstructPath(current);
                this.allPaths[this.owner] = new AStarPath(pathCount, true);

                return;
            }

            ProcessNode(current);

            this.closeSet.TryAdd(current.position, 0);

            // We save the node with the least H so we could still try to locate
            // the nearest position to the destination 
            if (current.H < minH) {
                minHPosition = new Maybe<AStarNode>(current);
                minH = current.H;
            }
        }

        // Open set has been exhausted. Path is unreachable.
        if (minHPosition.HasValue) {
            int pathCount = ConstructPath(minHPosition.Value);
            this.allPaths[this.owner] = new AStarPath(pathCount, false); // false for unreachable
        } else {
            this.allPaths[this.owner] = new AStarPath(0, false);
        }
    }

    private AStarNode GetNode(int index) {
        return this.allNodes[index];
    }

    private AStarNode CreateNode(int2 position, int parent, float g, float h) {
        int index = this.allNodes.Length;
        AStarNode node = new AStarNode(index, position, parent, g, h);
        this.allNodes.Add(node);

        return node;
    }

    // Returns the position count in the path
    private int ConstructPath(AStarNode destination) {
        // Note here that we no longer need to reverse the ordering of the path
        // We just add them as reversed in AStarPath
        // AStarPath then knows how to handle this
        DynamicBuffer<Int2BufferElement> pathList = this.allPathLists[this.owner];
        pathList.Clear();
        AStarNode current = GetNode(destination.index);
        while (current.parent >= 0) {
            pathList.Add(new Int2BufferElement(current.position));
            current = GetNode(current.parent);
        }

        return pathList.Length;
    }

    private void ProcessNode(in AStarNode current) {
        if (IsInCloseSet(current.position)) {
            // Already in closed set. We no longer process because the same node with lower F
            // might have already been processed before. Note that we don't fix the heap. We just
            // keep on pushing nodes with lower scores.
            return;
        }

        // Process neighbors
        for (int i = 0; i < this.neighborOffsets.Length; ++i) {
            int2 neighborPosition = current.position + this.neighborOffsets[i];

            if (current.position.Equals(neighborPosition)) {
                // No need to process if they are equal
                continue;
            }

            if (!this.gridWrapper.IsInside(neighborPosition)) {
                // No longer inside the map
                continue;
            }

            if (IsInCloseSet(neighborPosition)) {
                // Already in close set
                continue;
            }

            if (!this.reachability.IsReachable(current.position, neighborPosition)) {
                // Not reachable based from specified reachability
                continue;
            }

            float tentativeG = current.G + this.reachability.GetWeight(current.position, neighborPosition);

            float h = this.heuristicCalculator.ComputeCost(neighborPosition, this.goalPosition);

            if (this.openSet.TryGet(neighborPosition, out AStarNode existingNode)) {
                // This means that the node is already in the open set
                // We update the node if the current movement is better than the one in the open set
                if (tentativeG < existingNode.G) {
                    // Found a better path. Replace the values.
                    // Note that creation automatically replaces the node at that position
                    AStarNode betterNode = CreateNode(neighborPosition, current.index, tentativeG, h);

                    // Only add to open set if it's a better movement
                    // If we just push without checking, a node with the same g score will be pushed
                    // which causes infinite loop as every node will be pushed
                    this.openSet.Push(betterNode);
                }
            } else {
                AStarNode neighborNode = CreateNode(neighborPosition, current.index, tentativeG, h);
                this.openSet.Push(neighborNode);
            }
        }
    }

    private bool IsInCloseSet(int2 position) {
        return this.closeSet.TryGetValue(position, out _);
    }
}

The grid is represented as a linear array of int2 (a 2-dimensional vector of integers). This is handled by the GridWrapper passed in upon job creation.
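GridWrapper itself isn't shown here, but the usual way to store a 2D grid in a linear array is a simple index mapping. A sketch of that idea (Python; the function names are mine, not GridWrapper's API):

```python
def to_index(x: int, y: int, width: int) -> int:
    """Map a 2D cell position to its slot in the flat array."""
    return y * width + x

def to_position(index: int, width: int) -> tuple[int, int]:
    """Inverse mapping: flat index back to (x, y)."""
    return index % width, index // width

def is_inside(x: int, y: int, width: int, height: int) -> bool:
    """Bounds check, similar in spirit to GridWrapper.IsInside."""
    return 0 <= x < width and 0 <= y < height

# On an 80x40 grid like the test map, cell (3, 2) lives at index 2 * 80 + 3.
print(to_index(3, 2, 80))        # 163
print(to_position(163, 80))      # (3, 2)
print(is_inside(80, 0, 80, 40))  # False (x == width is already out of bounds)
```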

Neighbors or adjacent nodes are easily represented as an array of int2 offsets. This is the neighborOffsets variable in the code. To get the neighbors, the code just loops through this array and adds each offset to the currently processed position. The resulting int2 is the position of the neighbor.
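Here is the offset idea in isolation, as a small Python sketch (the names are hypothetical, not from the actual code):

```python
# 8-direction offsets; a client that wants only cardinal moves
# would pass just the first four.
NEIGHBOR_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1),
                    (1, 1), (1, -1), (-1, 1), (-1, -1)]

def neighbors(position, offsets, width, height):
    """Yield the in-bounds neighbor positions of a cell."""
    x, y = position
    for dx, dy in offsets:
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            yield nx, ny

# A corner cell of an 80x40 grid has only 3 of its 8 neighbors in bounds.
print(len(list(neighbors((0, 0), NEIGHBOR_OFFSETS, 80, 40))))  # 3
```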

The whole thing is just the baseline A* algorithm that everybody learns the first time.
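For reference, that baseline fits in a few dozen lines. This is a minimal Python rendering of textbook A*, not the ECS code above; it assumes 4-direction movement and a Manhattan heuristic:

```python
import heapq

def astar(start, goal, passable, width, height):
    """Textbook A* on a 2D grid. passable(x, y) says whether a cell is walkable.
    Returns the list of cells from start to goal, or None if unreachable."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_set = [(h(start), start)]  # heap of (f score, position)
    g = {start: 0}
    parent = {start: None}
    closed = set()

    while open_set:
        _, current = heapq.heappop(open_set)
        if current == goal:
            # Walk the parent chain back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        if current in closed:
            continue  # stale duplicate; a cheaper copy was already processed
        closed.add(current)
        x, y = current
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < width and 0 <= ny < height) or not passable(nx, ny):
                continue
            tentative_g = g[current] + 1
            if tentative_g < g.get((nx, ny), float("inf")):
                g[(nx, ny)] = tentative_g
                parent[(nx, ny)] = current
                heapq.heappush(open_set, (tentative_g + h((nx, ny)), (nx, ny)))
    return None  # open set exhausted, goal unreachable

# On an open 5x5 grid, the corner-to-corner path visits 9 cells (8 moves).
path = astar((0, 0), (4, 4), lambda x, y: True, 5, 5)
print(len(path))  # 9
```

Like the ECS version, it skips nodes already in the closed set when they are popped instead of fixing the heap in place.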

Performance Test

The performance test is simple. Every frame, I make X A* requests, and the code executes those X requests within the same frame. A single request means picking a random start and a random end position from the map in the image above. I picked this kind of map representation because it’s approximately how our current game maps look.

The machine used for the test has the following specs:

  • Intel i3-6100 @ 3.70GHz (4 CPUs)
  • 16GB RAM

I ran two tests: first non-Burst compiled, then Burst compiled. The profiles were captured on a development build of the simulation.

Non Burst

The following is how the non-Burst compiled simulation ran with a single request (X = 1):

Non Burst with single request

A single A* search runs around 1ms-8ms.

I kept increasing the number of requests until the simulation went past 16ms (the 60fps target). That target was already hit at 15 requests:

Past 16ms is reached with just 15 requests

Burst

Let’s see the performance of the Burst compiled simulation with just one request:

Burst with single request

The running time goes as low as 0.04ms and as high as 0.70ms. I haven’t seen a single run reach even 1ms.

I increased the number of requests until the frame time reaches past 16ms. Guess how many requests it takes.

Burst compiled A* can run up to 120 requests until it reaches 16ms

I reached 120 requests before hitting that 16ms mark. That’s 8 times more requests than the non-Burst A* could handle! For me, that’s already kind of amazing. It’s more than enough for the kind of games we make (simulation games). We don’t need 120 requests per frame; even just 20 per frame would be nice.

Final Words

While the simulation here is not really indicative of an actual game, it still shows that Burst compiled A* is significantly faster. I won’t be able to use this in our current game because of its heavy OOP usage, but I’m excited for our next game.

Run managed code in Unity’s Job System

I’ve always thought that only the HPC# (High Performance C#) subset could be executed in jobs, which run on Unity’s reserved worker threads. We do some multithreading in our game, but we run it on C#’s managed threads since the code isn’t HPC#. I feel like the worker threads are wasted when they’re not fully utilized. While we have ECS systems running on worker threads, they use very little running time.

ThreadNotUtilized
Look at the bottom 3 timelines. Those are the worker threads. Notice the huge Idle time. What a waste!

As much as I want to utilize the worker threads, it’s very hard to port a huge existing OOP codebase to HPC# just to use the job system. I would have to rewrite the game from scratch, which is stupid. That was until I read a thread on the Unity forums about how you can run managed C# code in jobs. I tried it in a simple test and was surprised how easy it is.

Simple Example

Here’s an example to run managed code in a job:

// Common interface for executing something
interface Task {
    void Execute();
}

// Just an arbitrary heavy computation as an example
class SampleHeavyTask : Task {
    private Unity.Mathematics.Random random;

    public SampleHeavyTask() {
        this.random = new Unity.Mathematics.Random(1234);
    }

    public void Execute() {
        float total = 0;
        for (int i = 0; i < 50000; ++i) {
            float randomValue = this.random.NextFloat(1, 100);
            total += math.sqrt(randomValue);
        }
    }
}

public class ManagedCodeInJob : MonoBehaviour {
    private readonly List<GCHandle> gcHandles = new List<GCHandle>();
    private readonly List<JobHandle> jobHandles = new List<JobHandle>();

    private void Update() {
        // Schedule one task per frame
        ScheduleTask();
    }

    private void ScheduleTask() {
        Task task = new SampleHeavyTask();
        GCHandle gcHandle = GCHandle.Alloc(task);
        this.gcHandles.Add(gcHandle); // We remember this so we can free it later

        Job job = new Job() {
            handle = gcHandle
        };

        // We remember the JobHandle so we can complete it later
        this.jobHandles.Add(job.Schedule());
    }

    private void LateUpdate() {
        // Free and complete the scheduled jobs
        for (int i = 0; i < this.jobHandles.Count; ++i) {
            this.jobHandles[i].Complete();
            this.gcHandles[i].Free();
        }

        this.jobHandles.Clear();
        this.gcHandles.Clear();
    }

    private struct Job : IJob {
        public GCHandle handle;

        public void Execute() {
            Task task = (Task) handle.Target;
            task.Execute();
        }
    }
}

The key here is the Job struct at the bottom. You provide it with a GCHandle that targets an instance that provides the execution. You just cast the handle’s Target and call the method that executes your task.

In Update(), we schedule one job, which tells Unity "Hey, please run this in a worker thread if you can" (sometimes Unity can't). Then in LateUpdate(), we force that job to complete, which means the main thread will wait for the task executing on the worker thread to finish. We also call GCHandle.Free().
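The same schedule-in-Update, complete-in-LateUpdate rhythm can be mimicked with an ordinary thread pool. A rough Python analogy (FrameScheduler and its method names are my own invention, not Unity API):

```python
from concurrent.futures import ThreadPoolExecutor

class FrameScheduler:
    """Kick off work in update(), then block on all of it in late_update()."""

    def __init__(self):
        self.pool = ThreadPoolExecutor(max_workers=3)  # like 3 worker threads
        self.pending = []  # like the jobHandles list

    def update(self, task, *args):
        # "Hey, please run this in a worker thread if you can"
        self.pending.append(self.pool.submit(task, *args))

    def late_update(self):
        # Force every scheduled job to complete before the frame ends
        results = [f.result() for f in self.pending]
        self.pending.clear()
        return results

def heavy_task(n):
    # Arbitrary heavy computation, like SampleHeavyTask
    return sum(i * i for i in range(n))

scheduler = FrameScheduler()
for _ in range(3):
    scheduler.update(heavy_task, 1000)
print(scheduler.late_update())  # three results computed off the main thread
```

Python has a garbage collector too, but keeping the futures in a list is enough to keep them alive; C# needs GCHandle because the job system runs outside the managed world.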

I admit that I didn't know what GCHandle does until writing this post. I'll explain as best as I can. GCHandle.Alloc() protects a managed object from being garbage collected. I would surmise that the worker threads are unmanaged/native environments outside the jurisdiction of the garbage collector. We have to make sure the garbage collector does not collect objects while the worker threads are still executing them. GCHandle.Alloc() essentially takes an object off the garbage collector's radar. After the thread execution completes, we have to call GCHandle.Free() to mark the object as collectable again.

If you attach the script above to a GameObject and run the scene, you will see something like this in the profiler:

SingleManagedCode

You can see that the job that executes the sample heavy task is run in one of the worker threads.

My PC has 4 cores, so Unity automatically creates 3 worker threads (coreCount – 1). The remaining core serves the main thread. Let’s look at what happens when we call ScheduleTask() three times in Update():

private void Update() {
    // Of course you can use a for loop here.
    ScheduleTask();
    ScheduleTask();
    ScheduleTask();
}

This is what the profiler looks like in my machine:

ThreeManagedCode
Three jobs are executed in parallel in worker threads.

We can see here that the three worker threads each execute the heavy job in parallel. On the main thread, LateUpdate() only waits 5.57ms instead of the 16.45ms total execution time of the three jobs. That’s multithreading at work.

Wait for execution to finish

The current example forces the jobs to finish before the frame ends by calling JobHandle.Complete() in LateUpdate(). However, there may be times when you want to wait for a job to finish on its own instead of forcing it to complete. This implies that the job may execute across multiple frames, which is useful for tasks that can run for more than 16ms.

To do this, we can simply change LateUpdate() to the following:

private void LateUpdate() {
    // Wait for jobs to complete before freeing and removing them
    // Iterate from the end so we can easily remove from lists
    for (int i = this.jobHandles.Count - 1; i >= 0; --i) {
        if (this.jobHandles[i].IsCompleted) {
            this.jobHandles[i].Complete();
            this.jobHandles.RemoveAt(i);
            
            this.gcHandles[i].Free();
            this.gcHandles.RemoveAt(i);
        }
    }
}
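Outside of Unity, the same poll-instead-of-block pattern looks like this; a Python thread pool sketch where done() plays the role of IsCompleted (names and structure are mine):

```python
from concurrent.futures import ThreadPoolExecutor
import time

pool = ThreadPoolExecutor(max_workers=3)
pending = [pool.submit(time.sleep, 0.01) for _ in range(3)]

def late_update():
    """Reap only the jobs that already finished; leave the rest for a later
    frame. Iterating from the end makes removal by index safe."""
    for i in range(len(pending) - 1, -1, -1):
        if pending[i].done():      # like JobHandle.IsCompleted
            pending[i].result()    # like JobHandle.Complete(): surface errors
            del pending[i]         # like freeing the handle and removing it

# Simulate frames until every job has been reaped.
while pending:
    late_update()
print("all jobs reaped")
```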

Be careful!

As easy as this is, it is not without drawbacks. The normal job system using HPC# will usually tell you what you’re doing wrong in your multithreaded code, such as race conditions.

When using managed code, you’re on your own. Unity won’t tell you anything, so you have to make sure your multithreaded code is correct. Multithreading is inherently hard, so be careful. Use this only if you know what you’re doing.

That’s all for now. Hope it helps!

A small optimization for A*

Right now, my side project is to write A* code that works with Unity’s ECS (this is my main project). I’ve always wondered how many A* calls could be completed in a single frame by a Burst compiled implementation that runs in a job (multithreaded). This post is not about those results as I’m not there yet. My current implementation is still horrible. I did, however, notice a little optimization I could make while rewriting our in-house A* solution.

No need to fix the Open Set

Again, I’m not going to explain the whole A* algorithm here as there are better resources for that elsewhere. I’ll just assume that you’re familiar with it.

A* employs the concept of an open set, which is just a collection of positions or nodes that are candidate paths but haven’t been considered yet. Every node in the open set has an F score associated with it, the estimated total cost of a path that passes through that node. Every time we take a node from the open set, we get the one with the least F score, because a lower F score means the path through it is estimated to be shorter.

Most A* implementations use a heap data structure for the open set. A heap is a container that remembers which of its items has the least or greatest value (depending on the comparison function you pass to it). For A*, we need the node with the least F score, so a heap is a perfect fit for the algorithm. The heap may be an array or tree based implementation.

In A*, there’s a special case: when a neighbor node is already in the open set, we check if the new path to it has a lower F score than the one contained in the set. If it does, we update the F score of the node in the open set instead of pushing the neighbor node into it. When a node’s F score is updated, the internal state of the heap is disrupted; the ordering the heap maintains could now be wrong. We have to fix the heap so that its internal integrity stays intact. In an array based implementation, we have to move the node so that it sits in its right place. It’s the same with a tree based implementation: the node may need to bubble up since it now has a lower value.

Then I had a thought. What if I avoid fixing the heap? What if I just push the neighbor with the better F score into the heap? The heap will sort itself out anyway. The duplicate node with the lower F score will always be popped first. When the old node with the higher F score is popped, I can just check if its position is already in the close set, which it will be, since the copy with the lower F score would have already been popped and processed. The old node is always skipped.

Not fixing the heap also leads to a cleaner implementation. We can now use immutable structs for the heap entries and A* nodes since we no longer need to mutate them.

So I did all this and the algorithm still checks out. I admit that this is a micro-optimization and the gains are very small, but hey, fixing the heap is still code that you don’t have to execute.

private void ProcessNode(in AStarNode current) {
    if (IsInCloseSet(current.position)) {
        // Already in closed set. We no longer process because the same node with lower F
        // might have already been processed before. Note that we don't fix the heap. We just
        // keep on pushing nodes with lower scores.
        return;
    }

    // Process neighbors
    for (int i = 0; i < this.neighborOffsets.Length; ++i) {
        int2 neighborPosition = current.position + this.neighborOffsets[i];
        
        if (current.position.Equals(neighborPosition)) {
            // No need to process if they are equal
            continue;
        }

        if (!this.gridWrapper.IsInside(neighborPosition)) {
            // No longer inside the map
            continue;
        }
    
        if (IsInCloseSet(neighborPosition)) {
            // Already in close set
            continue;
        }

        if (!this.reachability.IsReachable(current.position, neighborPosition)) {
            // Not reachable based from specified reachability
            continue;
        }
        
        float tentativeG = current.G + this.reachability.GetWeight(current.position, neighborPosition);
        float h = this.heuristicCalculator.ComputeCost(neighborPosition, this.goalPosition.Value);

        AStarNode neighborNode = GetNode(neighborPosition);
        if (neighborNode.hasValue) {
            // This means that the node is already in the open set
            // We update the node if the current movement is better than the one in the open set
            if (tentativeG < neighborNode.G) {
                // Found a better path.
                neighborNode = CreateNode(neighborPosition, tentativeG, h, new Maybe<int2>(current.position));

                // Just push the node with the lower F. No need to fix the open set.
                this.openSet.Push(new HeapNode(neighborPosition, neighborNode.F));
            }
        } else {
            // Not yet in open set
            neighborNode = CreateNode(neighborPosition, tentativeG, h, new Maybe<int2>(current.position));
            this.openSet.Push(new HeapNode(neighborPosition, neighborNode.F));
        }
    }
}
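The trick is easiest to see with a bare heap: push the improved duplicate and let the stale entry fall out when it is popped. A Python sketch of just that part (not the actual code above):

```python
import heapq

open_set = []   # heap of (f_score, position); duplicates are allowed
closed = set()

# The same position gets pushed twice: once with F = 10, later with a
# better F = 7 after a cheaper path to it was found. No heap fix-up needed.
heapq.heappush(open_set, (10, (3, 4)))
heapq.heappush(open_set, (7, (3, 4)))

processed = []
while open_set:
    f, position = heapq.heappop(open_set)
    if position in closed:
        continue  # the stale duplicate with the higher F; just skip it
    closed.add(position)
    processed.append((f, position))

print(processed)  # only the F = 7 copy is processed; the F = 10 copy is skipped
```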