My Thresholds for Refactoring

I read a book called Code Complete about a decade ago. The mantra of that book is “the goal of software is to manage complexity.” That phrase is repeated throughout the book and it has stuck with me ever since. The idea is that since we make software to manage complexity, we should strive to manage the complexity of our code as well. The book promotes maintainability as the primary metric to work on when making software. If your code is maintainable, other software qualities like robustness, correctness, and efficiency will follow. I won’t elaborate on how and why this is, since the book covers it in detail. Based on my years of software development, I have found it to be true.

This maintainability-first mindset has also served me well since I became a game programmer. Video games are in a somewhat peculiar position because you’re not managing a real-world complexity. Rather, the complexity is artificial, made entirely from imagination. It comes in the form of game design, which includes the graphical representation and the game mechanics. It’s still complexity, nonetheless, and oftentimes it’s even more complicated. Over the years of programming, I have developed a mental voice that starts nagging me when I reach certain thresholds in my code. I don’t want to write about how to make your code maintainable per se. Instead, I want to share some common thresholds that I keep in mind and that trigger refactoring when crossed. Doing this frequently, and sometimes unconsciously, leads to maintainable code. In programming lingo, these are also referred to as “code smells”. There are lots of those, but I’ll only enumerate the ones that I commonly encounter and thus commonly refactor.

Cyclomatic Complexity

Cyclomatic complexity is simply the total number of control flow branches within a function, method, or routine, plus one. Control flow branches are caused by if, while, for, foreach, do-while, catch clauses (including finally), and each case in a switch statement. Each AND (&&) and OR (||) operator is also included in the count. I consider anything that breaks the linear flow of statements a branch. Thinking about cyclomatic complexity like this makes it transferable to any programming language that may have special operators or constructs. My threshold is 10. This means that when I see a function with a cyclomatic complexity of more than 10, that function needs to be refactored. This can usually be fixed by introducing a new function and moving some of the control flow branching from the original function to the new one.

For example, the following method has a cyclomatic complexity of 4 (remember plus one) and is safe from being ripped apart:

public static Transform FindTransformByName(Transform transformRoot, string name) {
    if (name.Equals(transformRoot.name)) {
        return transformRoot;
    }

    // search in children
    Transform searchResult = null;
    foreach (Transform childTransform in transformRoot) {
        searchResult = FindTransformByName(childTransform, name);
        if (searchResult != null) {
            return searchResult;
        }
    }

    return null;
}

Of course, there are exceptions. If there’s a switch statement that has 10 or more case branches and there’s no other logical way to write such a function, I’ll probably just leave it be. But this rarely happens. There are many ways to restructure code to avoid such a pattern.
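One such restructuring, as a minimal sketch, is to replace the case branches with a lookup table. The DamageTable class, the weapon names, and the damage values below are hypothetical and only serve as illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical example: the names and values are made up for illustration.
public static class DamageTable {
    // One entry per former case branch. Adding an entry adds no control flow.
    private static readonly Dictionary<string, int> damageByWeapon = new Dictionary<string, int> {
        { "dagger", 3 },
        { "sword", 7 },
        { "axe", 9 },
        { "bow", 5 }
    };

    // Cyclomatic complexity of 2: one if, plus one.
    public static int GetDamage(string weapon) {
        if (damageByWeapon.TryGetValue(weapon, out int damage)) {
            return damage;
        }

        throw new ArgumentException("Unknown weapon: " + weapon);
    }
}
```

No matter how many weapons are added to the table, the complexity of GetDamage() stays the same.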

Parameter Count

This is self-explanatory. It’s the number of parameters that a function/method/routine has. My threshold for this is 7. Seven parameters is already a stretch to me, and I itch to refactor such a function. More than 7 is really a no-no.

There are many ways to fix this. Introducing a new function and moving some parameters there might do the trick. A set of parameters that logically go together might be bundled into a struct or another class. Functions with this many parameters are probably big. Turning the whole function into a separate class where the parameters become member variables could be viable. This way, you can further refactor the big function into several smaller functions, all contained in the new class.
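As a rough sketch of the struct option, here is a function that would otherwise take eight parameters. SpawnRequest, Spawner, and all the field names are hypothetical:

```csharp
using System;

// Hypothetical parameter object: groups values that always travel together.
public struct SpawnRequest {
    public string PrefabName;
    public int X;
    public int Y;
    public int Health;
    public int Damage;
    public int Speed;
    public int Team;
    public bool IsBoss;
}

public static class Spawner {
    // Before: Describe(string prefabName, int x, int y, int health,
    //                  int damage, int speed, int team, bool isBoss)
    // After: a single parameter that can grow without touching every caller.
    public static string Describe(SpawnRequest request) {
        string kind = request.IsBoss ? "boss" : "minion";
        return request.PrefabName + " (" + kind + ") at (" + request.X + ", " + request.Y + ")";
    }
}
```

A caller fills in only the fields it cares about, for example new SpawnRequest { PrefabName = "Orc", X = 2, Y = 5 }, and the field names document what each value means at the call site.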

Nesting Depth

Nesting is when you indent a block of code due to a control flow construct like if, for, or while. For example, the following code has a nesting depth of 3:

for(...) {
    if(...) {
        if(...) {
        }
    }
}

Deep nesting is bad because it makes code hard to read, which makes it less maintainable. My threshold is 3. Three is already an eyesore to me. Any nesting I see beyond that gets refactored away into another function.
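Here is a minimal sketch of that kind of refactoring. The Enemy and EnemyCounter types are hypothetical. The inner conditions move into a helper function so that the loop itself stays shallow:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type for illustration only.
public class Enemy {
    public bool IsAlive;
    public int Health;
}

public static class EnemyCounter {
    // Before (nesting depth of 3):
    //   foreach (Enemy enemy in enemies) {
    //       if (enemy.IsAlive) {
    //           if (enemy.Health < threshold) {
    //               ++count;
    //           }
    //       }
    //   }

    // After: the deepest nesting in this method is 2.
    public static int CountWeakEnemies(List<Enemy> enemies, int threshold) {
        int count = 0;
        foreach (Enemy enemy in enemies) {
            if (IsWeak(enemy, threshold)) {
                ++count;
            }
        }

        return count;
    }

    // The inner conditions moved here, with an early return as a guard clause.
    private static bool IsWeak(Enemy enemy, int threshold) {
        if (!enemy.IsAlive) {
            return false;
        }

        return enemy.Health < threshold;
    }
}
```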

Inheritance Depth

I’m glad that modern programming is moving away from inheritance. Game engines like Unity use components as building blocks instead of inheritance. As a result, inheritance is no longer the de facto way of making a game, unlike in the old days. I won’t elaborate on why inheritance makes software less maintainable. You can search about that. Nonetheless, inheritance is still a tool that can be used properly. I do use it, albeit sparingly.

Inheritance depth is the number of ancestors in an inheritance chain. In the following example, GreatGrandChild has an inheritance depth of 3:

class Guy { ... }
class Child : Guy { ... }
class GrandChild : Child { ... }
class GreatGrandChild : GrandChild { ... }

My threshold for inheritance depth is 3. When a hierarchy reaches that depth, I become very careful with deriving a class in it. An inheritance depth of more than 3 means that I may have to restructure my code. Sometimes there’s no better way. I do let it go if adding another child class is the quickest and most logical way to do something. Inside me, though, the alarm bells are already ringing loudly when that happens. The key to managing inheritance is to not use it frequently and to design it with as flat a structure as possible, ideally keeping the inheritance depth at 1.
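One way to keep things flat, sketched under the assumption that the varying behaviour can be pulled out into a part, is to compose a class from interchangeable pieces instead of deriving deeper subclasses. All the names below are hypothetical:

```csharp
using System;

// Hypothetical behaviours. No class here derives from another class.
public interface Movement {
    string Move();
}

public class Walk : Movement {
    public string Move() { return "walks"; }
}

public class Fly : Movement {
    public string Move() { return "flies"; }
}

// Instead of a chain like Creature -> FlyingCreature -> FlyingBoss,
// a creature is composed from parts, so a new combination needs no new subclass.
public class Creature {
    private readonly string name;
    private readonly Movement movement;

    public Creature(string name, Movement movement) {
        this.name = name;
        this.movement = movement;
    }

    public string Describe() {
        return this.name + " " + this.movement.Move();
    }
}
```

This is roughly the same idea as a component-based design: the behaviour is selected by what an object holds rather than by where it sits in a hierarchy.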

Callback Count

I want to be clear that I’m not talking about callback chain count here. That’s different. If you have a background in JavaScript web development, you know what I’m talking about. Callback chaining is barely used in game programming, at least by me, because I avoid it. Callback count is merely the number of callbacks that you use in a class. If you have an exposed member variable that is an Action<> or Func<> that can be assigned from outside of the class, that’s probably a callback.

What usually happens is I introduce one callback. Then later on I introduce another one. Then much later, I add a new one. Then I see this and mentally decide that it’s ugly.

My threshold for callback count is 2. Three is just too much for me. At that point, I’ll refactor that into an interface and manage instances of the interface instead. Maintaining separate callbacks becomes unwieldy and clunky when there are more of them. For example, say you have this class that has a callback count of 4:

class RouteAssignmentManager {
    private List<Action<RouteAssignment>> assignmentAddedActions = …;
    private List<Action<RouteAssignment>> assignmentRemovedActions = …;
    private List<Action<RouteAssignment>> assignmentIncreasedActions = …;
    private List<Action<RouteAssignment>> assignmentDecreasedActions = …;

    public void AddAssignmentAddedAction(Action<RouteAssignment> action) {
        ...
    }

    public void AddAssignmentRemovedAction(Action<RouteAssignment> action) {
        ...
    }

    public void AddAssignmentIncreasedAction(Action<RouteAssignment> action) {
        ...
    }

    public void AddAssignmentDecreasedAction(Action<RouteAssignment> action) {
        ...
    }
}

Whenever you want to add any one of these callbacks, you may want to add the others as well because they are logically related. For example, if you register an added action but forget to register its matching removed action, you might introduce bugs.

In this case, it’s better to turn it into an interface and manage only a single list of instances, instead:

interface RouteAssignmentManagerObserver {
    void OnAssignmentAdded(RouteAssignment assignment);
    void OnAssignmentRemoved(RouteAssignment assignment);
    void OnAssignmentIncreased(RouteAssignment assignment);
    void OnAssignmentDecreased(RouteAssignment assignment);
}

class RouteAssignmentManager {
    private List<RouteAssignmentManagerObserver> observers = …;

    public void AddObserver(RouteAssignmentManagerObserver observer) {
        ...
    }
}

This is better because every time you want to listen to changes in RouteAssignmentManager, you are guaranteed not to forget the other related methods. If you want to broadcast a new kind of change, you can simply add a method to the interface. This will break the implementing classes and force you to look at each of them to see how they should handle the new change notification. You reduce the number of variables that you need to manage, too. Now, it’s just one list.
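To make the broadcasting side concrete, here is a sketch of how the manager could notify its observers. This is an assumption, not the actual implementation: the RouteAssignment stand-in, the Add() and Remove() methods, and the AssignmentLog class are all hypothetical, and the interface is trimmed to two methods for brevity:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real type.
public class RouteAssignment {
    public string RouteName;
}

// Trimmed to two of the four methods for brevity.
public interface RouteAssignmentManagerObserver {
    void OnAssignmentAdded(RouteAssignment assignment);
    void OnAssignmentRemoved(RouteAssignment assignment);
}

public class RouteAssignmentManager {
    private readonly List<RouteAssignmentManagerObserver> observers = new List<RouteAssignmentManagerObserver>();
    private readonly List<RouteAssignment> assignments = new List<RouteAssignment>();

    public void AddObserver(RouteAssignmentManagerObserver observer) {
        this.observers.Add(observer);
    }

    // Assumed method: stores the assignment, then broadcasts to the single list.
    public void Add(RouteAssignment assignment) {
        this.assignments.Add(assignment);
        foreach (RouteAssignmentManagerObserver observer in this.observers) {
            observer.OnAssignmentAdded(assignment);
        }
    }

    // Assumed method: only broadcasts when something was actually removed.
    public void Remove(RouteAssignment assignment) {
        if (!this.assignments.Remove(assignment)) {
            return;
        }

        foreach (RouteAssignmentManagerObserver observer in this.observers) {
            observer.OnAssignmentRemoved(assignment);
        }
    }
}

// A listener has to implement every method, or it will not compile.
public class AssignmentLog : RouteAssignmentManagerObserver {
    public readonly List<string> Entries = new List<string>();

    public void OnAssignmentAdded(RouteAssignment assignment) {
        this.Entries.Add("added " + assignment.RouteName);
    }

    public void OnAssignmentRemoved(RouteAssignment assignment) {
        this.Entries.Add("removed " + assignment.RouteName);
    }
}
```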

There are exceptions of course. If the callbacks are totally unrelated, sometimes it’s not logical to introduce an interface to combine them. But still, I would try to limit callbacks to 2.

Duplicate Code

When I’m writing a function, sometimes I need a few lines of code that I wrote a while ago. I’ll then copy and paste them into the function I’m writing. I have just introduced duplicate code. This first copy is usually ok. Another copy, however, is no longer good. It means that I now have the same code in three different places. This is my threshold for duplicate code. Three strikes, and to a new function you go. For snippets longer than 5 lines, 2 copies are enough to send them to a function.
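As a small illustration with a made-up example, imagine the same clamping logic pasted into three places. HealthMath and ApplyDelta() are hypothetical names:

```csharp
using System;

public static class HealthMath {
    // Before, these two lines were pasted into three different methods
    // (say, for damage, healing, and regeneration):
    //   health = health + delta;
    //   health = Math.Max(0, Math.Min(maxHealth, health));

    // After: one function that keeps the result within [0, maxHealth].
    public static int ApplyDelta(int health, int delta, int maxHealth) {
        return Math.Max(0, Math.Min(maxHealth, health + delta));
    }
}
```

If the clamping rule ever changes, there is now only one place to update.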

Lines of Code

I’m referring to the lines of code of a single class here. I usually write just one class per file. My threshold is 500 lines of code. I don’t refactor right away if I see a class with this many lines. Rather, it signals to me that it’s probably no longer appropriate to add more logic or some kind of maintenance code to the class. Small functions like getters and setters are still ok to add. Small functions that are closely related to the class variables are also fine. 500 lines of code tells me that new features should probably be coded in a new class.

Conclusion

I’m not saying that these thresholds are optimal. These are just my preferences, and they have become habits. You can decide on your own thresholds. For example, if you can handle a cyclomatic complexity of 20, good for you, because I can’t.

Lastly, if you want a bigger collection of code smells and how to refactor them, there’s also a book for that. I’m not advertising these books, and I’m in no way related to their authors. I’m just saying that they’re good books and that they are helpful.

How about you? What are your thresholds?