Refactoring techniques for legacy code

After successfully generating some barely readable C# code from a GW-BASIC program, I mentioned how it would make a good basis for a legacy code refactoring exercise. Today I will share the results of this exercise with a brief survey of useful refactoring techniques for turning such dense procedural code into arguably better OO code, one small step at a time.

Magic number to enumeration

This refactoring is similar to Replace Magic Number with Symbolic Constant, except you end up with an actual enum type at the end rather than just named constants. The steps are straightforward.

1. Create an inner static class inside the class where the magic numbers are used. Use public const int fields to give names to the numbers. (example commit)
2. Change the static class to an enum, with enumeration members replacing the int fields. Change all associated int usages to the enum type. Lean heavily on the compiler here to tell you where your types are mismatching, and fix each error one by one. (example commit)
3. Promote the inner type to a top level type. (example commit)
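
As a rough illustration of where these steps lead, here is a minimal sketch. The names CommandCodes, Command, Game, and Describe are hypothetical and do not come from the generated program; the constants class is shown only to make the intermediate step visible, and in practice it would be deleted once the enum exists.

```csharp
using System;

// Step 1 (intermediate): the magic numbers get names as const int fields.
// In the real refactoring this would start as a nested class.
public static class CommandCodes
{
    public const int Move = 1;
    public const int Fire = 2;
}

// Steps 2 and 3: the same values as a top-level enum.
public enum Command
{
    Move = 1,
    Fire = 2,
}

public static class Game
{
    // Before the refactoring this would have taken an int and compared it to 1 and 2.
    public static string Describe(Command command)
    {
        switch (command)
        {
            case Command.Move: return "move";
            case Command.Fire: return "fire";
            default: throw new ArgumentOutOfRangeException(nameof(command));
        }
    }
}
```

Keeping the enum members' numeric values equal to the old constants means any remaining casts from int continue to behave the same way while the compiler errors are being worked through.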

If/else to function table

Tomas Jansson describes something similar in his blog post. The idea is to take a dense chain of conditional blocks of the form if (x == A) { DoA(); } else if (x == B) { DoB(); } else ... and replace it with a table of functions keyed by the value of x. Most commonly you would implement the function table with a dictionary, but you can go one step further and encapsulate that detail as a first-class collection (see next item). Note that the conditional blocks must have a compatible structure for this to apply: you should be able to use a common function signature, with the same inputs and outputs, for every case. Here is the process I followed to do this slowly and safely:

1. Introduce a dictionary and pass it through to where it will be used eventually; at first it will just be empty. The dictionary key will be the type used in each conditional expression and the value will be a delegate type matching the signature of each inner block. Add code to do the lookup of the handler function where the dictionary is passed in. If the lookup succeeds, call the returned function. If it fails (key not found), continue to the if/else block. (example commit)
2. Pick a single case and convert it to use the function table. To do this, add the table entry near where the dictionary is created, then remove the branch you just replaced from the if/else block. (example commit)
3. Repeat the previous step until the last case is replaced. (example commit)
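
The midpoint of this process might look something like the sketch below, with one case already moved into the table and the rest still handled by the original chain. The Command enum, the Dispatcher class, and the Func<int, string> signature are assumptions made for illustration. For brevity the table is a static field here rather than being passed in as described above.

```csharp
using System;
using System.Collections.Generic;

public enum Command { Move = 1, Fire = 2, Quit = 3 }

public static class Dispatcher
{
    // Function table: the key is the type used in the conditions, the value
    // is a delegate matching the common signature of the inner blocks.
    private static readonly Dictionary<Command, Func<int, string>> Handlers =
        new Dictionary<Command, Func<int, string>>
        {
            // The only case converted so far.
            [Command.Move] = amount => "moved " + amount,
        };

    public static string Execute(Command command, int amount)
    {
        // Try the table first; on a hit, call the handler and return.
        if (Handlers.TryGetValue(command, out var handler))
        {
            return handler(amount);
        }

        // Cases not yet converted stay in the original if/else chain.
        if (command == Command.Fire)
        {
            return "fired " + amount;
        }
        else if (command == Command.Quit)
        {
            return "quit";
        }

        throw new ArgumentOutOfRangeException(nameof(command));
    }
}
```

Because the lookup falls through to the old chain on a miss, every intermediate commit keeps the same observable behavior, which is what makes it safe to convert one case at a time.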

Raw collection to first class collection

I have written before about first-class collections in the context of Object Calisthenics. The idea here is basically the same: we want to replace direct usage of arrays, lists, and dictionaries with a class that mediates access through high-level methods. To do this incrementally, I follow these steps:

1. Create a new inner class with a public field or property to hold the raw collection and a default constructor to initialize the value. Change all existing references to use the collection via the field/property. (example commit)
2. Move each method that deals with the collection to the class. Typically you will rename the method as you do this; e.g. in place of the UpdateThings method you will extract an Update method to the class Things. (example commit)
3. Repeat the above step until there are no more direct usages of the raw collection. At this point, you can move the class to the top level and eliminate the public accessor. (example commit)
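
Here is a small sketch of the intermediate state after the first two steps, reusing the Things and Update names from the example above. The element type, the Items property, and the Total method are assumptions chosen to keep the example short.

```csharp
using System.Collections.Generic;

public sealed class Things
{
    // Step 1: a default constructor initializes the raw collection.
    public Things()
    {
        Items = new List<int>();
    }

    // Step 1: temporary public accessor so existing callers keep compiling.
    // It goes away once no caller touches the list directly.
    public List<int> Items { get; private set; }

    // Step 2: formerly a hypothetical UpdateThings(List<int> things, int delta).
    public void Update(int delta)
    {
        for (int i = 0; i < Items.Count; i++)
        {
            Items[i] += delta;
        }
    }

    // Another method moved onto the class in the same way.
    public int Total()
    {
        int sum = 0;
        foreach (int item in Items)
        {
            sum += item;
        }
        return sum;
    }
}
```

Once every caller goes through methods like Update and Total, the Items property can be made private and the choice of List<int> becomes an internal detail.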

Replace parallel arrays with single strongly-typed array

In heavily procedural code, parallel arrays are the rule rather than the exception. This kind of code rarely has a place in an object-oriented approach, where we prefer putting related data items into a single record type. Making such a transformation is harder to boil down into a simple recipe, but it roughly works out to the following:

1. Extract the parallel arrays together into a single class, much like how you would start when building a first class collection. (example commit)
2. Create a class with scalar fields corresponding to each parallel array element. At this point, you will still use the parallel arrays as the primary storage location but will generally initialize the items using the fields of the new class. (example commit)
3. Fully replace the parallel arrays with one array using the new class. (example commit)
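
The end state could look roughly like this. Ship, Fleet, and the three fields are invented for the example; the constructor still accepts the old parallel arrays only to show how their elements map onto the record type.

```csharp
// Record type: one scalar field per former parallel array.
public sealed class Ship
{
    public string Name;
    public int X;
    public int Y;
}

// The class that originally wrapped the parallel arrays now holds one array.
public sealed class Fleet
{
    private readonly Ship[] ships;

    // Hypothetical bridge from the old layout: names[i], xs[i], and ys[i]
    // described the same item, so they become the fields of ships[i].
    public Fleet(string[] names, int[] xs, int[] ys)
    {
        ships = new Ship[names.Length];
        for (int i = 0; i < names.Length; i++)
        {
            ships[i] = new Ship { Name = names[i], X = xs[i], Y = ys[i] };
        }
    }

    // Callers now read one record instead of indexing three arrays in step.
    public string Describe(int index)
    {
        Ship ship = ships[index];
        return ship.Name + " at (" + ship.X + "," + ship.Y + ")";
    }
}
```

With a single array there is no longer any way for the lengths or indexes of the separate arrays to drift out of sync.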

Conclusion

Going through this exercise was a great opportunity to practice disciplined refactoring. Keep in mind that, in true legacy code fashion, I only had one golden master test to tell me if things were on the right track. This (mostly self-imposed) limitation forced me to make different choices than I might have done if I had a really great test suite. Even so, the quality of the end result was, if not “good,” at least “not terrible.” By many measures — cohesion, single responsibility, etc. — the code was much improved.

If I can do all this with machine generated C# code originating from a GW-BASIC program written in the last century, imagine what you can do with code written by humans in a modern language only years earlier!

WriteAsync .NET

Testing, coding, in that order.