Change Locality

Suppose that a computer program is correct--it fulfills its intended function--and documented. One then judges the quality of the program based on how elegant and "clean" its design is. Software engineering has a whole grab-bag of concepts by which design elegance can be judged: code conciseness, degree of coupling between modules, class cohesion, the "don't repeat yourself" principle, adherence to an established design pattern like MVC, and more.

The overwhelmingly important criterion is how easily the design can be extended with new functionality. If a program is concise, it is frequently easier to modify because it can be read and understood more rapidly. If coupling between modules is low, one can change one module without having to worry as much about the effects of that change on other modules. If there is a cohesive class that clearly encompasses the functionality you wish to add, then it is easier to decide where to put your code. The DRY (don't repeat yourself) philosophy makes changes easier since the code must only be altered in one place. And the use of design patterns helps with all of these.

Very frequently, the most difficult part of making a change to a program is when the change requires updates to many different parts of the program. Take a simple example: if you use the literal number 20 for maximum string lengths throughout your program, and then later on wish to change the lengths to 25, you have to alter your program in every place that you used 20. But if you had defined a constant STR_MAX, you would only have to alter your program in one place. Less trivially, if you want to add a button to your text editor to make text italic, you might have to edit the GUI to have the button, the representation of text to allow for the possibility that text can be italic, the actual logic that sets text to be italic, and perhaps other less intuitive parts of the program. As the number of different places where you must add code to implement a single change increases, the program becomes harder to understand and modify.

I think that when you add a new capability to a program, the locality of your change to the code is a very good indicator of the program's design quality.

Metrics

A good thing about change locality is that it can be formalized somewhat simply. Let P be the original program and let x be a change to make to P. x might be "switch from ASCII to unicode" or "add a menu option" or "switch from Swing to GTK+." x should be recognizable to a human as a distinct chunk of functionality that it's desirable and reasonable for P to be extended with. Let P(x) be the program P which has been updated with the added functionality x.

One simple metric is simply to take the difference between the last and the first locations where P(x) differs from P. For example, if P = "abcd" and P(x) = "awbcvd" then this metric would give a change locality of 3. Call this metric L1(x, P).

The above metric is very simple, and it penalizes changes both for being scattered through the program and for being verbose, but it ignores a lot of information. Two slight alterations at the beginning and end of a program would have the same L1 metric as a change that rewrites the entire program. Also, suppose you make a small change to two different modules--L1(x, P) could be wildly different depending on the order in which those modules appear in the code, even though that ordering might have no syntactic meaning for the program.

For perhaps a better metric of change locality, let i, d, s be sets of character insertions, deletions, and substitutions required to transform P into P(x), where |i| + |d| + |s| is minimized (as in the Levenshtein distance). For each change c in the union of i, d, and s, let h(c) be the distance between the index in P(x) of c and the index of the change that most closely follows c. If c is the last change in P(x), let h(c) = 1. Define the metric L2(x, P) to be the sum over all c of log(h(c)) + 1.

L2 does not suffer from the problems of L1. If two changes are very far apart in a program, L2 is close to what it would be if the changes were only somewhat far apart. L2 is minimized when all changes are confined to a contiguous block of code, and also decreases with the total number of changes.

Here's another perspective on these metrics. Suppose that the modification is represented by a single cursor positioning operation, followed by a sequence of insertion/deletion/substitution operations performed at the location of a "cursor." Insertion operations advance the cursor. So, to get from the beginning of the program to the end of the program, you must delete the whole program and insert it again. The length of this sequence of operations roughly corresponds to L1. Alternatively, suppose that the modification to the program is to be represented by a similar sequence, except that we may also use operations to jump the cursor forward some number of characters, with the number written as a sequence of digits. Then L2 is roughly proportional to the length of this modifying sequence (because the length required to write a jump is logarithmic in the distance jumped). In general, given a language A for modifying one program into another program, one could define a change locality metric LA which is the length of the shortest string in A that produces the required change in the code. ----01-13-2009----

Design Locality

The change locality of a design should take into account a number of possible changes--a good design should be easily extensible in all the ways that one is likely to want to extend it. Suppose that instead of a single change, we have a set of changes, x1, x2, ..., xn that we might like to make to P. Define the "design locality" of P given these changes to be the sum of the change localities associated with applying each change.

Design locality depends strongly on which changes you wish to make to P. If we want to judge the design of P based on its design locality, we must choose x1, x2, ..., xn to be likely incremental changes that would actually be desired in P.

Don't Repeat Yourself

Change locality can be looked at as a generalization of DRY. If your code violates DRY by containing many repetitions of the same data in different places, and you ever have to change that data, then you also have poor change locality. However, while DRY only states that the same piece of information should not be scattered throughout the program, change locality goes further by saying that if different pieces of information relate to the same concept, they should also not be scattered throughout the program (because if they are scattered and you want to change how your program handles that concept, you have to update the information in many different places).

Additionally, change locality is easier to measure than DRY. One might try to measure DRY by searching the file for repetitions of data. However, it might not always be obvious to an automated tool that one piece of data repeats another; the format could be different. And many things that look like repetitions to such a tool, like referring to a long variable name several times in your program, really aren't. Change locality can be determined automatically, provided you have a record of past changes to the program.

Design Tactic

When you make a change that requires updating your program in many places, you might add an abstraction that would have allowed you to make that change in only a single place had the abstraction existed already. In the previous example of adding a button to your text editor to make text italic, you might create an add-widget method that takes as parameters the button itself and all of the logic controlling the button. This saves you nothing for the italic button itself, because you still have to add code for add-widget in all the same places you would have had to add code for the italic button. However, the next time you create an update similar to the italics button, you can use the add-widget method again and this time reap improved locality.
----12-11-2008----

© Bart Parkis
Last modified:Tue May 19 19:07:51 2009