Codebase Refactoring (with help from Go)

1. Abstract

Go should add the ability to create alternate equivalent names for types, in order to enable gradual code repair during codebase refactoring. This article explains the need for that ability and the implications of not having it for today’s large Go codebases. This article also examines some potential solutions, including the alias feature proposed during the development of (but not included in) Go 1.8. However, this article is not a proposal of any specific solution. Instead, it is intended as the start of a discussion by the Go community about what solution should be included in Go 1.9.

This article is an extended version of a talk given at GothamGo in New York on November 18, 2016.

2. Introduction

Go’s goal is to make it easy to build software that scales. There are two kinds of scale that we care about. One kind of scale is the size of the systems that you can build with Go, meaning how easy it is to use large numbers of computers, process large amounts of data, and so on. That’s an important focus for Go but not for this article. Instead, this article focuses on another kind of scale, the size of Go programs, meaning how easy it is to work in large codebases with large numbers of engineers making large numbers of changes independently.

One such codebase is Google’s single repository that nearly all engineers work in on a daily basis. As of January 2015, that repository was seeing 40,000 commits per day across 9 million source files and 2 billion lines of code. Of course, there is more in the repository than just Go code.

Another large codebase is the set of all the open source Go code that people have made available on GitHub and other code hosting sites. You might think of this as go get’s codebase. In contrast to Google’s codebase, go get’s codebase is completely decentralized, so it’s more difficult to get exact numbers. In November 2016, there were 140,000 packages known to godoc.org and over 160,000 GitHub repos written in Go.

Supporting software development at this scale was in our minds from the very beginning of Go. We paid a lot of attention to implementing imports efficiently. We made sure that it was difficult to import code but forget to use it, to avoid code bloat. We made sure that there weren’t unnecessary dependencies between packages, both to simplify programs and to make it easier to test and refactor them. For more detail about these considerations, see Rob Pike’s 2012 article “Go at Google: Language Design in the Service of Software Engineering.”

Over the past few years we’ve come to realize that there’s more that can and should be done to make it easier to refactor whole codebases, especially at the broad package structure level, to help Go scale to ever-larger programs.

3. Codebase refactoring

Most programs start with one package. As you add code, occasionally you recognize a coherent section of code that could stand on its own, so you move that section into its own package. Codebase refactoring is the process of rethinking and revising decisions about both the grouping of code into packages and the relationships between those packages. There are a few reasons you might want to change the way a codebase is organized into packages.

The first reason is to split a package into more manageable pieces for users. For example, most users of package regexp don’t need access to the regular expression parser, although advanced uses may, so the parser is exported in a separate regexp/syntax package.

The second reason is to improve naming. For example, early versions of Go had an io.ByteBuffer, but we decided bytes.Buffer was a better name and package bytes a better place for the code.

The third reason is to lighten dependencies. For example, we moved os.EOF to io.EOF so that code not using the operating system can avoid importing the fairly heavyweight package os.

The fourth reason is to change the dependency graph so that one package can import another. For example, as part of the preparation for Go 1, we looked at the explicit dependencies between packages and how they constrained the APIs. Then we changed the dependency graph to make the APIs better.

Before Go 1, the os.FileInfo struct contained these fields:

type FileInfo struct {
    Dev      uint64 // device number
    Ino      uint64 // inode number
    ...
    Atime_ns int64  // access time; ns since epoch
    Mtime_ns int64  // modified time; ns since epoch
    Ctime_ns int64  // change time; ns since epoch
    Name     string // name of file
}

Notice the times Atime_ns, Mtime_ns, Ctime_ns have type int64, an _ns suffix, and are commented as “nanoseconds since epoch.” These fields would clearly be nicer using time.Time, but mistakes in the design of the package structure of the codebase prevented that. To be able to use time.Time here, we refactored the codebase.

This graph shows eight packages from the standard library before Go 1, with an arrow from P to Q indicating that P imports Q.

Import graph before Go 1

Nearly every package has to consider errors, so nearly every package, including package time, imported package os for os.Error. To avoid cycles, anything that imports package os cannot itself be used by package os. As a result, operating system APIs could not use time.Time.

This kind of problem convinced us that os.Error and its constructor os.NewError were so fundamental that they should be moved out of package os. In the end, we moved os.Error into the language as error and os.NewError into the new package errors as errors.New. After this and other refactoring, the import graph in Go 1 looked like:

Import graph for Go 1

Package io and package time had few enough dependencies to be used by package os, and the Go 1 definition of os.FileInfo does use time.Time.

(As a side note, our first idea was to move os.Error and os.NewError to a new package named error (singular) as error.Value and error.New. Feedback from Roger Peppe and others in the Go community helped us see that making the error type predefined in the language would allow its use even in low-level contexts like the specification of run-time panics. Since the type was named error, the package became errors (plural) and the constructor errors.New. Andrew Gerrand’s 2015 talk “How Go was Made” has more detail.)

4. Gradual code repair

The benefits of a codebase refactoring apply throughout the codebase. Unfortunately, so do the costs: often a large number of repairs must be made as a result of the refactoring. As codebases grow, it becomes infeasible to do all the repairs at one time. The repairs must be done gradually, and the programming language must make that possible.

As a simple example, when we moved io.ByteBuffer to bytes.Buffer in 2009, the initial commit moved two files, adjusted three makefiles, and repaired 43 other Go source files. The repairs outweighed the actual API change by a factor of twenty, and the entire codebase was only 250 files. As codebases grow, so does the repair multiplier. Similar changes in large Go codebases, such as Docker, and Juju, and Kubernetes, can have repair multipliers ranging from 10X to 100X. Inside Google we’ve seen repair multipliers well over 1000X.

The conventional wisdom is that when making a codebase-wide API change, the API change and the associated code repairs should be committed together in one big commit:

Atomic Code Repair

The argument in favor of this approach, which we will call “atomic code repair,” is that it is conceptually simple: by updating the API and the code repairs in the same commit, the codebase transitions in one step from the old API to the new API, without ever breaking the codebase. The atomic step avoids the need to plan for a transition during which both old and new API must coexist. In large codebases, however, the conceptual simplicity is quickly outweighed by a practical complexity: the one big commit can be very big. Big commits are hard to prepare, hard to review, and are fundamentally racing against other work in the tree. It’s easy to start doing a conversion, prepare your one big commit, finally get it submitted, and only then find out that another developer added a use of the old API while you were working. There were no merge conflicts, so you missed that use, and despite all your effort the one big commit broke the codebase. As codebases get larger, atomic code repairs become more difficult and more likely to break the codebase inadvertently.

In our experience, an approach that scales better is to plan for a transition period during which the code repair proceeds gradually, across as many commits as needed:

Gradual Code Repair

Typically this means the overall process runs in three stages. First, introduce the new API. The old and new API must be interchangeable, meaning that it must be possible to convert individual uses from the old to the new API without changing the overall behavior of the program, and uses of the old and new APIs must be able to coexist in a single program. Second, across as many commits as you need, convert all the uses of the old API to the new API. Third, remove the old API.

“Gradual code repair” is usually more work than the atomic code repair, but the work itself is easier: you don’t have to get everything right in one try. Also, the individual commits are much smaller, making them easier to review and submit and, if needed, roll back. Maybe most important of all, a gradual code repair works in situations when one big commit would be impossible, for example when code that needs repairs is spread across multiple repositories.

The bytes.Buffer change looks like an atomic code repair, but it wasn’t. Even though the commit updated 43 source files, the commit message says, “left io.ByteBuffer stub around for now, for protocol compiler.” That stub was in a new file named io/xxx.go that read:

// This file defines the type io.ByteBuffer
// so that the protocol compiler's output
// still works. Once the protocol compiler
// gets fixed, this goes away.

package io

import "bytes"

type ByteBuffer struct {
    bytes.Buffer;
}

Back then, just like today, Go was developed in a separate source repository from the rest of Google’s source code. The protocol compiler in Google’s main repository was responsible for generating Go source files from protocol buffer definitions; the generated code used io.ByteBuffer. This stub was enough to keep the generated code working until the protocol compiler could be updated. Then a later commit removed xxx.go.

Even though there were many fixes included in the original commit, this change was still a gradual code repair, not an atomic one, because the old API was only removed in a separate stage after the existing code was converted.

In this specific case the gradual repair did succeed, but the old and new API were not completely interchangeable: if there had been a function taking an *io.ByteBuffer argument and code calling that function with an *io.ByteBuffer, those two pieces of code could not have been updated independently: code that passed an *io.ByteBuffer to a function expecting a *bytes.Buffer, or vice versa, would not compile.

Again, a gradual code repair consists of three stages:

  1. Add new API, interchangeable with old API.
  2. Convert uses of old API to new API.
  3. Remove old API.

These stages apply to a gradual code repair for any API change. In the specific case of codebase refactoring—moving an API from one package to another, changing its full name in the process—making the old and new API interchangeable means making the old and new names interchangeable, so that code using the old name has exactly the same behavior as if it used the new name.

Let’s look at examples of how Go makes that possible (or not).

4.1. Constants

Let’s start with a simple example of moving a constant.

Package io defines the Seeker interface, but the named constants that developers prefer to use when invoking the Seek method came from package os. Go 1.7 moved the constants to package io and gave them more idiomatic names; for example, os.SEEK_SET is now available as io.SeekStart.

For a constant, one name is interchangeable with another when the definitions use the same type and value:

package io
const SeekStart int = 0

package os
const SEEK_SET int = 0

Due to Go 1 compatibility, we’re blocked in stage 2 of this gradual code change. We can’t delete the old constants, but making the new ones available in package io allows developers to avoid importing package os in code that does not actually depend on operating system functionality.

This is also an example of a gradual code repair being done across many repositories. Go 1.7 introduced the new API, and now it’s up to everyone with Go code to update their code as they see fit. There’s no rush, no forced breakage of existing code.

4.2. Functions

Now let’s look at moving a function from one package to another.

As mentioned above, in 2011 we replaced os.Error with the predefined type error and moved the constructor os.NewError to a new package as errors.New.

For a function, one name is interchangeable with another when the definitions use the same signature and implementation. In this case, we can define the old function as a wrapper calling the new function:

package errors
func New(msg string) error { ... }

package os
func NewError(msg string) os.Error {
    return errors.New(msg)
}

Since Go does not allow comparing functions for equality, there is no way to tell these two functions apart. The old and new API are interchangeable, so we can proceed to stages 2 and 3.

(We are ignoring a small detail here: the original os.NewError returned an os.Error, not an error, and two functions with different signatures are distinguishable. To really make these functions indistinguishable, we would also need to make os.Error and error indistinguishable. We will return to that detail in the discussion of types below.)

4.3. Variables

Now let’s look at moving a variable from one package to another.

We are discussing exported package-level API, so the variable in question must be an exported global variable. Such variables are almost always set at init time and then only intended to be read from, never written again, to avoid races between reading and writing goroutines. For exported global variables that follow this pattern, one name is nearly interchangeable with another when the two have the same type and value. The simplest way to arrange that is to initialize one from the other:

package io
var EOF = ...

package os
var EOF = io.EOF

In this example, io.EOF and os.EOF are the same value. The variable values are completely interchangeable.

There is one small problem. Although the variable values are interchangeable, the variable addresses are not. In this example, &io.EOF and &os.EOF are different pointers. However, it is rare to export a read-only variable from a package and expect clients to take its address: it would be better for clients if the package exported a variable set to the address instead, and then the pattern works.

4.4. Types

Finally let’s look at moving a type from one package to another. This is much harder to do in Go today, as the following three examples demonstrate.

4.4.1. Go’s os.Error

Consider once more the conversion from os.Error to error. There’s no way in Go to make two names of types interchangeable. The closest we can come in Go is to give os.Error and error the same underlying definition:

package os
type Error error

Even with this definition, and even though these are interface types, Go still considers these two types different, so that a function returning an os.Error is not the same as a function returning an error. Consider the io.Reader interface:

package io
type Reader interface {
    Read(b []byte) (n int, err error)
}

If io.Reader is defined using error, as above, then a Read method returning os.Error will not satisfy the interface.

If there’s no way to make two names for a type interchangeable, that raises two questions. First, how do we enable a gradual code repair for a moved or renamed type? Second, what did we do for os.Error in 2011?

To answer the second question, we can look at the source control history. It turns out that to aid the conversion, we added a temporary hack to the compiler to make code written using os.Error be interpreted as if it had written error instead.

4.4.2. Kubernetes

This problem with moving types is not limited to fundamental changes like os.Error, nor is it limited to the Go repository. Here’s a change from the Kubernetes project. Kubernetes has a package util, and at some point the developers decided to split out that package’s IntOrString type into its own package intstr.

Applying the pattern for a gradual code repair, the first stage is to establish a way for the two types to be interchangeable. We can’t do that, because the IntOrString type is used in struct fields, and code can’t assign to that field unless the value being assigned has the correct type:

package util
type IntOrString intstr.IntOrString

// Not good enough for:

// IngressBackend describes ...
type IngressBackend struct {
    ServiceName string             `json:"serviceName"`
    ServicePort intstr.IntOrString `json:"servicePort"`
}

If this use were the only problem, then you could imagine writing a getter and setter using the old type and doing a gradual code repair to change all existing code to use the getter and setter, then modifying the field to use the new type and doing a gradual code repair to change all existing code to access the field directly using the new type, then finally deleting the getter and setter that mention the old type. That required two gradual code repairs instead of one, and there are many uses of the type other than this one struct field.

In practice, the only option here is an atomic code repair, or else breaking all code using IntOrString.

4.4.3. Docker

As another example, here’s a change from the Docker project. Docker has a package utils, and at some point the developers decided to split out that package’s JSONError type into a separate jsonmessage package.

Again we have the problem that the old and new types are not interchangeable, but it shows up in a different way, namely type assertions:

package utils
type JSONError jsonmessage.JSONError

// Not good enough for:

jsonError, ok := err.(*jsonmessage.JSONError)
if !ok {
    jsonError = &jsonmessage.JSONError{
        Message: err.Error(),
    }
}

If the error err not already a JSONError, this code wraps it in one, but during a gradual repair, this code handles utils.JSONError and jsonmessage.JSONError differently. The two types are not interchangeable. (A type switch would expose the same problem.)

If this line were the only problem, then you could imagine adding a type assertion for *utils.JSONError, then doing a gradual code repair to remove other uses of utils.JSONError, and finally removing the additional type guard just before removing the old type. But this line is not the only problem. The type is also used elsewhere in the API and has all the problems of the Kubernetes example.

In practice, again the only option here is an atomic code repair or else breaking all code using JSONError.

5. Solutions?

We’ve now seen examples of how we can and cannot move constants, functions, variables, and types from one package to another. The patterns for establishing interchangeable old and new API are:

const OldAPI = NewPackage.API

func OldAPI() { NewPackage.API() }

var OldAPI = NewPackage.API

type OldAPI ... ??? modify compiler or ... ???

For constants and functions, the setup for a gradual code repair is trivial. For variables, the trivial setup is incomplete but only in ways that are not likely to arise often in practice.

For types, there is no way to set up a gradual code repair in essentially any real example. The most common option is to force an atomic code repair, or else to break all code using the moved type and leave clients to fix their code at the next update. In the case of moving os.Error, we resorted to modifying the compiler. None of these options is reasonable. Developers should be able to do refactorings that involve moving a type from one package to another without needing an atomic code repair, without resorting to intermediate code and multiple rounds of repair, without forcing all client packages to update their own code immediately, and without even thinking about modifying the compiler.

But how? What should these refactorings look like tomorrow?

We don’t know. The goal of this article is to define the problem well enough to discuss the possible answers.

5.1. Aliases

As explained above, the fundamental problem with moving types is that while Go provides ways to create an alternate name for a constant or a function or (most of the time) a variable, there is no way to create an alternate name for a type.

For Go 1.8 we experimented with introducing first-class support for these alternate names, called aliases. A new declaration syntax, the alias form, would have provided a uniform way to create an alternate name for any kind of identifier:

const OldAPI => NewPackage.API
func  OldAPI => NewPackage.API
var   OldAPI => NewPackage.API
type  OldAPI => NewPackage.API

Instead of four different mechanisms, the refactoring of package os we considered above would have used a single mechanism:

package os
const SEEK_SET => io.SeekStart
func  NewError => errors.New
var   EOF      => io.EOF
type  Error    => error

During the Go 1.8 release freeze, we found two small but important unresolved technical details in the alias support (issues 17746 and 17784), and we decided that it was not possible to resolve them confidently in the time remaining before the Go 1.8 release, so we held aliases back from Go 1.8.

5.2. Versioning

An obvious question is whether to rely on versioning and dependency management for code repair, instead of focusing on strategies that enable gradual code repair.

Versioning and gradual code repair strategies are complementary. A versioning system’s job is to identify a compatible set of versions of all the packages needed in a program, or else to explain why no such set can be constructed. Gradual code repair creates additional compatible combinations, making it more likely that a versioning system can find a way to build a particular program.

Consider again the various updates to Go’s standard library that we discussed above. Suppose that the old API corresponded in a versioning system to standard library version 5.1.3. In the usual atomic code repair approach, the new API would be introduced and the old API removed at the same time, resulting in version 6.0.0; following semantic versioning, the major version number is incremented to indicate the incompatibility caused by removing the old API.

Now suppose that your larger program depends on two packages, Foo and Bar. Foo still uses the old standard library API. Bar has been updated to use the new standard library API, and there have been important changes since then that your program needs: you can’t use an older version of Bar from before the standard library changes.

Standard Library Incompatibility

There is no compatible set of libraries to build your program: you want the latest version of Bar, which requires standard library 6.0.0, but you also need Foo, which is incompatible with standard library 6.0.0. The best a versioning system can do in this case is report the failure clearly. (If you are sufficiently motivated, you might then resort to updating your own copy of Foo.)

In contrast, with better support for gradual code repair, we can add the new, interchangeable API in version 5.2.0, and then remove the old API in version 6.0.0.

Standard Library Compatibility

The intermediate version 5.2.0 is backwards compatible with 5.1.3, indicated by the shared major version number 5. However, because the change from 5.2.0 to 6.0.0 only removed API, 5.2.0 is also, perhaps surprisingly, backwards compatible with 6.0.0. Assuming that Bar declares its requirements precisely—it is compatible with both 5.2.0 and 6.0.0—a version system can see that both Foo and Bar are compatible with 5.2.0 and use that version of the standard library to build the program.

Good support for and adoption of gradual code repair reduces incompatibility, giving versioning systems a better chance to find a way to build your program.

5.3. Type aliases

To enable gradual code repair during codebase refactorings, it must be possible to create alternate names for a constant, function, variable, or type. Go already allows introducing alternate names for all constants, all functions, and nearly all variables, but no types. Put another way, the general alias form is never necessary for constants, never necessary for functions, only rarely necessary for variables, but always necessary for types.

The relative importance to the specific declarations suggests that perhaps the Go 1.8 aliases were an overgeneralization, and that we should instead focus on a solution limited to types. The obvious solution is type-only aliases, for which no new operator is required. Following Pascal (or, if you prefer, Rust), a Go program could introduce a type alias using the assignment operator:

type OldAPI = NewPackage.API

The idea of limiting aliases to types was raised during the Go 1.8 alias discussion, but it seemed worth trying the more general approach, which we did, unsuccessfully. In retrospect, the fact that = and => have identical meanings for constants while they have nearly identical but subtly different meanings for variables suggests that the general approach is not worth its complications.

In fact, the idea of adding Pascal-style type aliases was considered in the early design of Go, but until now we didn’t have a strong use case for them.

Type aliases seem like a promising approach to explore, but, at least to me, generalized aliases seemed equally promising before the discussion and experimentation during the Go 1.8 cycle. Rather than prejudge the outcome, the goal of this article is to explain the problem in detail and examine a few possible solutions, to enable a productive discussion and evaluation of ideas for next time.

6. Challenge

Go aims to be ideal for large codebases.

In large codebases, it’s important to be able to refactor codebase structure, which means moving APIs between packages and updating client code.

In such large refactorings, it’s important to be able to use a gradual transition from the old API to the new API.

Go does not support the specific case of gradual code repair when moving types between packages at all. It should.

I hope we the Go community can fix this together in Go 1.9. Maybe type aliases are a good starting point. Maybe not. Time will tell.

7. Acknowledgements

Thanks to the many people who helped us think through the design questions that got us this far and led to the alias trial during Go 1.8 development. I look forward to the Go community helping us again when we revisit this problem for Go 1.9. If you’d like to contribute, please see issue 18130.