Original Title: proposal: support gradual code repair while moving a type between packages
Go should add the ability to create alternate equivalent names for types, in order to enable gradual code repair during codebase refactoring. This was the target of the Go 1.8 alias feature, proposed in #16339 but held back from Go 1.8. Because we did not solve the problem for Go 1.8, it remains a problem, and I hope we can solve it for Go 1.9.
In the discussion of the alias proposal, there were many questions about why this ability to create alternate names for types in particular is important. As a fresh attempt to answer those questions, I wrote and posted an article, “Codebase Refactoring (with help from Go).” Please read that article if you have questions about the motivation. (For an alternate, shorter presentation, see Robert's Gophercon lightning talk. Unfortunately, that video wasn't available online until October 9. Update, Dec 16: here's my GothamGo talk, which was essentially the first draft of the article.)
This issue is not proposing a specific solution. Instead, I want to gather feedback from the Go community about the space of possible solutions. One possible avenue is to limit aliases to types, as mentioned at the end of the article. There may be others we should consider as well.
Please post thoughts about type aliases or other solutions as comments here.
Thank you.
Update, Dec 16: Design doc for type aliases posted.
Update, Jan 9: Proposal accepted, dev.typealias repository created, implementation due at the start of the Go 1.9 cycle for experimentation.
Discussion summary (last updated 2017-02-02)
Do we expect to need a general solution that works for all declarations?
If type aliases are 100% necessary, then var aliases are maybe 10% necessary, func aliases are 1% necessary, and const aliases are 0% necessary. Because const already has = and func could plausibly use = too, the key question is whether var aliases are important enough to plan for or implement.
As argued by @rogpeppe (#16339 (comment)) and @ianlancetaylor (#16339 (comment)) in the original alias proposal and as mentioned in the article, a mutating global var is usually a mistake. It probably doesn't make sense to complicate the solution to accommodate what is usually a bug. (In fact, if we can figure out how, it would not surprise me if in the long term Go moves toward requiring global vars to be immutable.)
Because richer var aliases are likely not important enough to plan for, it seems like the right choice here is to focus only on type aliases. Most of the comments here seem to agree. I won't list everyone.
Do we need a new syntax (= vs => vs export)?
The strongest argument for new syntax is the need to support var aliases, either now or in the future (#18130 (comment) by @Merovius). It seems okay to plan not to have var aliases (see previous section).
Without var aliases, reusing = is simpler than introducing new syntax, whether => like in the alias proposal, ~ (#18130 (comment) by @joegrasse), or export (#18130 (comment) by @cznic).
Using = would also exactly match the syntax of type aliases in Pascal and Rust. To the extent that other languages have the same concepts, it's nice to use the same syntax.
Looking ahead, there could be a future Go in which func aliases exist too (see #18130 (comment) by @nigeltao), and then all declarations would permit the same form:
const C2 = C1
func F2 = F1
type T2 = T1
var V2 = V1
The only one of these that wouldn't establish a true alias would be the var declaration, because V2 and V1 are separate variables that can be reassigned independently as the program executes (unlike the const, func, and type declarations, which are immutable). Since one main reason for variables is to allow them to vary, that exception would at least be easy to explain. If Go moves toward immutable global vars, then even that exception would disappear.
To be clear, I am not suggesting func aliases or immutable global vars here, just working through the implications of such future additions.
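A minimal sketch of why var V2 = V1 would not be a true alias, reusing the hypothetical names V1 and V2 from above: the declaration copies V1's value once, at initialization, and the two variables then change independently.

```go
package main

import "fmt"

// V1 and V2 are the hypothetical names used in the discussion above.
var V1 = 1

// V2 is initialized from V1 but has its own storage: a copy, not an alias.
var V2 = V1

func main() {
	V1 = 2              // reassigning V1 leaves V2 untouched
	fmt.Println(V1, V2) // prints "2 1"
}
```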
@jimmyfrasche suggested (#18130 (comment)) aliases for everything except consts, so that const would be the exception instead of var:
const C2 = C1 // no => form
func F2 => F1
type T2 => T1
var V2 => V1
var V2 = V1 // different from => form
Having inconsistencies with both const and var seems more difficult to explain than having just an inconsistency for var.
Can this be a tooling- or compiler-only change instead of a language change?
It's certainly worth asking whether gradual code repair can be enabled purely by side information supplied to the compiler (for example, #18130 (comment) by @btracey).
Or maybe if the compiler can apply some kind of rule-based preprocessing to transform input files before compilation (for example, #18130 (comment) by @tux21b).
Unfortunately, no, the change really can't be confined that way. There are at least two compilers (gc and gccgo) that would need to coordinate, but so would any other tools that analyze programs, like go vet, guru, goimports, gocode (code completion), and others.
As @bcmills said (#18130 (comment)), “a ‘non-language-change’ mechanism which must be supported by all implementations is a de facto language change — it’s just one with poorer documentation.”
What other uses might aliases have?
We know of the following. Given that type aliases in particular were deemed important enough for inclusion in Pascal and Rust, there are likely others.
- Aliases (or just type aliases) would enable creating drop-in replacements that expand other packages. For example, see https://go-review.googlesource.com/#/c/32145/, especially the explanation in the commit message.
- Aliases (or just type aliases) would enable structuring a package with a small API surface but a large implementation as a collection of packages for better internal structure, while still presenting just one package to be imported and used by clients. There's a somewhat abstract example described at #16339 (comment).
- Protocol buffers have an "import public" feature whose semantics are trivial to implement in generated C++ code but impossible to implement in generated Go code. This causes frustration for authors of protocol buffer definitions shared between C++ and Go clients. Type aliases would provide a way for Go to implement this feature. In fact, the original use case for import public was gradual code repair. Similar issues may arise in other kinds of code generators.
- Abbreviating long names. Local (unexported or not-package-scoped) aliases might be handy to abbreviate a long type name without introducing the overhead of a whole new type. As with all these uses, the clarity of the final code would strongly influence whether this is a suggested use.
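As a sketch of the last use, assuming the Go 1.9 "type A = B" form and a hypothetical unexported name index: the alias shortens a long type expression without creating a new type, so a type assertion written with the full spelling still succeeds. With a defined type (type index map[...]...), the same assertion would fail.

```go
package main

import "fmt"

// index is a hypothetical unexported alias abbreviating a long type expression.
// It introduces no new type: index and the spelled-out map type are identical.
type index = map[string]map[int]interface{}

func main() {
	idx := index{"a": {1: "one"}}

	// The dynamic type stored in the interface is the map type itself,
	// so asserting with the long spelling succeeds.
	var x interface{} = idx
	m, ok := x.(map[string]map[int]interface{})
	fmt.Println(ok, m["a"][1]) // prints "true one"
}
```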
What other issues does a proposal for type aliases need to address?
Listing these for reference. Not attempting to solve or discuss them in this section, although a few were discussed later and are summarized in separate sections below.
- Handling in godoc. (#18130 (comment) by @nigeltao and #18130 (comment) by @jimmyfrasche)
- Can methods be defined on types named by alias? (#18130 (comment) by @ulikunitz)
- If aliases to aliases are allowed, how do we handle alias cycles? (#18130 (comment) by @thwd)
- Should aliases be able to export unexported identifiers? (#18130 (comment) by @thwd)
- What happens when you embed an alias (how do you access the embedded field)? (#18130 (comment) by @thwd; also #17746)
- Are aliases available as symbols in the built program? (#18130 (comment) by @thwd)
- Ldflags string injection: what if we refer to an alias? (#18130 (comment) by @thwd; this only arises if there are var aliases.)
Is versioning a solution by itself?
"In that case maybe versioning is the whole answer, not type aliases."
(#18130 (comment) by @iainmerrick)
As noted in the article, I think versioning is a complementary concern. Support for gradual code repair, such as with type aliases, gives a versioning system more flexibility in how it builds a large program, which can be the difference between being able to build the program and not.
Can the larger refactoring problem be solved instead?
In #18130 (comment), @niemeyer points out that there were actually two changes for moving os.Error to error: the name changed but so did the definition (the current Error method used to be a String method).
@niemeyer suggests that perhaps we can find a solution to the broader refactoring problem that fixes types moving between packages as a special case but also handles things like method names changing, and he proposes a solution built around "adapters".
There is a fair amount of discussion in the comments that I can't easily summarize here. The discussion isn't over, but so far it is unclear whether "adapters" can fit into the language or be implemented in practice. It does seem clear that adapters are at least one order of magnitude more complex than type aliases.
Adapters need a coherent solution to the subtyping problems noted below as well.
Can methods be declared on alias types?
Certainly aliases do not allow bypassing the usual method definition restrictions: if a package defines type T1 = otherpkg.T2, it cannot define methods on T1, just as it cannot define methods directly on otherpkg.T2. That is, if type T1 = otherpkg.T2, then func (T1) M() is equivalent to func (otherpkg.T2) M(), which is invalid today and remains invalid. However, if a package defines type T1 = T2 (both in the same package), then the answer is less clear. In this case, func (T1) M() would be equivalent to func (T2) M(); since the latter is allowed, there is an argument to allow the former. The current design doc does not impose a restriction here (in keeping with the general avoidance of restrictions), so that func (T1) M() is valid in this situation.
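A minimal sketch of the same-package case, using the hypothetical names T1 and T2 from the text: because T1 denotes T2 and T2 is defined in this package, a method declared with receiver T1 is simply a method of T2.

```go
package main

import "fmt"

// T2 is a defined type in this package.
type T2 struct{ n int }

// T1 is an alias for T2 in the same package.
type T1 = T2

// Declaring M with receiver T1 is the same as declaring it on T2,
// which is allowed because T2 is defined in this package.
func (t T1) M() int { return t.n }

func main() {
	v := T2{n: 7}
	fmt.Println(v.M()) // prints 7: the method belongs to T2
}
```

Declaring func (otherpkg.T2) M() through an alias in a different package would still be rejected, exactly as described above.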
In #18130 (comment), @jimmyfrasche suggests that instead defining "no use of aliases in method definitions" would be a clear rule and avoid needing to know what T is defined as to know if func (T) M() is valid. In #18130 (comment), @rsc points out that even today there are certain T for which func (T) M() is not valid: https://play.golang.org/p/bci2qnldej. In practice this doesn't come up because people write reasonable code.
We will keep this possible restriction in mind but wait until there is strong evidence that it is needed before introducing it.
Is there a cleaner way to handle embedding and, more generally, field renames?
In #18130 (comment), @Merovius points out that an embedded type that changes its name during a package move will cause problems when that new name must eventually be adopted at the use sites. For example if user type U has an embedded io.ByteBuffer that moves to bytes.Buffer, then while U embeds io.ByteBuffer the field name is U.ByteBuffer, but when U is updated to refer to bytes.Buffer, the field name necessarily changes to U.Buffer.
In #18130 (comment), @neild points out that there is at least a workaround if references to io.ByteBuffer must be excised: the package P that defines U can also define 'type ByteBuffer = bytes.Buffer' and embed that type into U. Then U still has a U.ByteBuffer, even after io.ByteBuffer is gone entirely.
In #18130 (comment), @bcmills suggests the idea of field aliases, to allow a field to have multiple names during a gradual repair. Field aliases would allow defining something like type U struct { bytes.Buffer; ByteBuffer = Buffer } instead of having to create the top-level type alias.
In #18130 (comment), @rsc raises yet another possibility: some syntax for 'embed this type with this name', so that it is possible to embed a bytes.Buffer as the field name ByteBuffer, without needing a top-level type or an alternate name. If that existed, then the type name could be updated from io.ByteBuffer to bytes.Buffer while preserving the original field name (without introducing a second name or a clumsy exported type).
These all seem worth exploring once we have more evidence of large-scale refactorings blocked by problems with fields changing names. As @rsc wrote, "If type aliases help us get to the point where lack of field aliases is the next big roadblock for large-scale refactorings, that will be progress!"
There was a suggestion of restricting the use of aliases in embedded fields or changing the embedded name to use the target type's name, but those make the alias introduction break existing definitions that must then be fixed atomically, essentially preventing any gradual repair. @rsc: "We discussed this at some length in #17746. I was originally on the side of the name of an embedded io.ByteBuffer alias being Buffer, but the above argument convinced me I was wrong. @jimmyfrasche in particular made some good arguments about the code not changing depending on the definition of the embedded thing. I don't think it's tenable to disallow embedded aliases completely."
What is the effect on programs using reflection?
Programs using reflection see through aliases. In #18130 (comment), @atdiar points out that if a program is using reflection to, for example, find the package in which a type is defined or even the name of a type, it will observe the change when the type is moved, even if a forwarding alias is left behind. In #18130 (comment), @rsc confirmed this and wrote "Like the situation with embedding, it's not perfect. Unlike the situation with embedding, I don't have any answers except maybe code shouldn't be written using reflect to be quite that sensitive to those details."
The use of vendored packages today also changes package import paths seen by reflect, and we have not been made aware of significant problems caused by that ambiguity. This suggests that programs are not commonly inspecting reflect.Type.PkgPath in ways that would be broken by use of aliases. Even so, it's a potential gap, just like embedding.
What is the effect on separate compilation of programs and plugins?
In #18130 (comment), @atdiar raises the question of the effect on object files and separate compilation. In #18130 (comment), @rsc replies that there should be no need to make changes here: if X imports Y and Y changes and is recompiled, then X needs to be recompiled too. That's true today without aliases, and it will remain true with aliases. Separate compilation means being able to compile X and Y in distinct steps (the compiler does not have to process them in the same invocation), not that it is possible to change Y without recompiling X.
Would sum types or some kind of subtyping be an alternative solution?
In #18130 (comment), @iand suggests "substitutable types", "a list of types that may be substituted for the named type in function arguments, return values etc.". In #18130 (comment), @j7b suggests using algebraic types "so we also get an empty interface equivalent with compile time type checking as a bonus". Other names for this concept are sum types and variant types.
In general this does not suffice to allow moving types with gradual code repair. There are two ways to think about this.
In #18130 (comment), @bcmills takes the concrete way, pointing out that algebraic types have a different representation than the original, which makes it not possible to treat the sum and the original as interchangeable: the latter has type tags.
In #18130 (comment), @rsc takes the theoretical way, expanding on #18130 (comment) by @gri, pointing out that in a gradual code repair, sometimes you need T1 to be a subtype of T2 and sometimes vice versa. The only way for both to be subtypes of each other is for them to be the same type, which not coincidentally is what type aliases do.
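A sketch of "the same type" in one file, with hypothetical names NewT (the type's new definition) and OldT (the forwarding alias), and oldAPI/newAPI standing in for code before and after its repair. Values flow in both directions without conversions, including inside composite types such as slices, which a sum type or a one-way subtype relation would not allow.

```go
package main

import "fmt"

// NewT is the type's new definition; OldT keeps old code compiling.
type NewT struct{ ID int }
type OldT = NewT

// oldAPI and newAPI stand in for not-yet-updated and updated code.
func oldAPI(t *OldT) int { return t.ID }
func newAPI(t *NewT) int { return t.ID }

func main() {
	o := &OldT{ID: 1}
	n := &NewT{ID: 2}

	// Each API accepts values written with the other name.
	fmt.Println(oldAPI(n), newAPI(o)) // prints "2 1"

	// Composite types built from either name are identical too.
	var s []*NewT = []*OldT{o, n}
	fmt.Println(len(s)) // prints "2"
}
```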
As a side tangent, in addition to not solving the gradual code repair problem, algebraic types / sum types / union types / variant types are by themselves hard to add to Go. See the FAQ answer and the Go 1.6 AMA discussion for more.
In #18130 (comment), @thwd suggests that since Go has a subtyping relationship between concrete types and interfaces (*bytes.Buffer can be seen as a subtype of io.Reader) and between interfaces (io.ReadWriter is a subtype of io.Reader in the same way), making interfaces "recursively covariant (according to the current variance rules) down to their method arguments" would solve the problem provided that all future packages only use interfaces, never concrete types like structs ("encourages good design, too").
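For context, a sketch of the current rule that this suggestion would relax, using hypothetical names ReaderMaker and BufMaker: Go matches method signatures exactly, so a method returning *bytes.Buffer does not satisfy an interface method returning io.Reader, even though *bytes.Buffer implements io.Reader. The example uses a result type; the suggestion concerns method arguments as well.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// ReaderMaker is a hypothetical interface whose method returns an io.Reader.
type ReaderMaker interface {
	Make() io.Reader
}

// BufMaker's Make returns *bytes.Buffer, which implements io.Reader.
type BufMaker struct{}

func (BufMaker) Make() *bytes.Buffer { return new(bytes.Buffer) }

func main() {
	var x interface{} = BufMaker{}
	// Signatures must match exactly; there is no covariance today,
	// so BufMaker does not implement ReaderMaker.
	_, ok := x.(ReaderMaker)
	fmt.Println(ok) // prints "false"
}
```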
There are three problems with that as a solution. First, it has the subtyping issues above, so it doesn't solve gradual code repair. Second, it doesn't apply to existing code, as @thwd noted in this suggestion. Third, forcing the use of interfaces everywhere may not actually be good design and introduces performance overheads (see for example #18130 (comment) by @Merovius and #18130 (comment) by @zombiezen).
Restrictions
This section collects proposed restrictions for reference, but keep in mind that restrictions add complexity. As I wrote in #18130 (comment), "we should probably only implement those restrictions after actual experience with the unrestricted, simpler design helps us understand whether the restriction would bring enough benefits to pay for its cost."
Put another way, any restriction would need to be justified by evidence that it would prevent some serious misuse or confusion. Since we haven't implemented a solution yet, there is no such evidence. If experience did provide that evidence, these will be worth returning to.
Restriction? Aliases of standard library types can only be declared in standard library.
(#18130 (comment) and #18130 (comment) by @iand)
The concern is "code that has renamed standard library concepts to fit a custom naming convention", or "long spaghetti chains of aliases across multiple packages that end up back at the standard library", or "aliasing things like interface{} and error".
As stated, the restriction would disallow the "extension package" case described above involving x/image/draw.
It's unclear why the standard library should be special: the problems would exist with any code. Also, neither interface{} nor error is a type from the standard library. Rephrasing the restriction as "aliasing predefined types" would disallow aliasing error, but the need to alias error was one of the motivating examples in the article.
Restriction? Alias target must be package-qualified identifier.
(#18130 (comment) by @jba)
This would make it impossible to make an alias when renaming a type within a package, which may be used widely enough to necessitate a gradual repair (#18130 (comment) by @bcmills).
It would also disallow aliasing error as in the article.
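A sketch of the two alias forms these restrictions would rule out: an unqualified target within the same package (a rename) and a predefined target (error). Settings, Config, and load are hypothetical names; Error only echoes the article's os.Error example and is declared in package main purely for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Error aliases the predefined error type, echoing the os.Error -> error move.
type Error = error

// Settings is the new name after a within-package rename; Config keeps old
// references compiling. The alias target is not package-qualified.
type Settings struct{ Verbose bool }
type Config = Settings

// load still uses the old names in its signature.
func load() (Config, Error) {
	return Settings{Verbose: true}, errors.New("not found")
}

func main() {
	c, err := load()
	var s Settings = c          // no conversion: Config and Settings are one type
	fmt.Println(s.Verbose, err) // prints "true not found"
}
```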
Restriction? Alias target must be package-qualified identifier with same name as alias.
(proposed during alias discussion in Go 1.8)
In addition to the problems of the previous section with limiting to package-qualified identifiers, forcing the name to stay the same would disallow the conversion from io.ByteBuffer to bytes.Buffer in the article.
Restriction? Aliases should be discouraged in some way.
"How about hiding aliases behind an import, just like for "C" and “unsafe”, to further discourage it's usage? In the same vein, I would like the aliases syntax to be verbose and stand out as a scaffold for on going refactoring." - #18130 (comment) by @xiegeo
"Should we also automatically infer that an aliased type is legacy and should be replaced by the new type? If we enforce golint, godoc and similar tools to visualize the old type as deprecated, it would limit the abuse of type aliasing very significantly. And the final concern of aliasing feature being abused would be resolved." - #18130 (comment) by @rakyll
Until we know that they will be used wrong, it seems premature to discourage usage. There may be good, non-temporary uses (see above).
Even in the case of code repair, either the old or new type may be the alias during the transition, depending on the constraints imposed by the import graph. Being an alias does not mean the name is deprecated.
There is already a mechanism for marking certain declarations as deprecated (see #18130 (comment) by @jimmyfrasche).
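One such mechanism, and we assume the one meant here, is the convention of a doc-comment paragraph beginning with "Deprecated:", which documentation tools and some linters recognize. A sketch with hypothetical names Buffer (new) and ByteBuffer (old):

```go
package main

import "fmt"

// Buffer is the type under its new name (hypothetical).
type Buffer struct {
	data []byte
}

// Write appends p to the buffer.
func (b *Buffer) Write(p []byte) (int, error) {
	b.data = append(b.data, p...)
	return len(p), nil
}

// ByteBuffer is the old name, kept as an alias so existing callers compile.
//
// Deprecated: Use Buffer instead.
type ByteBuffer = Buffer

func main() {
	var old ByteBuffer
	old.Write([]byte("hi"))
	var cur *Buffer = &old     // same type, so no conversion is needed
	fmt.Println(len(cur.data)) // prints 2
}
```

Marking deprecation this way stays opt-in: being an alias does not by itself imply that the name is deprecated.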
Restriction? Aliases must target named types.
"Aliases shouldn't not apply to unnamed type. Their is no "code repair" story in moving from one unnamed type to another. Allowing aliases on unnamed types means I can no longer teach Go as simply named and unnamed types." - #18130 (comment) by @davecheney
Until we know that they will be used wrong, it seems premature to discourage usage. There may be good uses with unnamed targets (see above).
As noted in the design doc, we do expect to change the terminology to make the situation clearer.
Activity
variadico commented on Dec 1, 2016
I like how visually uniform this looks.
But since we can almost gradually move most elements, maybe the simplest solution is just to allow an = for types.
zquestz commented on Dec 1, 2016
So first, I just wanted to thank you for that excellent write-up. I think the best solution is to introduce type aliases with an assignment operator. This requires no new keywords/operators, uses a familiar syntax, and should solve the refactoring problem for large code bases.
iand commented on Dec 1, 2016
As Russ's article points out, any alias-like solution needs to gracefully solve #17746 and #17784
travisjeffery commented on Dec 1, 2016
Thank you for the write up of that article.
I find the type-only aliases using the assignment operator to be best:
My reasons:
The alternative solution => having subtly different meaning based on its operand feels out of place for Go.
The issue at hand with types is solved and you don't need to worry about imagining the complications of the generalized solution.
I think it looks more pleasing.
All of these above: the result being simple, focused, conservative, and aesthetic make it easy for me to picture it being a part of Go.
cznic commented on Dec 1, 2016
If the solution would be limited to types only then the syntax
already considered before, as discussed in the @rsc's article, looks very good to me.
If we would like to be able to do the same for constants, variables and functions, my preferred syntax would be (as proposed before)
As discussed before, the disadvantage is that a new, top-level only keyword is introduced, which is admittedly awkward, even though technically feasible and fully backwards compatible. I like this syntax because it reflects the pattern of imports. It would seem natural to me that exports would be permitted only in the same section where imports are allowed, i.e. between the package clause and any var, type, constant or function TLD.
The renaming identifiers would be declared in the package scope; however, the new names are not visible in the package declaring them (newfmt in the example above) with respect to redeclaration, which is disallowed as usual. Given the previous example, TLDs
In the importing package the renaming identifiers are visible normally, as any other exported identifier of the (newfmt's) package block.
In conclusion, this approach does not introduce any new local name binding in newfmt, which I believe avoids at least some of the problems discussed in #17746 and solves #17784 completely.
4ad commented on Dec 1, 2016
My first preference is for a type-only type NewFoo = old.Foo.
If a more general solution is desired, I agree with @cznic that a dedicated keyword is better than a new operator (especially an asymmetric operator with confusing directionality[1]). That being said, I don't think the export keyword conveys the right meaning. Neither the syntax nor the semantics mirror import. What about alias?
I understand why @cznic doesn't want the new names to be accessible in the package declaring them, but, to me at least, that restriction feels unexpected and artificial (although I perfectly well understand the reason behind it).
[1] I have been using Unix for almost 20 years, and I still can't create a symlink on the first try. And I usually fail even on the second try, after I have read the manual.
iand commented on Dec 1, 2016
I would like to propose an additional constraint: type aliases to standard library types may only be declared in the standard library.
My reasoning is that I don't want to work with code that has renamed standard library concepts to fit a custom naming convention. I also don't want to deal with long spaghetti chains of aliases across multiple packages that end up back at the standard library.
quentinmit commented on Dec 1, 2016
@iand: That constraint would block the use of this feature to migrate anything into the standard library. Case in point, the current migration of Context into the standard library. The old home of Context should become an alias for the Context in the standard library.
iand commented on Dec 1, 2016
@quentinmit that is unfortunately true. It also limits the use case for golang.org/x/image/draw in this CL https://go-review.googlesource.com/#/c/32145/
My real concern is with people aliasing things like interface{} and error.
joegrasse commented on Dec 1, 2016
If it is decided to introduce a new operator, I would like to propose ~. In the English language, it is generally understood to mean "similar to", "approximately", "about", or "around". As @4ad stated above, the => is an asymmetric operator with confusing directionality.
For example:
jba commented on Dec 1, 2016
@iand if we limit the right-hand side to a package-qualified identifier, then that would eliminate your specific concern.
It would also mean you couldn't have aliases to any types in the current package, or to long type expressions like map[string]map[int]interface{}. But those uses have nothing to do with the main goal of gradual code repair, so maybe they are no great loss.
rsc commented on Dec 1, 2016
@cznic, @iand, others: Please note that restrictions add complexity. They complicate the explanation of the feature, and they add cognitive load for any user of the feature: if you forget about a restriction, you have to puzzle through why something you thought should work doesn't.
It's often a mistake to implement restrictions on a trial of a design solely due to hypothetical misuse. That happened in the alias proposal discussions, and it made the aliases in the trial unable to handle the io.ByteBuffer => bytes.Buffer conversion from the article. Part of the goal of writing the article is to define some cases we know we want to be able to handle, so that we don't inadvertently restrict them away.
As another example, it would be easy to make a misuse argument to disallow non-pointer receivers, or to disallow methods on non-struct types. If we'd done either of those, you couldn't create enums with String() methods for printing themselves, and you couldn't have http.Header both be a plain map and provide helper methods. It's often easy to imagine misuses; compelling positive uses can take longer to appear, and it's important to create space for experimentation.
As yet another example, the original design and implementation for pointer vs value methods did not distinguish between the method sets on T and *T: if you had a *T, you could call the value methods (receiver T), and if you had a T, you could call the pointer methods (receiver *T). This was simple, with no restrictions to explain. But then actual experience showed us that allowing pointer method calls on values led to a specific class of confusing, surprising bugs. For example, you could write:
and io.Copy would succeed, but buf would have nothing in it. We had to choose between explaining why that program ran incorrectly or explaining why that program didn't compile. Either way there were going to be questions, but we came down on the side of avoiding incorrect execution. Even so, we still had to write a FAQ entry about why the design has a hole cut out of it.
Again, please remember that restrictions add complexity. Like all complexity, restrictions need significant justification. At this stage in the design process it is good to think about restrictions that might be appropriate for a particular design, but we should probably only implement those restrictions after actual experience with the unrestricted, simpler design helps us understand whether the restriction would bring enough benefits to pay for its cost.
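The two positive uses mentioned in rsc's comment, enums with String methods and a map type with helper methods, can be sketched as follows. Color and Header are hypothetical names; Header is a simplified stand-in shaped like net/http's Header, not its real implementation (for example, it does not canonicalize keys).

```go
package main

import "fmt"

// Color is a non-struct enum type with a value-receiver String method.
type Color int

const (
	Red Color = iota
	Green
)

func (c Color) String() string {
	switch c {
	case Red:
		return "Red"
	case Green:
		return "Green"
	}
	return fmt.Sprintf("Color(%d)", int(c)) // int(c) avoids recursing into String
}

// Header is a plain map type that also carries helper methods.
type Header map[string][]string

func (h Header) Add(key, value string) { h[key] = append(h[key], value) }

func (h Header) Get(key string) string {
	if v := h[key]; len(v) > 0 {
		return v[0]
	}
	return ""
}

func main() {
	fmt.Println(Green) // fmt uses the String method: prints "Green"
	h := Header{}
	h.Add("Accept", "text/plain")
	fmt.Println(h.Get("Accept"), len(h)) // prints "text/plain 1"
}
```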