proposal: spec: add sum types for algebraic data types

Proposal Details

Algebraic data types, comprised of product and sum types, are alone sufficient to express any bit sequence, and therefore all serialized data. Go already has product and sum types, but whereas its anonymous product types (struct types) are generalized for users to use, its sum types are not. Instead, Go has a fixed set of predeclared sum types, such as numbers and maps, with special semantics, that users must make do with. It is possible to map own's own sum types to integers, for example, but doing so is cumbersome, error-prone, and not type-safe, and having special semantics increases the size of the language and decreases the orthogonality of its features.

I propose that Go adds anonymous sum types, and changes predeclared types to be in terms of anonymous sum types.

Definitions

A struct type (struct in Go) represents a sequence of zero or more named types called fields. A struct type's bits are its first field's bits, followed by its second field's bits, and so on. A struct type's zero value is the zero value of all its fields.

A sum type represents a selection of one out of a sequence of one or more named types called fields. A sum type's bits are the smallest number of bits that represent the unsigned integer that indicates the selected field (called the field index) (the first field's index is 0), followed by the bits of the selected field's value. A sum type's zero value is the zero value of the type of its first field.

I will call anonymous sum types and the types they underly generalized sum types, as opposed to predeclared sum types as they are today.

Syntax and semantics

A bit can be represented as a sum type with two zero-size field types:

type Bit sum {
    Zero struct{}
    One struct{}
}

Bit has two fields, so its field index size is always 1 bit. Since both of its fields have zero size, the total size of Bit is always 1 bit.

The syntax and semantics of sum types are like that of struct types, except only one field can be assigned at a time, reading an unassigned field is a runtime error, and fields are not addressable. Field assignment can be tested with comma-ok/two-value assignments, like map keys.

var b Bit
ok := b == Bit{} // True
ok = b == Bit{Zero: struct{}{}} // True
b = Bit{Zero: struct{}{}}
v0 := b.Zero
v0, ok = b.Zero // True
v1 := b.One // Panic
v1, ok = b.One // False
b = Bit{One: struct{}{}}
v0 = b.Zero // Panic
v0, ok = b.Zero // False
v1 = b.One
v1, ok = b.One // True
b.Zero = struct{}{} // Selects field Zero; field One has no value
b.One = struct{}{} // Selects field One; field Zero has no value
_ = &b.Zero // Illegal
switch b {
case Bit{Zero: struct{}{}}: // Doesn't match
case Bit{One: struct{}{}}: // Does match
}

Conditional execution based on structural pattern matching is done by switch-match statements. They must handle every field for struct and sum types unless a default case is used. There cannot be equivalent patterns in the same switch-match statement.

type Engineer struct {
    Name string
    Specialty string
}
type Manager struct {
    Name string
    Reports int
}
type Employee sum {
    Engineer Engineer
    Manager Manager
}

// Struct type
var eng Engineer
switch v := eng.(match) {
case Engineer{}: // Matches all values
    // v is of type Engineer
case Engineer{_, _}: // Matches all values
case Engineer{name, specialty}: // Matches all values
case Engineer{name, "Go"}: // Matches all Gophers
case Engineer{Name: "John"}: // Matches all Johns
case Engineer{"John", "Go"}: // Matches only Engineer{Name: "John", Specialty: "Go"}
default: // Matches anything else
    // v is of type Engineer
}
switch eng.(match) {
case {}: // Type name can be omitted
case {_, _}:
case {"John", "Go"}:
case {name, "Go"}:
case {name, specialty}:
case {Name: "John"}:
case {"John", "Go"}
}

// Sum type
var emp Employee
switch v := emp.(match) { // v's type is the assigned field's type if there is a specific match
case Employee{Manager: Manager{}}: // Doesn't match
    fmt.Println(v.Reports)
default: // Does match
    fmt.Println(v) // v is the same as emp because there is no specific match
}
switch emp.(match) {
case {Engineer: Engineer{}}: // Same as Employee{Engineer: Engineer{}}
case {Manager: Manager{}}:
}
switch emp.(match) {
case {Engineer: {}}: // Same as {Engineer: Engineer{}}
case {Manager: {}}: // Type name can be omitted from prefix of struct type literal
}
switch emp.(match) {
case {Engineer}: // Same as {Engineer: {}}
case {Manager}:
}
switch emp.(match) {
case {Engineer: {name, specialty}}: // Struct type sub-pattern ({name, specialty}) inside sum type pattern
case {Manager: {name, 123}}: // Sum type sub-pattern (123) inside struct type sub-pattern ({name, 123}) inside sum type pattern
}

Enums are a special case of sum type where all field types are zero-size and are therefore immutable and can therefore be omitted:

type Bool enum {
    False
    True
}

b := Bool{False}
b = Bool{True}
ok := b == Bool{True} // True
b.False = struct{}{} // Illegal
b.True = struct{}{} // Illegal
switch b {
case Bool{False}: // Doesn't match
case Bool{True}: // Does match
}

The size of Bool, like Bit, is always 1 bit.

Expressiveness

Using only struct and sum types without special syntax or semantics, we can represent various generally useful types not specific to Go.

An array type of size N and element type E is a struct type with N fields of type E:

// Pseudocode
type Array0[E any] struct{} // == [0]E
type Array1[E any] struct{Elem1 E} // == [1]E
type Array2[E any] struct{Elem1, Elem2 E} // == [2]E
type Array3[E any] struct{Elem1, Elem2, Elem3 E} // == [3]E
...
type ArrayN[E any] struct{Elem1, Elem2, ..., ElemN E} // == [N]E

A list type with dynamic length N and element type E is a sum type that enumerates all array lengths for element type E, where the field index bits represent the length of the array:

// Pseudocode
type List[E any] sum {
    Length0 struct{}
    Length1 struct{Elem1 E}
    Length2 struct{Elem1, Elem2 E}
    Length3 struct{Elem1, Elem2, Elem3 E}
    ...
}

A map type with key type K and value type V is a list of key-value pairs:

type Map[K comparable, V any] List[struct{
    Key K
    Value V
}]

An optional type with element type T is a sum type with two fields, one with the element type and one without:

type Optional[T any] sum {
    No struct{}
    Yes T
}

Predeclared types

Predeclared Go types can be redefined in terms of generalized sum types.

For example, predeclared number types can be defined in terms of arrays of Bits (arrays being defined as above). An array of 8 Bits has a size of 8 bits, and so on.

type Int8 [8]Bit // Pseudocode: Array8[Bit]
type Int16 [16]Bit
type Int32 [32]Bit
type Int64 [64]Bit

For these types, operations are syntactic sugar for method calls, method calls are inlined where possible, and number literals are syntactic sugar for the corresponding sum type zero value.

func (Int8) Add(Int8) Int8 // Implemented in assembly
func (Int8) Sub(Int8) Int8
...

var x Int8 = 1 // `1` is syntactic sugar for Int8{Bit{}, Bit{}, Bit{}, Bit{}, Bit{}, Bit{}, Bit{}, Bit{One: struct{}{}}}
var _ = x.Add(x) // Function call is inlined
var _ = x + x // Syntactic sugar for x.Add(x), which is inlined
var _ func(Int8) Int8 = x.Add // x.Add is a normal function
var _ interface { Add(Int8) Int8 } = x // Int8.Add is a normal method

Every other predeclared type can be redefined this way too:

type Channel[...] sum {
    Nil struct{}
    notNil ...
}

type Map[...] sum {
    Nil struct{}
    notNil ...
}

type Pointer[...] sum {
    Nil struct{}
    notNil ...
}

type Slice[...] sum {
    Nil struct{}
    notNil ...
}

where operations are syntactic sugar for method calls, method calls are inlined where possible, and nil literals are syntactic sugar for the corresponding sum type zero value.

var b Bool
var c Channel[Int8] // Zero value is Channel[Int8]{Nil: struct{}{}}
var m Map[Int8, Int8] // Zero value is Map[Int8, Int8]{Nil: struct{}{}}
var p Pointer[Int8] // Zero value is Pointer[Int8]{Nil: struct{}{}}
var s Slice[Int8] // Zero value is Slice[Int8]{Nil: struct{}{}}

_ = b == false // Equivalent to: b == Bool{False}
_ = c == nil // Equivalent to: c == Channel[Int8]{Nil: struct{}{}}
_ = m == nil // Equivalent to: m == Map[Int8, Int8]{Nil: struct{}{}}
_ = p == nil // Equivalent to: p == Pointer[Int8]{Nil: struct{}{}}
_ = s == nil // Equivalent to: s == Slice[Int8]{Nil: struct{}{}}

func (Channel[T]) Send(T) // Implemented in assembly
func (Channel[T]) Recv() (T, bool)
...

c.Send(b) // Function call is inlined
c <- b // Syntactic sugar for c.Send(b), which is inlined
var _ func(Bool) = c.Send // c.Send is a normal function
var _ interface { Send(Bool) } = c // Channel[Bool].Send is a normal method

Turning operations into methods means that you can assign predeclared types to non-empty, normal interfaces, which means you can treat them just like you do user-defined types. For example, you do not need two generic functions to handle predeclared and user-defined types separately, as is the case with slices.Sort and slices.SortFunc and many other functions in that package and elsewhere.

Quality of life improvements

Since dealing with zero values is common with sum types, there is a shorthand for it:

var _ Bit = {}
var _ Bool = {}
var _ Int8 = {}
var _ Map[K, V] = {}
var _ Pointer[T] = {}
var _ Slice[T] = {}
var _ struct{} = {}
var _ interface{} = {}

Since dealing with struct types of zero size is common with sum types, there is a shorthand for it:

type unit = struct{} // Universe block

type Optional[T any] sum {
    No unit
    Yes T
}

(The unit type has only one value, which is the case for struct{}.)

Remarks

By default, types are encoded such that field bits and field-index bits are padded as appropriate to align them for memory. Arrays are encoded without padding. However, types don't have to be padded in all cases, which can be useful when reading serialized data from files.

Note that there are multiple ways to achieve the same bit sequence:

type Int2 [2]Bit

type Int2 enum {
    Zero
    One
    Two
    Three
}

Note that adding a field to a sum type breaks switch-match statements that handle it without a default case. This is intentional and is beneficial: it forces you to handle the new field in all the places you handle the sum type. Imagine adding a new case (a new number) to an Int8 type that can already represent 0–255 and not updating all the code that handles the type; it makes little sense. If you want to add cases to your code without worrying about updating the handling code, use an interface instead. Sum types are probably best for small scopes like Optional or well-understood domains like Int8, not cases where you're throwing in new fields haphazardly.

That being said, sometimes sum types must evolve, such as a sum type that represents the Go grammar, and sometimes programs would remain valid even though they do not handle the new fields, in which case using a default case in a switch-match statement to avoid a break is appropriate.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

proposal: spec: add sum types for algebraic data types #76949

Proposal Details

Definitions

Syntax and semantics

Expressiveness

Predeclared types

Quality of life improvements

Remarks

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

proposal: spec: add sum types for algebraic data types #76949

Description

Proposal Details

Definitions

Syntax and semantics

Expressiveness

Predeclared types

Quality of life improvements

Remarks

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions