Sum types, 2024 variant #548

Araq · 2024-01-08T11:02:48Z

Sum types, 2024 variant

There is a new type construct, case that gives Nim sum types, comparable to ML's, Rust's, etc.

type
  Option[T] = case
    of None: discard
    of Some: T

  Either[A, B] = case
    of Le: A
    of Ri: T

  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryOpr: BinaryNode
    of UnaryOpr: UnaryNode
    of Variable: string
    of Value: int

Constructing a case branch uses the branch name plus its payload in parenthesis. However, BinaryOpr(BinaryNode(a: x, b: y))
can be shortened to BinaryOpr(a: x, b: y), an analogous shortcut exists for tuples.

To access the attached values, pattern matching must be used. This enforces correct access at compile-time.

Access via `as`

The syntax of Branch as x can be used to unpack the sum type to x.

proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr as x:
    traverse x.a
    traverse x.b

  of UnaryOpr as x:
    traverse x.a

  of Variable as name:
    echo name
  of Value as x:
    counter += x

of Branch as variable is the basic form. For of Branch: T the variable has the type T or var T depending on the mutability of the expression x in case x.

Pattern matching

A variable of type T might be inconvienent so there also pattern matching combined with unpacking:

proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr(let a, let b):
    traverse a
    traverse b

  of UnaryOpr(let a):
    traverse a

  of Variable as name:
    echo name
  of Value as x:
    counter += x

These new syntaxes of Branch as x and of Branch(let x) can later be naturally extended to if statements: if n of BinaryOpr as x or if n of Some(var n).

More complex pattern matching

Proposed syntax:

case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through
of BinaryOpr(_, let b):
  # _ can be used to ignore a value

Serialization

There are two new macros that can traverse sum types:

constructCase takes in a type T and an expression in order to construct a case type T.
unpackCase takes in a value of a case type and an expression in order to traverse the data structure.

For example:

type
  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryOpr: BinaryNode
    of UnaryOpr: UnaryNode
    of Variable: string
    of Value: int

proc store(f: var File; x: int) = f.write x # atom
proc store(f: var File; r: ref Node) = store r[] # deref `ref`
proc store[T: object](f: var File; x: T) =
  # known Nim pattern:
  for y in fields(x): store y

proc store[T: case](f: var File; x: T) =
  unpack x, f.store, f.store
  # `unpack` is expanded into a case statement:
  # `case x[]
  # of Val as val: f.store(kind); f.store(val)`
  # ...


proc load[T: case](f: var File; x: typedesc[T]): T =
  let tmp = f.load[:IntegralType(T)]()
  result = constructCase(T, tmp, f.load)

  # constructCase is expanded into a case statement:
  # `case tmp
  # of Val: Val(f.load[:BranchType]())`
  # ...

Anon object types

Later we can add more sugar so that the definition can be simplified to:

type
  Node = case
    of BinaryOpr: object
      a, b: ref Node
    of UnaryOpr: object
      a: ref Node
    of Variable: string
    of Value: int

This way the simplicity is kept that every branch is tied to exactly one type which makes iteration over case types in a generic context much easier.

The text was updated successfully, but these errors were encountered:

metagn · 2024-01-08T13:03:43Z

Some notes with regards to the implementation (feel free to ignore if not interested):

This design is about as simple as it gets for the parser (parse case differently in type contexts), type graph (other proposals expected of Some(T), of Some: T, of Some: field: T to all work), and the initial deconstruction frontend but as mentioned in #527 (comment) there is some work to do for the construction frontend.

None, BinaryOpr etc. would probably best be a new symbol kind skCase that can also undergo generic overloading for the compiler to deal with it in call syntax. We could implement this from scratch but sigmatch has the logic for it for routines, meaning we should probably do the refactor that generalizes sigmatch to include stuff like type conversions. Also sigmatch could take the expected return type into account, i.e.:

type
  Option[T] = case
    of None: discard
    of Some: T
  List[T] = case
    of None: discard
    of Cons: (T, ref List[T])

let x: Option[int] = None()

should work.

arnetheduck · 2024-01-08T13:28:43Z

The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for nim-result):

let x: Either[int, string] = Le(42)

ie Le(42) does not need to know about string.

ee7 · 2024-01-08T14:23:34Z

Simple pattern matching

The syntax of Branch as x can be used to unpack the sum type to x.
proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr as x:
    traverse x.a
    traverse x.b
  # ...
of Branch as variable is sugar for of Branch(let variable). of Branch(var variable) is also available allowing mutations to variable to write through to the underlying enum object.

Given that of Branch as x is available, any thoughts on of Branch as var x as sugar for the other simple form?

Araq · 2024-01-08T14:47:22Z

Given that of Branch as x is available, any thoughts on of Branch as var x as sugar for the other simple form?

Maybe even Branch as x should not be done and only Branch(let x) should be available. We don't have a good history of providing more than one syntax and letting people choose, it always ends up in mild fights in the style guides.

Not applicable anymore as there is an inherent difference between as x and var x, the x has a different type.

SirOlaf · 2024-01-08T18:11:58Z

The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for nim-result):
let x: Either[int, string] = Le(42)
ie Le(42) does not need to know about string.

The devel compiler has {.experimental: "inferGenericTypes".} which lets you do that for functions already, hopefully a similar mechanism can be applied here.

ASVIEST · 2024-01-08T18:28:09Z

I think this will be better for pattern matching

case n
of BinaryOpr(a: var le, b: UnaryOpr(a: let ri)) if le == ri:
  le = ... # can write-through

It's more like an object constructor
and it allows to skip sth like:

case n
of BinaryOpr(b: UnaryOpr(a: let ri)):
  echo ri # I dont care about left operand in BinaryOpr

Araq · 2024-01-08T19:07:13Z

@ASVIEST "don't cares" should be written as _. I think the syntax without field: is preferable as it's shorter and still allows for everything.

ASVIEST · 2024-01-08T19:30:06Z

@ASVIEST "don't cares" should be written as _. I think the syntax without field: is preferable as it's shorter and still allows for everything.

this may be inconvenient for objects with a fairly large number of fields. Maybe we can allow both of these syntaxes like #517 (or maybe better #418) ?

type
  SampleChunk = object
    chunkID: uint32
    size: uint32
    manufacturer: uint32
    product: uint32
    samplePeriod: uint32
    unityNote: uint32
    pitchFraction: uint32
    smpteFmt: uint32
    smpteOffset: uint32
    loopCnt: uint32
    dataSize: uint32
    sampleLoops: seq[SampleLoop]
    data: seq[byte]
  
  SampleLoop = object
    id: uint32
    typ: uint32
    start: uint32
    endBlock: uint32
    frac: uint32
    playbacks: uint32

  WaveChunk = case
    of Sample: SampleChunk
    ...

var x = Sample()
case x
of Sample(_, _, _, _, _, _, _, _, _, _, let dataSize, _, _):
  echo dataSize
else: discard
# vs
case x
of Sample(dataSize: let dataSize):
  echo dataSize
else: discard

allowing both syntaxes it will be convenient both for objects with a large number of fields and for objects with a small number
BTW, apparently I’m not the only one who thought that fieldX: let x was also needed: #537 (comment)

Araq · 2024-01-08T21:00:10Z

It's just:

case x
of Sample(let x):
  echo x.dataSize
else: discard

To select a single field there is no reason to unpack stuff. Keep things simple.

ASVIEST · 2024-01-08T21:09:12Z

It's just:
case x
of Sample(let x):
  echo x.dataSize
else: discard
To select a single field there is no reason to unpack stuff. Keep things simple.

It means that in

case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through

let b is UnaryOpr and not ref Node ?
var a is ref Node
what ?

ASVIEST · 2024-01-08T21:15:51Z

Maybe you mean this ?

case x
of Sample():
  echo x.dataSize
else: discard

Araq · 2024-01-09T09:19:16Z

Sorry, I made a mistake. Please re-read the proposal. :-/

Araq · 2024-01-09T19:23:22Z

I mean this:

case x
of Sample as y:
  echo y.dataSize
else: discard

Varriount · 2024-01-12T20:58:20Z

Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not¹. Allowing positional pattern matching for object types would result in their public field ordering forming part of their API; developers would not be able to move fields around, nor insert new fields before existing fields, nor remove existing fields not at the end of the object, without introducing compatibility-breaking changes.

Considering that only a small subset of objects have a conceptually "natural" field ordering, introducing such behavior doesn't make sense. For objects that do have a conceptually significant field ordering, a tuple type is more appropriate.

Though they do have minor features that rely on field ordering. ↩

ASVIEST · 2024-01-14T20:34:05Z

Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not

In general, I agree, but I think that instead of the complete absence of a positional pattern matching for objects, it should be left for objects with a new pragma {.positional.}

Araq · 2024-01-14T23:23:21Z

Can we focus on the question of whether Branch as x vs Branch(let x) with distinct meanings is a good design? Can it be avoided? Did anybody even notice?

metagn · 2024-01-15T00:12:43Z

They could be merged into (assuming Branch(let x) means "unpack the first field of the branch type to x" and not "unpack the full branch to x"):

of Branch as BranchType(let x):
of Branch as (let x): # for anonymous objects/tuples, maybe requires trailing comma
of Branch as (let x, let y):
of Branch as let (x, y):
of Branch as (fieldName: let x):
# if we don't care to make bindings explicit, we can always assume identifiers to be `let`/`var`
of Branch as BranchType(x):
of Branch as (x):
of Branch as (x, y):
of Branch as var x: # makes this possible for what it's worth
of Branch as let x: # and this

# this would be a reuse of a potential mechanism like:
BranchType(let x) = y
let BranchType(x) = y # x assumed to be let
let (x) = y # language already allows this to unpack unary tuples
(let x, let y) = z

If all of these are ugly, I don't see how Branch(let x) as sugar for Branch as BranchType(let x) is unsound.

Araq · 2024-01-15T00:19:27Z

I don't understand your idea.

konsumlamm · 2024-01-16T16:49:16Z

Personally, I would prefer something closer to Rust:

type
  Option[T] = case
    of None()
    of Some(T)

  Either[A, B] = case
    of Le(A)
    of Ri(T)

  Node = case
    of BinaryOpr:
      a, b: ref Node
    of UnaryOpr:
      a: ref Node
    of Variable(string)
    of Value(int)

At least for the tuple variants, how the definition looks would match how the destructuring looks.
Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.

ASVIEST · 2024-01-16T17:04:32Z

At least for the tuple variants, how the definition looks would match how the destructuring looks. Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.

I think that when sum types is mapping
X -> Y
Is really nice, you can store Y in different module or just object and make code mode readable and flexible, you can add logic for Y and then just use it.
When you not need Y type, just use tuple:

type
  Node = case
    of BinaryOpr: (a, b: ref Node)
    of UnaryOpr: (a: ref Node)
    of Variable(string)
    of Value(int)

doing it like a Rust removes flexibility without increasing the simplicity of the code. It is also possible that such sum types are simpler to implement in backends.

It also fits better into the logic of the case statement:

type
  Sth = case
    of A1: B1
    of A2: B2

B1 is not a just field list, it's type

Varriount · 2024-01-17T18:10:24Z

Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:

proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    ...

It is also consistent with how most sum types are currently implemented (via object variants):

type Node = object
  case kind: NodeKind
  of nkBinaryOpr:
    ...

proc traverse(n: ref Node) =
  case n[].kind
  of nkBinaryOpr:
    ...

Furthermore, it would allow retrieving the "kind" of a sum-typed variable in other contexts (such as logging/debugging).

proc traverse(n: ref Node) =
  echo repr(kind(n))

Regarding using sum types, couldn't the current rules for object variants be used, where a variable can be used as a tested type when the compiler can statically determine that it is that actual type?

proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    echo(n.left, n.right) # BinaryOpr-specific fields
  of UnaryOpr:
    echo(n.term) # UnaryOpr-specific field
    ...

Again, this would be consistent with how object variants are currently handled. I would need to defer to the compiler-devs, but I believe this might also allow re-use of existing compiler code too.

I know that the above syntax isn't "exciting" or "new", but it is consistent, reducing the amount of surprise/complexity that is needed to use sum objects - from a user's perspective, a sum type works "just like" an object variant (which it arguably is). It also makes migrating existing sum-types-via-object-variant code very easy.

Pattern matching, I feel, would be best implemented in a different proposal. I think that tuple unpacking and templates already serve to address most of the "boilerplate" that pattern matching is usually meant to address.

(If desired, I can go into more detail regarding what I feel are the benefits of the above syntax, but I figured I would write a shorter explanation first)

metagn · 2024-10-05T20:32:44Z

Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:

Didn't see anyone else bringing this up but it is a good point that passing around the kind of the type as a value should still be possible. A less ambiguous name for this could be caseof, i.e.

type Foo = case
  of A: int
  of B: string

let foo = B("abc")
let kind = caseof(foo)
echo kind # B

Then we could also have caseof(Foo) as the type of the individual kind symbols, as a compiler-generated enum type (?).

nikhilsimha · 2024-10-26T02:19:20Z

whether Branch as x vs Branch(let x) with distinct meanings is a good design?

It feels like the latter is "unpacking" the former is "assignment". I lean towards those two having different meanings. It is also nice to NOT have two syntaxes to do the same thing.

Outside of pattern matching. Branch(x) is a constructor - x is the internal value
I think it will be more natural to keep the Branch(let x) behave like a "deconstructor" and Branch as x behave like assignment.

Did anybody even notice?

I noticed it right away.

McChuck05 · 2024-11-07T18:29:40Z

It's an improvement over the current object variant mechanism. I like the
case foo of bar as baz
being distinct from
case foo of bar(var baz1, let baz2)
I feel both forms have real value.

I'm still not really sure why the discrimination isn't simply done on the inner type, without adding a new name to each type in the sum. After all, it's really the type we're interested in. Even an "anonymous" type could be be used with this. It would also make the 'as' simpler, although perhaps less useful. But maybe that's a good thing?

type
  BinaryNode = object
    a, b: ref Node

  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryNode
    of UnaryNode
    of OtherNode = ref Node  # "anonymous" type
    of string
    of Name = string  # "anonymous" type
    of int

proc traverse(n: ref Node) =
  case n[] as x      #  The "as x" doesn't seem as helpful this way, but it's still nice.
  of BinaryNode:
    traverse x.a
    traverse x.b
  of UnaryNode:
    traverse x.a
  of OtherNode:
    traverse x
  of string:
    echo x
  of Name:
    echo "Hello ", x
  of int:
    counter += x

var myNode: Node[int] = 42

#  Which is better looking?  Which is easier to implement?
if myNode[int]:  
  echo "The answer is: ", myNode + 0
elif myNode[] is Name:  
  echo "Hello, " & myNode & "!"
elif myNode of string as ovaltine:
  echo "The secret decoder message of the day is: ", ovaltine

simonkrauter mentioned this issue Jan 20, 2024

Proper sum types and structural pattern matching #525

Open

jangko mentioned this issue Jan 28, 2024

Support union types status-im/nim-json-serialization#83

Open

arnetheduck mentioned this issue Mar 12, 2024

Inefficient codegen when extracting field from temporary object nim-lang/Nim#23395

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sum types, 2024 variant #548

Sum types, 2024 variant #548

Araq commented Jan 8, 2024 •

edited

Loading

metagn commented Jan 8, 2024

arnetheduck commented Jan 8, 2024

ee7 commented Jan 8, 2024

Simple pattern matching

Araq commented Jan 8, 2024 •

edited

Loading

SirOlaf commented Jan 8, 2024

ASVIEST commented Jan 8, 2024 •

edited

Loading

Araq commented Jan 8, 2024

ASVIEST commented Jan 8, 2024 •

edited

Loading

Araq commented Jan 8, 2024

ASVIEST commented Jan 8, 2024 •

edited

Loading

ASVIEST commented Jan 8, 2024

Araq commented Jan 9, 2024

Araq commented Jan 9, 2024 •

edited

Loading

Varriount commented Jan 12, 2024 •

edited

Loading

ASVIEST commented Jan 14, 2024

Araq commented Jan 14, 2024

metagn commented Jan 15, 2024 •

edited

Loading

Araq commented Jan 15, 2024

konsumlamm commented Jan 16, 2024

ASVIEST commented Jan 16, 2024 •

edited

Loading

Varriount commented Jan 17, 2024 •

edited

Loading

metagn commented Oct 5, 2024

nikhilsimha commented Oct 26, 2024 •

edited

Loading

McChuck05 commented Nov 7, 2024 •

edited

Loading

Sum types, 2024 variant #548

Sum types, 2024 variant #548

Comments

Araq commented Jan 8, 2024 • edited Loading

Sum types, 2024 variant

Access via as

Pattern matching

More complex pattern matching

Serialization

Anon object types

metagn commented Jan 8, 2024

arnetheduck commented Jan 8, 2024

ee7 commented Jan 8, 2024

Simple pattern matching

Araq commented Jan 8, 2024 • edited Loading

SirOlaf commented Jan 8, 2024

ASVIEST commented Jan 8, 2024 • edited Loading

Araq commented Jan 8, 2024

ASVIEST commented Jan 8, 2024 • edited Loading

Araq commented Jan 8, 2024

ASVIEST commented Jan 8, 2024 • edited Loading

ASVIEST commented Jan 8, 2024

Araq commented Jan 9, 2024

Araq commented Jan 9, 2024 • edited Loading

Varriount commented Jan 12, 2024 • edited Loading

Footnotes

ASVIEST commented Jan 14, 2024

Araq commented Jan 14, 2024

metagn commented Jan 15, 2024 • edited Loading

Araq commented Jan 15, 2024

konsumlamm commented Jan 16, 2024

ASVIEST commented Jan 16, 2024 • edited Loading

Varriount commented Jan 17, 2024 • edited Loading

metagn commented Oct 5, 2024

nikhilsimha commented Oct 26, 2024 • edited Loading

McChuck05 commented Nov 7, 2024 • edited Loading

Araq commented Jan 8, 2024 •

edited

Loading

Access via `as`

Araq commented Jan 8, 2024 •

edited

Loading

ASVIEST commented Jan 8, 2024 •

edited

Loading

ASVIEST commented Jan 8, 2024 •

edited

Loading

ASVIEST commented Jan 8, 2024 •

edited

Loading

Araq commented Jan 9, 2024 •

edited

Loading

Varriount commented Jan 12, 2024 •

edited

Loading

metagn commented Jan 15, 2024 •

edited

Loading

ASVIEST commented Jan 16, 2024 •

edited

Loading

Varriount commented Jan 17, 2024 •

edited

Loading

nikhilsimha commented Oct 26, 2024 •

edited

Loading

McChuck05 commented Nov 7, 2024 •

edited

Loading