Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sum types, 2024 variant #548

Open
Araq opened this issue Jan 8, 2024 · 24 comments
Open

Sum types, 2024 variant #548

Araq opened this issue Jan 8, 2024 · 24 comments

Comments

@Araq
Copy link
Member

Araq commented Jan 8, 2024

Sum types, 2024 variant

There is a new type construct, case that gives Nim sum types, comparable to ML's, Rust's, etc.

type
  Option[T] = case
    of None: discard
    of Some: T

  Either[A, B] = case
    of Le: A
    of Ri: T

  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryOpr: BinaryNode
    of UnaryOpr: UnaryNode
    of Variable: string
    of Value: int

Constructing a case branch uses the branch name plus its payload in parenthesis. However, BinaryOpr(BinaryNode(a: x, b: y))
can be shortened to BinaryOpr(a: x, b: y), an analogous shortcut exists for tuples.

To access the attached values, pattern matching must be used. This enforces correct access at compile-time.

Access via as

The syntax of Branch as x can be used to unpack the sum type to x.

proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr as x:
    traverse x.a
    traverse x.b

  of UnaryOpr as x:
    traverse x.a

  of Variable as name:
    echo name
  of Value as x:
    counter += x

of Branch as variable is the basic form. For of Branch: T the variable has the type T or var T depending on the mutability of the expression x in case x.

Pattern matching

A variable of type T might be inconvienent so there also pattern matching combined with unpacking:

proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr(let a, let b):
    traverse a
    traverse b

  of UnaryOpr(let a):
    traverse a

  of Variable as name:
    echo name
  of Value as x:
    counter += x

These new syntaxes of Branch as x and of Branch(let x) can later be naturally extended to if statements: if n of BinaryOpr as x or if n of Some(var n).

More complex pattern matching

Proposed syntax:

case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through
of BinaryOpr(_, let b):
  # _ can be used to ignore a value

Serialization

There are two new macros that can traverse sum types:

  1. constructCase takes in a type T and an expression in order to construct a case type T.
  2. unpackCase takes in a value of a case type and an expression in order to traverse the data structure.

For example:

type
  BinaryNode = object
    a, b: ref Node
  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryOpr: BinaryNode
    of UnaryOpr: UnaryNode
    of Variable: string
    of Value: int

proc store(f: var File; x: int) = f.write x # atom
proc store(f: var File; r: ref Node) = store r[] # deref `ref`
proc store[T: object](f: var File; x: T) =
  # known Nim pattern:
  for y in fields(x): store y

proc store[T: case](f: var File; x: T) =
  unpack x, f.store, f.store
  # `unpack` is expanded into a case statement:
  # `case x[]
  # of Val as val: f.store(kind); f.store(val)`
  # ...


proc load[T: case](f: var File; x: typedesc[T]): T =
  let tmp = f.load[:IntegralType(T)]()
  result = constructCase(T, tmp, f.load)

  # constructCase is expanded into a case statement:
  # `case tmp
  # of Val: Val(f.load[:BranchType]())`
  # ...

Anon object types

Later we can add more sugar so that the definition can be simplified to:

type
  Node = case
    of BinaryOpr: object
      a, b: ref Node
    of UnaryOpr: object
      a: ref Node
    of Variable: string
    of Value: int

This way the simplicity is kept that every branch is tied to exactly one type which makes iteration over case types in a generic context much easier.

@metagn
Copy link
Contributor

metagn commented Jan 8, 2024

Some notes with regards to the implementation (feel free to ignore if not interested):

This design is about as simple as it gets for the parser (parse case differently in type contexts), type graph (other proposals expected of Some(T), of Some: T, of Some: field: T to all work), and the initial deconstruction frontend but as mentioned in #527 (comment) there is some work to do for the construction frontend.

None, BinaryOpr etc. would probably best be a new symbol kind skCase that can also undergo generic overloading for the compiler to deal with it in call syntax. We could implement this from scratch but sigmatch has the logic for it for routines, meaning we should probably do the refactor that generalizes sigmatch to include stuff like type conversions. Also sigmatch could take the expected return type into account, i.e.:

type
  Option[T] = case
    of None: discard
    of Some: T
  List[T] = case
    of None: discard
    of Cons: (T, ref List[T])

let x: Option[int] = None()

should work.

@arnetheduck
Copy link

The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for nim-result):

let x: Either[int, string] = Le(42)

ie Le(42) does not need to know about string.

@ee7
Copy link

ee7 commented Jan 8, 2024

Simple pattern matching

The syntax of Branch as x can be used to unpack the sum type to x.

proc traverse(n: ref Node) =
  case n[]
  of BinaryOpr as x:
    traverse x.a
    traverse x.b
  # ...

of Branch as variable is sugar for of Branch(let variable). of Branch(var variable) is also available allowing mutations to variable to write through to the underlying enum object.

Given that of Branch as x is available, any thoughts on of Branch as var x as sugar for the other simple form?

@Araq
Copy link
Member Author

Araq commented Jan 8, 2024

Given that of Branch as x is available, any thoughts on of Branch as var x as sugar for the other simple form?

Maybe even Branch as x should not be done and only Branch(let x) should be available. We don't have a good history of providing more than one syntax and letting people choose, it always ends up in mild fights in the style guides.

Not applicable anymore as there is an inherent difference between as x and var x, the x has a different type.

@SirOlaf
Copy link

SirOlaf commented Jan 8, 2024

The construction syntax I'm interested in to get comparable ergonomics of use with other languages (for nim-result):

let x: Either[int, string] = Le(42)

ie Le(42) does not need to know about string.

The devel compiler has {.experimental: "inferGenericTypes".} which lets you do that for functions already, hopefully a similar mechanism can be applied here.

@ASVIEST
Copy link

ASVIEST commented Jan 8, 2024

I think this will be better for pattern matching

case n
of BinaryOpr(a: var le, b: UnaryOpr(a: let ri)) if le == ri:
  le = ... # can write-through

It's more like an object constructor
and it allows to skip sth like:

case n
of BinaryOpr(b: UnaryOpr(a: let ri)):
  echo ri # I dont care about left operand in BinaryOpr

@Araq
Copy link
Member Author

Araq commented Jan 8, 2024

@ASVIEST "don't cares" should be written as _. I think the syntax without field: is preferable as it's shorter and still allows for everything.

@ASVIEST
Copy link

ASVIEST commented Jan 8, 2024

@ASVIEST "don't cares" should be written as _. I think the syntax without field: is preferable as it's shorter and still allows for everything.

this may be inconvenient for objects with a fairly large number of fields. Maybe we can allow both of these syntaxes like #517 (or maybe better #418) ?

type
  SampleChunk = object
    chunkID: uint32
    size: uint32
    manufacturer: uint32
    product: uint32
    samplePeriod: uint32
    unityNote: uint32
    pitchFraction: uint32
    smpteFmt: uint32
    smpteOffset: uint32
    loopCnt: uint32
    dataSize: uint32
    sampleLoops: seq[SampleLoop]
    data: seq[byte]
  
  SampleLoop = object
    id: uint32
    typ: uint32
    start: uint32
    endBlock: uint32
    frac: uint32
    playbacks: uint32

  WaveChunk = case
    of Sample: SampleChunk
    ...

var x = Sample()
case x
of Sample(_, _, _, _, _, _, _, _, _, _, let dataSize, _, _):
  echo dataSize
else: discard
# vs
case x
of Sample(dataSize: let dataSize):
  echo dataSize
else: discard

allowing both syntaxes it will be convenient both for objects with a large number of fields and for objects with a small number
BTW, apparently I’m not the only one who thought that fieldX: let x was also needed: #537 (comment)

@Araq
Copy link
Member Author

Araq commented Jan 8, 2024

It's just:

case x
of Sample(let x):
  echo x.dataSize
else: discard

To select a single field there is no reason to unpack stuff. Keep things simple.

@ASVIEST
Copy link

ASVIEST commented Jan 8, 2024

It's just:

case x
of Sample(let x):
  echo x.dataSize
else: discard

To select a single field there is no reason to unpack stuff. Keep things simple.

It means that in

case n
of BinaryOpr(var a, UnaryOpr(let b)) if a == b:
  a = ... # can write-through

let b is UnaryOpr and not ref Node ?
var a is ref Node
what ?

@ASVIEST
Copy link

ASVIEST commented Jan 8, 2024

Maybe you mean this ?

case x
of Sample():
  echo x.dataSize
else: discard

@Araq
Copy link
Member Author

Araq commented Jan 9, 2024

Sorry, I made a mistake. Please re-read the proposal. :-/

@Araq
Copy link
Member Author

Araq commented Jan 9, 2024

I mean this:

case x
of Sample as y:
  echo y.dataSize
else: discard

@Varriount
Copy link

Varriount commented Jan 12, 2024

Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not1. Allowing positional pattern matching for object types would result in their public field ordering forming part of their API; developers would not be able to move fields around, nor insert new fields before existing fields, nor remove existing fields not at the end of the object, without introducing compatibility-breaking changes.

Considering that only a small subset of objects have a conceptually "natural" field ordering, introducing such behavior doesn't make sense. For objects that do have a conceptually significant field ordering, a tuple type is more appropriate.

Footnotes

  1. Though they do have minor features that rely on field ordering.

@ASVIEST
Copy link

ASVIEST commented Jan 14, 2024

Positional pattern matching should only be allowed for tuple and enumeration types, as these types already have prominent features that rely on the ordering of their fields. In contrast, object types do not

In general, I agree, but I think that instead of the complete absence of a positional pattern matching for objects, it should be left for objects with a new pragma {.positional.}

@Araq
Copy link
Member Author

Araq commented Jan 14, 2024

Can we focus on the question of whether Branch as x vs Branch(let x) with distinct meanings is a good design? Can it be avoided? Did anybody even notice?

@metagn
Copy link
Contributor

metagn commented Jan 15, 2024

They could be merged into (assuming Branch(let x) means "unpack the first field of the branch type to x" and not "unpack the full branch to x"):

of Branch as BranchType(let x):
of Branch as (let x): # for anonymous objects/tuples, maybe requires trailing comma
of Branch as (let x, let y):
of Branch as let (x, y):
of Branch as (fieldName: let x):
# if we don't care to make bindings explicit, we can always assume identifiers to be `let`/`var`
of Branch as BranchType(x):
of Branch as (x):
of Branch as (x, y):
of Branch as var x: # makes this possible for what it's worth
of Branch as let x: # and this

# this would be a reuse of a potential mechanism like:
BranchType(let x) = y
let BranchType(x) = y # x assumed to be let
let (x) = y # language already allows this to unpack unary tuples
(let x, let y) = z

If all of these are ugly, I don't see how Branch(let x) as sugar for Branch as BranchType(let x) is unsound.

@Araq
Copy link
Member Author

Araq commented Jan 15, 2024

I don't understand your idea.

@konsumlamm
Copy link

Personally, I would prefer something closer to Rust:

type
  Option[T] = case
    of None()
    of Some(T)

  Either[A, B] = case
    of Le(A)
    of Ri(T)

  Node = case
    of BinaryOpr:
      a, b: ref Node
    of UnaryOpr:
      a: ref Node
    of Variable(string)
    of Value(int)

At least for the tuple variants, how the definition looks would match how the destructuring looks.
Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.

@ASVIEST
Copy link

ASVIEST commented Jan 16, 2024

At least for the tuple variants, how the definition looks would match how the destructuring looks. Most of the time, you don't need the variants as separate objects, so having to define them feels like boilerplate.

I think that when sum types is mapping
X -> Y
Is really nice, you can store Y in different module or just object and make code mode readable and flexible, you can add logic for Y and then just use it.
When you not need Y type, just use tuple:

type
  Node = case
    of BinaryOpr: (a, b: ref Node)
    of UnaryOpr: (a: ref Node)
    of Variable(string)
    of Value(int)

doing it like a Rust removes flexibility without increasing the simplicity of the code. It is also possible that such sum types are simpler to implement in backends.

It also fits better into the logic of the case statement:

type
  Sth = case
    of A1: B1
    of A2: B2

B1 is not a just field list, it's type

@Varriount
Copy link

Varriount commented Jan 17, 2024

Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:

proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    ...

It is also consistent with how most sum types are currently implemented (via object variants):

type Node = object
  case kind: NodeKind
  of nkBinaryOpr:
    ...

proc traverse(n: ref Node) =
  case n[].kind
  of nkBinaryOpr:
    ...

Furthermore, it would allow retrieving the "kind" of a sum-typed variable in other contexts (such as logging/debugging).

proc traverse(n: ref Node) =
  echo repr(kind(n))



Regarding using sum types, couldn't the current rules for object variants be used, where a variable can be used as a tested type when the compiler can statically determine that it is that actual type?

proc traverse(n: ref Node) =
  case kind(n[])
  of BinaryOpr:
    echo(n.left, n.right) # BinaryOpr-specific fields
  of UnaryOpr:
    echo(n.term) # UnaryOpr-specific field
    ...

Again, this would be consistent with how object variants are currently handled. I would need to defer to the compiler-devs, but I believe this might also allow re-use of existing compiler code too.

I know that the above syntax isn't "exciting" or "new", but it is consistent, reducing the amount of surprise/complexity that is needed to use sum objects - from a user's perspective, a sum type works "just like" an object variant (which it arguably is). It also makes migrating existing sum-types-via-object-variant code very easy.

Pattern matching, I feel, would be best implemented in a different proposal. I think that tuple unpacking and templates already serve to address most of the "boilerplate" that pattern matching is usually meant to address.

(If desired, I can go into more detail regarding what I feel are the benefits of the above syntax, but I figured I would write a shorter explanation first)

@metagn
Copy link
Contributor

metagn commented Oct 5, 2024

Regarding testing the underlying type of a sum-typed variable, requiring use of a kind function or field as part of the case expression's statement would be more explicit (to both the reader and the compiler), and prevent ambiguity:

Didn't see anyone else bringing this up but it is a good point that passing around the kind of the type as a value should still be possible. A less ambiguous name for this could be caseof, i.e.

type Foo = case
  of A: int
  of B: string

let foo = B("abc")
let kind = caseof(foo)
echo kind # B

Then we could also have caseof(Foo) as the type of the individual kind symbols, as a compiler-generated enum type (?).

@nikhilsimha
Copy link

nikhilsimha commented Oct 26, 2024

whether Branch as x vs Branch(let x) with distinct meanings is a good design?

It feels like the latter is "unpacking" the former is "assignment". I lean towards those two having different meanings. It is also nice to NOT have two syntaxes to do the same thing.

Outside of pattern matching. Branch(x) is a constructor - x is the internal value
I think it will be more natural to keep the Branch(let x) behave like a "deconstructor" and Branch as x behave like assignment.

Did anybody even notice?

I noticed it right away.

@McChuck05
Copy link

McChuck05 commented Nov 7, 2024

It's an improvement over the current object variant mechanism. I like the
case foo of bar as baz
being distinct from
case foo of bar(var baz1, let baz2)
I feel both forms have real value.

I'm still not really sure why the discrimination isn't simply done on the inner type, without adding a new name to each type in the sum. After all, it's really the type we're interested in. Even an "anonymous" type could be be used with this. It would also make the 'as' simpler, although perhaps less useful. But maybe that's a good thing?

type
  BinaryNode = object
    a, b: ref Node

  UnaryNode = object
    a: ref Node

  Node = case
    of BinaryNode
    of UnaryNode
    of OtherNode = ref Node  # "anonymous" type
    of string
    of Name = string  # "anonymous" type
    of int

proc traverse(n: ref Node) =
  case n[] as x      #  The "as x" doesn't seem as helpful this way, but it's still nice.
  of BinaryNode:
    traverse x.a
    traverse x.b
  of UnaryNode:
    traverse x.a
  of OtherNode:
    traverse x
  of string:
    echo x
  of Name:
    echo "Hello ", x
  of int:
    counter += x

var myNode: Node[int] = 42

#  Which is better looking?  Which is easier to implement?
if myNode[int]:  
  echo "The answer is: ", myNode + 0
elif myNode[] is Name:  
  echo "Hello, " & myNode & "!"
elif myNode of string as ovaltine:
  echo "The secret decoder message of the day is: ", ovaltine

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants