[wasm-gc] Optimizing trait polymorphism memory usage by using static tables

```moonbit
trait SomeTrait {
  callback(Self) -> Unit
}

struct Value {}

impl SomeTrait for Value with callback(self) -> Unit {
  println("Value::callback")
}

fn example(value: &SomeTrait) -> Unit {
  value.callback()
}
```

```wasm
(type $SomeTrait
  (sub
   (struct
    (field  (ref $SomeTrait.method_0)))))

(type $Value.as_SomeTrait
 (sub
  $SomeTrait
  (struct
   (field  (ref $SomeTrait.method_0))
   (field  (ref $Value)))))
```
Today, when using trait polymorphism (like declaring a parameter as `&SomeTrait` above), the compiler generates a new struct that's effectively a run-time vtable every time a value whose type implements that trait is passed into such a function, like this:

```wasm
(call $jayphelps/example/main.example
   (struct.new $Value.as_SomeTrait
    (ref.func $@jayphelps/example/main.SomeTrait::@jayphelps/example/main.Value::callback.dyncall_as_SomeTrait)
    (local.get $value/123)))
(; call it again or any other function which relies on trait polymorphism ;)
(call $jayphelps/example/main.example
   (struct.new $Value.as_SomeTrait
    (ref.func $@jayphelps/example/main.SomeTrait::@jayphelps/example/main.Value::callback.dyncall_as_SomeTrait)
    (local.get $value/123)))
```

In this example, if you pass the struct value to multiple functions, a new intermediate vtable struct is created each time. It's somewhat contrived given there's only one method, but hopefully it demonstrates the point, because at scale, in complex apps with lots of methods, this could add up.

Offhand, I don't think there's a way around having this separate struct since Wasm GC doesn't (yet) support subtyping multiple structs, but I'm curious whether this can be memory optimized by placing all the functions in a statically defined Wasm `(table)`, avoiding the overhead of the extra allocations needed to store the functions. That table's index is then what you'd put in `$Value.as_SomeTrait`.
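To make the table idea concrete, here's a rough sketch of what I have in mind. It is not what the compiler emits today: the names `$vtables`, `$SomeTrait.object`, `$Value.callback.dyncall`, and `$caller` are made up for illustration, and I'm assuming methods take the receiver as `(ref eq)` and downcast internally.

```wasm
(module
  (type $Value (struct))
  ;; Hypothetical calling convention: receiver passed as (ref eq).
  (type $SomeTrait.method_0 (func (param (ref eq))))
  ;; Trait object: base index into the table, plus the receiver.
  (type $SomeTrait.object
    (struct (field i32) (field (ref eq))))

  ;; Stand-in for the generated dyncall wrapper (body omitted).
  (func $Value.callback.dyncall (type $SomeTrait.method_0)
    (param $self (ref eq)))

  ;; One static table holds every impl's methods back to back.
  (table $vtables 1 funcref)
  (elem (table $vtables) (i32.const 0) func $Value.callback.dyncall)

  (func $example (param $obj (ref $SomeTrait.object))
    (call_indirect $vtables (type $SomeTrait.method_0)
      ;; receiver
      (struct.get $SomeTrait.object 1 (local.get $obj))
      ;; base index + method slot 0
      (i32.add
        (struct.get $SomeTrait.object 0 (local.get $obj))
        (i32.const 0))))

  (func $caller (param $value (ref $Value))
    (call $example
      (struct.new $SomeTrait.object
        (i32.const 0)            ;; base index of Value's SomeTrait methods
        (local.get $value)))))
```

Note that with a single-method trait this object is still two fields, so nothing is saved in the example as written. The saving would only show up once a trait has several methods, since the object stays at one `i32` plus the receiver regardless of method count.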

That said, this might come at the cost of CPU performance, since you'd need to use `call_indirect`, which incurs bounds/type checks and a table lookup. But I'm not sure whether `call_ref` is optimized beyond `call_indirect` yet in most VMs. One might argue memory is cheap these days and that it's better to err on the side of CPU perf. I guess you could also still define the vtable as a struct of `ref.func`s, but as a global, which would be sort of a middle ground: faster calls, less redundant memory, but still some small overhead from the global lookup.
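The global-vtable middle ground could look roughly like the sketch below. Again, this is only an illustration under my own assumptions: `$SomeTrait.vtable`, `$SomeTrait.object`, `$Value.as_SomeTrait.vtable`, `$Value.callback.dyncall`, and `$caller` are hypothetical names, and the `(ref eq)` receiver convention is assumed rather than taken from the actual compiler output.

```wasm
(module
  (type $Value (struct))
  ;; Hypothetical calling convention: receiver passed as (ref eq).
  (type $SomeTrait.method_0 (func (param (ref eq))))
  ;; One vtable struct type per trait, holding only function refs.
  (type $SomeTrait.vtable
    (struct (field (ref $SomeTrait.method_0))))
  ;; Trait object: pointer to the shared vtable, plus the receiver.
  (type $SomeTrait.object
    (struct (field (ref $SomeTrait.vtable)) (field (ref eq))))

  ;; Stand-in for the generated dyncall wrapper (body omitted).
  (func $Value.callback.dyncall (type $SomeTrait.method_0)
    (param $self (ref eq)))

  ;; Allocated once at instantiation, shared by every coercion.
  (global $Value.as_SomeTrait.vtable (ref $SomeTrait.vtable)
    (struct.new $SomeTrait.vtable
      (ref.func $Value.callback.dyncall)))

  (func $example (param $obj (ref $SomeTrait.object))
    (call_ref $SomeTrait.method_0
      ;; receiver
      (struct.get $SomeTrait.object 1 (local.get $obj))
      ;; method 0, loaded through the shared vtable
      (struct.get $SomeTrait.vtable 0
        (struct.get $SomeTrait.object 0 (local.get $obj)))))

  (func $caller (param $value (ref $Value))
    (call $example
      (struct.new $SomeTrait.object
        (global.get $Value.as_SomeTrait.vtable)
        (local.get $value)))))
```

Each coercion still allocates a small object, but it holds one vtable reference instead of one `ref.func` per method, and dispatch stays on `call_ref` at the price of an extra `struct.get`.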

So I guess that leaves me mostly just curious what your thinking is around this, out of professional curiosity. You all have more compiler experience than I do, so I'd appreciate learning whether this was deliberate, and why. I noticed this because my code relies heavily on this feature, so I quickly saw a lot of redundant allocations when examining the build.

[wasm-gc] Optimizing trait polymorphism memory usage by using static tables #697