
[wasm-gc] Optimizing trait polymorphism memory usage by using static tables #697

Open
@jayphelps

Description

trait SomeTrait {
  callback(Self) -> Unit
}

struct Value {}

impl SomeTrait for Value with callback(self) -> Unit {
  println("Value::callback")
}

fn example(value: &SomeTrait) -> Unit {
  value.callback()
}

(type $SomeTrait
  (sub
    (struct
      (field (ref $SomeTrait.method_0)))))

(type $Value.as_SomeTrait
  (sub $SomeTrait
    (struct
      (field (ref $SomeTrait.method_0))
      (field (ref $Value)))))

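Today, when using trait polymorphism (for example, declaring a parameter as &SomeTrait as above), the compiler allocates a new struct every time a value of a type that implements the trait is passed to such a function. That struct is effectively a run-time vtable: it bundles a reference to the trait's method together with a reference to the value itself, like this: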

(call $jayphelps/example/main.example
  (struct.new $Value.as_SomeTrait
    (ref.func $@jayphelps/example/main.SomeTrait::@jayphelps/example/main.Value::callback.dyncall_as_SomeTrait)
    (local.get $value/123)))
(; call it again or any other function which relies on trait polymorphism ;)
(call $jayphelps/example/main.example
  (struct.new $Value.as_SomeTrait
    (ref.func $@jayphelps/example/main.SomeTrait::@jayphelps/example/main.Value::callback.dyncall_as_SomeTrait)
    (local.get $value/123)))

In this example, every time the value is passed to a function that takes a trait object, a new intermediate vtable struct is created, even though the function reference it holds is identical each time. It's somewhat contrived given there's only one method, but hopefully it demonstrates the point, because "at scale" in complex apps with many methods and call sites this could add up.
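To make the allocation pattern concrete, here's a rough model of what the generated code does today, written in Rust purely as a stand-in for the Wasm. The names mirror the WAT above; the string return value (instead of println) and the fields_allocated helper are hypothetical additions for illustration, and the helper assumes the wrapper has one function-reference field per trait method (presumably, judging by the method_0 field name) plus the receiver, ignoring object headers.

```rust
use std::rc::Rc;

// Hypothetical stand-in for the MoonBit `Value` receiver.
struct Value;

// Models the generated `$Value.as_SomeTrait` struct: a function
// reference for the trait method plus a reference to the receiver.
struct ValueAsSomeTrait {
    method_0: fn(&Value) -> &'static str,
    this: Rc<Value>,
}

// Models `Value::callback`; returns the text instead of printing it so the
// result is easy to check.
fn value_callback(_this: &Value) -> &'static str {
    "Value::callback"
}

// Models the coercion to `&SomeTrait` at a call site: every call allocates
// a fresh wrapper, like each `struct.new $Value.as_SomeTrait` above.
fn as_some_trait(value: &Rc<Value>) -> Box<ValueAsSomeTrait> {
    Box::new(ValueAsSomeTrait {
        method_0: value_callback,
        this: Rc::clone(value),
    })
}

// Models `example(value: &SomeTrait)`: dispatch goes through the field.
fn example(obj: &ValueAsSomeTrait) -> &'static str {
    (obj.method_0)(&*obj.this)
}

// Hypothetical estimate: reference-sized fields allocated by `calls`
// coercions when the trait has `methods` methods (headers ignored).
fn fields_allocated(calls: usize, methods: usize) -> usize {
    calls * (methods + 1)
}

fn main() {
    let value = Rc::new(Value);
    let first = as_some_trait(&value);
    let second = as_some_trait(&value);
    println!("{}", example(&first));
    println!("{}", example(&second));
    // Two separate heap objects holding identical function pointers.
    println!("same allocation: {}", std::ptr::eq(&*first, &*second));
    println!("fields for 1000 calls x 10 methods: {}", fields_allocated(1000, 10));
}
```

Under this model, 1,000 coercions of a value to a 10-method trait allocate 11,000 reference fields, 10,000 of which are copies of the same 10 function references.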

Offhand, I don't think there's a way around having this separate struct, since Wasm GC doesn't (yet) let a struct type have more than one supertype. But I'm curious whether this could be optimized for memory by placing all the functions in a statically defined Wasm table, avoiding the overhead of storing every function reference in each of these extra allocations. The table index is then what you'd put in $Value.as_SomeTrait.
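Here's a rough Rust model of the table idea: a static array of function pointers stands in for the Wasm table, and an index computation plus a bounds-checked lookup stands in for call_indirect. Everything in it is hypothetical: the Other type, the second describe method (added only to show per-method offsets), and the base-plus-offset layout are assumptions for illustration, not how MoonBit lays anything out. Because every slot in a shared table needs one uniform signature, the receiver is passed type-erased and each entry checks its concrete type.

```rust
use std::any::Any;
use std::rc::Rc;

// Hypothetical receiver types that both implement SomeTrait.
struct Value;
struct Other;

// Uniform signature for every table slot; the receiver arrives type-erased.
type Slot = fn(&dyn Any) -> &'static str;

fn value_callback(this: &dyn Any) -> &'static str {
    assert!(this.is::<Value>(), "receiver must be a Value");
    "Value::callback"
}

fn value_describe(this: &dyn Any) -> &'static str {
    assert!(this.is::<Value>(), "receiver must be a Value");
    "Value::describe"
}

fn other_callback(this: &dyn Any) -> &'static str {
    assert!(this.is::<Other>(), "receiver must be an Other");
    "Other::callback"
}

fn other_describe(this: &dyn Any) -> &'static str {
    assert!(this.is::<Other>(), "receiver must be an Other");
    "Other::describe"
}

// Method offsets within SomeTrait (`describe` is hypothetical).
const CALLBACK: u32 = 0;
const DESCRIBE: u32 = 1;

// One table for the whole program, fixed at compile time. Each
// (type, trait) pair owns a contiguous run of slots, one per method.
static TABLE: [Slot; 4] = [value_callback, value_describe, other_callback, other_describe];
const VALUE_AS_SOME_TRAIT: u32 = 0;
const OTHER_AS_SOME_TRAIT: u32 = 2;

// The per-coercion wrapper stores one index instead of one function
// reference per method.
struct AsSomeTrait {
    base: u32,
    this: Rc<dyn Any>,
}

// Models call_indirect: compute the slot, bounds-check it, then call.
fn call_method(obj: &AsSomeTrait, method: u32) -> &'static str {
    let index = (obj.base + method) as usize;
    let slot = *TABLE.get(index).expect("table index out of bounds");
    slot(&*obj.this)
}

fn main() {
    let value = AsSomeTrait { base: VALUE_AS_SOME_TRAIT, this: Rc::new(Value) };
    let other = AsSomeTrait { base: OTHER_AS_SOME_TRAIT, this: Rc::new(Other) };
    println!("{}", call_method(&value, CALLBACK));
    println!("{}", call_method(&other, DESCRIBE));
}
```

In this model the wrapper no longer grows with the number of methods, though it still has to be allocated once per coercion to pair the index with the receiver. Each call now pays for index arithmetic, a bounds check, and a receiver type check, which is the CPU trade-off raised in the next paragraph.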

That said, this might come at the cost of CPU performance, since you'd need call_indirect, which incurs a bounds check, a signature check, and the table lookup itself. But I'm not sure whether most VMs optimize call_ref much beyond call_indirect yet. One might argue memory is cheap these days, and that it's better to err on the side of faster calls. I guess you could also still define the vtable as a struct of ref.funcs, but create it once as a global, which might be a middle ground: faster calls and less redundant memory, with only a small overhead for the global lookup.
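The global-vtable middle ground can be modeled the same way, again as a hypothetical Rust sketch with invented names: a static struct of function pointers stands in for a Wasm global holding a struct of ref.funcs, and each per-coercion wrapper just points at it. (As an aside, this is roughly how Rust's own trait objects work: a data pointer paired with a pointer to a vtable the compiler emits as static data.)

```rust
use std::any::Any;
use std::rc::Rc;

// Hypothetical stand-in for the MoonBit `Value` receiver.
struct Value;

fn value_callback(this: &dyn Any) -> &'static str {
    assert!(this.is::<Value>(), "receiver must be a Value");
    "Value::callback"
}

// One field per trait method, like the method fields of
// $Value.as_SomeTrait, but created only once.
struct SomeTraitVTable {
    callback: fn(&dyn Any) -> &'static str,
}

// Stands in for a Wasm global initialized with a struct of ref.funcs.
static VALUE_AS_SOME_TRAIT: SomeTraitVTable = SomeTraitVTable {
    callback: value_callback,
};

// Per-coercion wrapper: two fields no matter how many methods the trait has.
struct AsSomeTrait {
    vtable: &'static SomeTraitVTable,
    this: Rc<dyn Any>,
}

fn as_some_trait(value: Rc<Value>) -> AsSomeTrait {
    AsSomeTrait { vtable: &VALUE_AS_SOME_TRAIT, this: value }
}

// Models example(value: &SomeTrait): load the vtable, then call through
// the function pointer (the analogue of call_ref); no table index needed.
fn example(obj: &AsSomeTrait) -> &'static str {
    (obj.vtable.callback)(&*obj.this)
}

fn main() {
    let value = Rc::new(Value);
    let first = as_some_trait(Rc::clone(&value));
    let second = as_some_trait(Rc::clone(&value));
    println!("{}", example(&first));
    // Both wrappers point at the same static vtable.
    println!("shared vtable: {}", std::ptr::eq(first.vtable, second.vtable));
}
```

Compared with the table model, the call itself needs no index arithmetic or bounds check, while the wrapper stays at two fields however many methods the trait has.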

So I'm mostly just curious what your thinking is around this, out of professional interest. You all have more compiler experience than I do, so I'd appreciate learning whether this was deliberate, and if so, why. I noticed it because my code relies heavily on this feature, so I quickly saw a lot of redundant allocations when examining the build.
