Commit 7a9dd1f

WIP: sketch out wasi-nn extensions
This change alters the `wasi-nn` world to split out two different modes of operation:

- `inference`: this continues the traditional mechanism for computing with wasi-nn, by passing named `tensor`s to a `context`. Now that `tensor`s are resources, we pass all inputs and return all outputs together, eliminating `set-input` and `get-output`.
- `prompt`: this new mode expects a `string` prompt which is passed along to a backend LLM. The returned string is not streamed, but could be in the future.

This change also adds metadata modification of the `graph` via `list-properties`, `get-property`, and `set-property`. It is unclear whether these methods should hang off the `context` objects instead (TODO). It is also unclear whether the model of `load`-ing a `graph` and then initializing it into one of the two modes via `inference::init` or `prompt::init` is the best approach; most graphs are one or the other, so it does not make sense to open the door to `init` failures.

[bytecodealliance#74] (replace `load` with `load-by-name`) is replicated in this commit. [bytecodealliance#75] (return errors as records) and [bytecodealliance#76] (remove the error constructor) are superseded by this commit, since every error is simply returned as a `string` and the `error` resource is removed.

[bytecodealliance#74]: WebAssembly/wasi-nn#74
[bytecodealliance#75]: WebAssembly/wasi-nn#75
[bytecodealliance#76]: WebAssembly/wasi-nn#76
1 parent 6907868 commit 7a9dd1f

File tree

1 file changed: +51 -89 lines changed

crates/wasi-nn/wit/wasi-nn.wit

Lines changed: 51 additions & 89 deletions
@@ -12,7 +12,7 @@ world ml {
   import tensor;
   import graph;
   import inference;
-  import errors;
+  import prompt;
 }

 /// All inputs and outputs to an ML inference are represented as `tensor`s.
@@ -61,108 +61,70 @@ interface tensor {
 /// A `graph` is a loaded instance of a specific ML model (e.g., MobileNet) for a specific ML
 /// framework (e.g., TensorFlow):
 interface graph {
-  use errors.{error};
-  use tensor.{tensor};
-  use inference.{graph-execution-context};
-
-  /// An execution graph for performing inference (i.e., a model).
-  resource graph {
-    init-execution-context: func() -> result<graph-execution-context, error>;
-  }
-
-  /// Describes the encoding of the graph. This allows the API to be implemented by various
-  /// backends that encode (i.e., serialize) their graph IR with different formats.
-  enum graph-encoding {
-    openvino,
-    onnx,
-    tensorflow,
-    pytorch,
-    tensorflowlite,
-    ggml,
-    autodetect,
-  }
-
-  /// Define where the graph should be executed.
-  enum execution-target {
-    cpu,
-    gpu,
-    tpu
-  }
-
-  /// The graph initialization data.
-  ///
-  /// This gets bundled up into an array of buffers because implementing backends may encode their
-  /// graph IR in parts (e.g., OpenVINO stores its IR and weights separately).
-  type graph-builder = list<u8>;
-
-  /// Load a `graph` from an opaque sequence of bytes to use for inference.
-  load: func(builder: list<graph-builder>, encoding: graph-encoding, target: execution-target) -> result<graph, error>;
-
   /// Load a `graph` by name.
   ///
   /// How the host expects the names to be passed and how it stores the graphs for retrieval via
   /// this function is **implementation-specific**. This allows hosts to choose name schemes that
   /// range from simple to complex (e.g., URLs?) and caching mechanisms of various kinds.
-  load-by-name: func(name: string) -> result<graph, error>;
-}
+  load: func(name: string) -> result<graph, string>;

-/// An inference "session" is encapsulated by a `graph-execution-context`. This structure binds a
-/// `graph` to input tensors before `compute`-ing an inference:
-interface inference {
-  use errors.{error};
-  use tensor.{tensor, tensor-data};
-
-  /// Bind a `graph` to the input and output tensors for an inference.
-  ///
-  /// TODO: this may no longer be necessary in WIT
-  /// (https://github.com/WebAssembly/wasi-nn/issues/43)
-  resource graph-execution-context {
-    /// Define the inputs to use for inference.
-    set-input: func(name: string, tensor: tensor) -> result<_, error>;
+  /// An execution graph for performing inference (i.e., a model).
+  resource graph {
+    /// Retrieve the properties of the graph.
+    ///
+    /// These are metadata about the graph, unique to the graph and the
+    /// ML backend providing it.
+    list-properties: func() -> list<string>;

-    /// Compute the inference on the given inputs.
+    /// Retrieve the value of a property.
     ///
-    /// Note the expected sequence of calls: `set-input`, `compute`, `get-output`. TODO: this
-    /// expectation could be removed as a part of
-    /// https://github.com/WebAssembly/wasi-nn/issues/43.
-    compute: func() -> result<_, error>;
+    /// If the property does not exist, this function returns `none`.
+    get-property: func(name: string) -> option<string>;

-    /// Extract the outputs after inference.
-    get-output: func(name: string) -> result<tensor, error>;
+    /// Modify the value of a property.
+    ///
+    /// If the operation fails, this function returns a string from the ML
+    /// backend describing the error.
+    set-property: func(name: string, value: string) -> result<_, string>;
   }
 }

-/// TODO: create function-specific errors (https://github.com/WebAssembly/wasi-nn/issues/42)
-interface errors {
-  enum error-code {
-    // Caller module passed an invalid argument.
-    invalid-argument,
-    // Invalid encoding.
-    invalid-encoding,
-    // The operation timed out.
-    timeout,
-    // Runtime Error.
-    runtime-error,
-    // Unsupported operation.
-    unsupported-operation,
-    // Graph is too large.
-    too-large,
-    // Graph not found.
-    not-found,
-    // The operation is insecure or has insufficient privilege to be performed.
-    // e.g., cannot access a hardware feature requested
-    security,
-    // The operation failed for an unspecified reason.
-    unknown
-  }
+/// An inference "session" is encapsulated by a `context`; use this to `compute`
+/// an inference.
+interface inference {
+  use graph.{graph};
+  use tensor.{tensor};

-  resource error {
-    constructor(code: error-code, data: string);
+  /// Initialize an inference session with a graph.
+  ///
+  /// Note that not all graphs are inference-ready (see `prompt`); this
+  /// function may fail in this case.
+  init: func(graph: graph) -> result<context, string>;
+
+  /// Identify a tensor by name; this is necessary to associate tensors to
+  /// graph inputs and outputs.
+  type named-tensor = tuple<string, tensor>;
+
+  /// An inference "session."
+  resource context {
+    /// Compute an inference request with the given inputs.
+    compute: func(inputs: list<named-tensor>) -> result<list<named-tensor>, string>;
+  }
+}

-    /// Return the error code.
-    code: func() -> error-code;
+/// A prompt "session" is encapsulated by a `context`.
+interface prompt {
+  use graph.{graph};

-    /// Errors can propagated with backend specific status through a string value.
-    data: func() -> string;
+  /// Initialize a prompt session with a graph.
+  ///
+  /// Note that not all graphs are prompt-ready (see `inference`); this
+  /// function may fail in this case.
+  init: func(graph: graph) -> result<context, string>;
+
+  /// A prompt "session."
+  resource context {
+    /// Compute an inference request with the given inputs.
+    compute: func(prompt: string) -> result<string, string>;
   }
 }
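For a sense of how a guest might consume the reworked API, here is a minimal Rust sketch against wit-bindgen-generated bindings. Everything below is illustrative rather than part of this commit: the `generate!` invocation, the binding module paths, and the model names ("mobilenet", "some-llm") are assumptions about tooling defaults.

// Hypothetical guest code for the reworked `ml` world. Binding names assume
// wit-bindgen's default kebab-case -> snake_case Rust mapping of the WIT in
// this diff; adjust paths and names to whatever your generator actually emits.
wit_bindgen::generate!({ world: "ml" }); // assumes the WIT files live in ./wit

use crate::{graph, inference, prompt, tensor::Tensor};

/// Inference mode: all inputs go in one `compute` call and all outputs come
/// back together, replacing the old `set-input`/`compute`/`get-output`
/// sequence. Errors are now plain strings.
fn classify(input: Tensor) -> Result<Vec<(String, Tensor)>, String> {
    // `load` is by-name only; encodings and execution targets are now
    // host-side concerns. "mobilenet" is an illustrative name.
    let g = graph::load("mobilenet")?;

    // Graph metadata can be inspected (and modified) before initialization.
    for prop in g.list_properties() {
        println!("{prop} = {:?}", g.get_property(&prop));
    }

    // `init` may fail if this graph is not inference-ready (e.g., it is a
    // prompt-only LLM).
    let ctx = inference::init(g)?;
    ctx.compute(vec![("input".to_string(), input)])
}

/// Prompt mode: a string in, a (non-streamed) string out.
fn ask(question: &str) -> Result<String, String> {
    let g = graph::load("some-llm")?; // illustrative name
    let ctx = prompt::init(g)?;
    ctx.compute(question)
}

Whether `init` should be able to fail at all is exactly the open question the commit message raises: for a graph that only ever supports one of the two modes, the failing branch is dead code.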
