We need to clean up the handling of fixed-shape tensors, and creating a fixed-size Type (TensorType, really) is perhaps the best way to do that.
Such a TensorType would allow us to replace/remove and generalize most—if not all—of the distinct Shape handling logic and the haphazard use of aesara.tensor.get_vector_length, among other things. Likewise, it provides a good solution to #42/#93.
For anyone not intimately familiar with these parts of Aesara's internals, users deal with Variable objects (e.g. aesara.tensor.vector() returns a TensorVariable instance, which is a subclass of Variable), and each Variable has a Variable.type value that extends Type. Type instances are exactly what you'd guess they are: objects that emulate Python types and provide static domain-specific type information. For example, TensorTypes hold the data type (i.e. TensorType.dtype) and the number of dimensions/which dimensions are broadcastable (i.e. TensorType.broadcastable).
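For a concrete picture of that relationship (the exact output values depend on the Aesara version and the floatX setting):

```python
import aesara.tensor as at

# Build a symbolic 1-D tensor; `x` is a TensorVariable (a Variable subclass).
x = at.vector("x")

print(type(x).__name__)        # TensorVariable
print(x.type)                  # the TensorType instance attached to x
print(x.type.dtype)            # e.g. "float64", depending on floatX
print(x.type.broadcastable)    # e.g. (False,): one non-broadcastable dimension
```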
See #134 for more information about Aesara's types and how we might be able to simplify them by using actual Python types instead.
Background
If we were to implement a fixed-shape TensorType, and corresponding TensorVariable, the first question would be: "How do we change the results of every Op.make_node so that they produce outputs that are fixed-shape TensorVariables when their inputs are fixed-shape?". This is the real challenge underlying the idea.
Without addressing this question, we could have fixed-shape TensorTypes/Variables, but they would only be useful to the first Ops they encounter. That's not necessarily a bad thing, because there are definitely a few Ops that could make good use of this (e.g. size and shape-based Op parameters). Even so, we can definitely do better.
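To make the idea concrete, a minimal sketch of such a type might look like the following. The name FixedShapeTensorType is purely hypothetical, and the sketch assumes the current TensorType(dtype, broadcastable) constructor:

```python
from aesara.tensor.type import TensorType


class FixedShapeTensorType(TensorType):
    """Hypothetical TensorType that also carries a fully static shape."""

    def __init__(self, dtype, shape):
        # Follow the existing convention: dimensions of extent 1 are the
        # broadcastable ones.
        super().__init__(dtype, tuple(s == 1 for s in shape))
        self.shape = tuple(int(s) for s in shape)

    def __eq__(self, other):
        return (
            type(self) is type(other)
            and self.dtype == other.dtype
            and self.shape == other.shape
        )

    def __hash__(self):
        return hash((type(self), self.dtype, self.shape))
```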
The Shape Op, ShapeFeature, ShapeOptimizer, and their associated Op.infer_shape methods are Aesara's current means of doing all this, and we might only need to cleverly repurpose the latter to get what we want.
Proposed Changes
More specifically, Op.infer_shape is the means through which an Op propagates the "fixed-shape-edness" of its inputs through to its outputs. We could use these existing—albeit voluntary—implementations to determine whether or not an Op's output should be a fixed-shape TensorVariable or not, and, if so, propagate the sizes and construct one.
In general, Aesara is designed in a way that makes Python-based typing very difficult to use constructively, because each individual Op is left to determine the exact types of the outputs it generates. In nearly every instance, an Op will simply construct TensorVariables from scratch and use those as its outputs, making it all but impossible to utilize TensorVariable subclasses—like our proposed fixed-shape TensorVariable—without changing all existing Op.make_node implementations.
Instead, we could actually make Apply something functional and utilize it for these purposes (and perhaps more). At present, Apply is essentially just a container that performs no significant actions. For instance, when an Op.make_node returns the Apply nodes it constructs, almost nothing is done by Apply other than trivial input validation. We could make Apply a point of dispatch for iterative, type-based computations; that way, each Op isn't responsible for reasoning about things that are arguably outside of its scope.
In other words, when an Op is called with some inputs, Op.make_node constructs its outputs and creates an Apply node with said inputs and outputs; then the Apply node constructor (or Apply.__new__) calls the repurposed Op.infer_shape, which computes the concrete shapes of the outputs given the inputs and creates updated fixed-shape outputs.
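A very rough sketch of that flow is below. Everything here is an assumption about how the pieces could fit together, not existing Aesara behavior: the helper names are made up, it reuses the FixedShapeTensorType sketched above, and it assumes an Op.infer_shape(fgraph, node, input_shapes) signature that is fed static per-dimension sizes rather than the symbolic shape graphs it receives today.

```python
from aesara.graph.basic import Constant


def static_dim(entry):
    """Return a concrete int for a shape entry, or None if it isn't static."""
    if isinstance(entry, int):
        return entry
    if isinstance(entry, Constant):
        return int(entry.data)
    return None


def refine_outputs(node):
    """Upgrade a freshly built Apply node's outputs to fixed-shape types
    whenever its Op's infer_shape can pin down every output dimension."""
    # Only proceed when every input already carries a static shape
    # (i.e. its type is a FixedShapeTensorType from the sketch above).
    input_shapes = [getattr(inp.type, "shape", None) for inp in node.inputs]
    if any(s is None for s in input_shapes):
        return node

    try:
        output_shapes = node.op.infer_shape(None, node, input_shapes)
    except (NotImplementedError, AttributeError):
        # Ops without a usable infer_shape simply keep their original types.
        return node

    for out, shape in zip(node.outputs, output_shapes):
        dims = [static_dim(d) for d in shape]
        if all(d is not None for d in dims):
            out.type = FixedShapeTensorType(out.type.dtype, dims)
    return node
```

This is the kind of call that the Apply constructor itself could make, so that individual Ops never have to reason about fixed-shape types directly.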
Issues
Computation
Such a change would move the somewhat elective shape inference process, which currently happens during optimization, into the model construction process. Since shape inference isn't a particularly computationally intensive thing, this isn't a real concern. Plus, we can always make shape propagation configurable, so there's no real need for a complete trade-off.
Alternatively, we might be able to make shape propagation a lazily computed process. This could be facilitated by changes to TensorVariable itself. For instance, instead of TensorVariable.shape returning the output of a Shape Op, it could return a fixed-length Sequence that, when accessed, would lazily compute the shape values.
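One possible form for such a lazily evaluated shape is sketched below; the LazyShape name and its caching strategy are purely illustrative and not part of Aesara:

```python
from collections.abc import Sequence


class LazyShape(Sequence):
    """Hypothetical fixed-length shape whose entries are only materialized
    (as ints or symbolic expressions) when they are actually accessed."""

    def __init__(self, var):
        self.var = var
        self._entries = [None] * var.ndim

    def __len__(self):
        return self.var.ndim

    def __getitem__(self, i):
        if self._entries[i] is None:
            if self.var.type.broadcastable[i]:
                # Broadcastable dimensions are statically known to be 1.
                self._entries[i] = 1
            else:
                # Otherwise build the symbolic shape entry on first access;
                # under the proposal this is where a Shape_i-style graph
                # would be constructed lazily.
                self._entries[i] = self.var.shape[i]
        return self._entries[i]
```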
This could be a huge simplification relative to the current approach, because shape entries would be distinct objects. Under the current approach, graphs can contain numerous distinct Shape Ops, and *Subtensor* Ops on those Shapes, that all refer to the same shapes. Such a change would reduce the amount of merge-like work needed during the shape optimization process. It might also remove the need for some/all of the convoluted *Subtensor* considerations in ShapeFeature.
Regardless, these considerations are somewhat tangential to the present issue.
Unusual Ops
One issue with this approach is that an Op may for some reason want to hold on to the outputs it generates, and, if such an Op assumes that said outputs will be used down the line, it would be mistaken under these changes. This is almost exclusively an Op-specific implementation issue; one that can—however—be easily remedied by first creating the Apply node and then using its refined outputs.
In-place Apply updates
FunctionGraph.replace still operates by changing values in Apply.inputs. This would circumvent the proposed changes.
Under certain assumptions, this might not be a problem, though. For instance, if fixed-shape TensorVariables have already been produced, and their shapes have already been computed, then moving them around shouldn't matter, because, by their very type alone, no shape computations are needed (i.e. they carry their static shape information with them at all times).
Otherwise, if there's a shape conflict that arises from a rewrite, it's really the fault of the rewrite operation.
The real concern arises when a fixed-shape TensorVariable replaces a non-fixed-shape TensorVariable input and vice versa, because we might need to update something downstream. The type information itself would imply that such a substitution isn't valid, at least not without assuming that the non-fixed-shape variables actually have the same fixed shape (e.g. the fixed-shape types are refinement types and we're talking about something like substitutability).
There might not be many/any scenarios in which this is a real concern, especially when/if all of the fixed-shape information has already been determined within the graph containing the in-place updated Apply node. Regardless, this is perhaps the main concern right now.