
Create a fixed-length/shape TensorType and TensorVariable #431

Closed
brandonwillard opened this issue May 21, 2021 · 3 comments · Fixed by #711
Labels
enhancement (New feature or request), graph objects, important, refactor (This issue involves refactoring)

Comments

@brandonwillard
Member

brandonwillard commented May 21, 2021

We need to clean up the handling of fixed-shape tensors, and creating a fixed-size Type (TensorType, really) is perhaps the best way to do that.

Such a TensorType would allow us to replace/remove and generalize most—if not all—of the distinct Shape handling logic and the haphazard use of aesara.tensor.get_vector_length, among other things.

Likewise, it provides a good solution to #42/#93.
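To illustrate the kind of ad hoc length handling mentioned above, here is a small sketch of how aesara.tensor.get_vector_length behaves today (assuming its usual behavior of raising when a length can't be determined statically):

```python
# A small illustration of the current, ad hoc approach: get_vector_length can
# only recover a static length when one happens to be encoded in the graph.
import aesara.tensor as at
from aesara.tensor import get_vector_length

known = at.as_tensor_variable([1, 2, 3])  # a constant, so its length is recoverable
unknown = at.vector("x")                  # the type carries no length information

print(get_vector_length(known))  # 3
try:
    get_vector_length(unknown)
except ValueError:
    print("length cannot be determined statically")
```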

For anyone not intimately familiar with these parts of Aesara's internals: users deal with Variable objects (e.g. aesara.tensor.vector() returns a TensorVariable instance, which is a subclass of Variable), and each Variable has a Variable.type value that extends Type. Type instances are exactly what you'd guess they are: objects that emulate Python types and provide static domain-specific type information. For example, TensorTypes hold the data type (i.e. TensorType.dtype) and the number of dimensions, along with which of those dimensions are broadcastable (i.e. TensorType.broadcastable).
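For example (a minimal illustration; the exact dtype depends on aesara.config.floatX):

```python
import aesara.tensor as at

x = at.vector("x")           # a TensorVariable, i.e. a Variable subclass
print(type(x).__name__)      # TensorVariable
print(x.type.dtype)          # e.g. "float64", per aesara.config.floatX
print(x.type.broadcastable)  # (False,): one dimension, not broadcastable
```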

See #134 for more information about Aesara's types and how we might be able to simplify them by using actual Python types instead.

Background

If we were to implement a fixed-shape TensorType and a corresponding TensorVariable, the first question would be:
"How do we change the results of every Op.make_node so that they produce outputs that are fixed-shape TensorVariables when their inputs are fixed-shape?" This is the real challenge underlying the idea.

Without addressing this question, we could have fixed-shape TensorTypes/Variables, but they would only be useful to the first Ops they encounter. That's not necessarily a bad thing, because there are definitely a few Ops that could make good use of this (e.g. size and shape-based Op parameters). Even so, we can definitely do better.
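As a rough sketch of what such a type could look like (the class name and its shape attribute are hypothetical, not existing Aesara API, and the constructor assumes the broadcastable-based TensorType signature):

```python
from aesara.tensor.type import TensorType


class FixedShapeTensorType(TensorType):
    """Hypothetical TensorType carrying a concrete static shape."""

    def __init__(self, dtype, shape):
        # A dimension with a static extent of 1 is broadcastable; the rest are not.
        super().__init__(dtype, broadcastable=tuple(s == 1 for s in shape))
        self.shape = tuple(shape)

    def __eq__(self, other):
        return (
            type(self) == type(other)
            and self.dtype == other.dtype
            and self.shape == other.shape
        )

    def __hash__(self):
        return hash((type(self), self.dtype, self.shape))
```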

The Shape Op, ShapeFeature, ShapeOptimizer, and their associated Op.infer_shape methods are Aesara's current means of doing all this, and we might only need to cleverly repurpose the latter to get what we want.

Proposed Changes

More specifically, Op.infer_shape is the means through which an Op propagates the "fixed-shape-edness" of its inputs through to its outputs. We could use these existing—albeit voluntary—implementations to determine whether an Op's output should be a fixed-shape TensorVariable, and, if so, propagate the sizes and construct one.
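A sketch of that check might look as follows (the helper name is hypothetical; the fgraph argument to infer_shape is assumed, as in recent Aesara versions):

```python
import numpy as np

from aesara.graph.basic import Constant


def inferred_static_shapes(fgraph, node):
    """Hypothetical helper: for each output of ``node``, return a tuple of ints
    when ``Op.infer_shape`` reduces its shape to constants, otherwise ``None``."""
    # infer_shape expects, per input, a tuple of symbolic scalars giving its shape
    input_shapes = [tuple(inp.shape[i] for i in range(inp.ndim)) for inp in node.inputs]
    output_shapes = node.op.infer_shape(fgraph, node, input_shapes)

    results = []
    for out_shape in output_shapes:
        dims = []
        for dim in out_shape:
            if isinstance(dim, (int, np.integer)):
                dims.append(int(dim))
            elif isinstance(dim, Constant):
                dims.append(int(dim.data))
            else:
                dims.append(None)  # still symbolic, so not a fixed dimension
        results.append(None if None in dims else tuple(dims))
    return results
```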

In general, Aesara is designed in a way that makes Python-based typing very difficult to use constructively, because each individual Op is left to determine the exact types of the outputs it generates. In nearly every instance, an Op will simply construct TensorVariables from scratch and use those as its outputs, making it all but impossible to utilize TensorVariable subclasses—like our proposed fixed-shape TensorVariable—without changing all existing Op.make_node implementations.

Instead, we could actually make Apply something functional and utilize it for these purposes (and perhaps more). At present, Apply is essentially just a container that performs no significant actions. For instance, when an Op.make_node returns the Apply nodes it constructs, almost nothing is done by Apply other than trivial input validation. We could make Apply a point of dispatch for iterative, type-based computations; that way, each Op isn't responsible for reasoning about things that are arguably outside of its scope.

In other words, when an Op is called with some inputs, Op.make_node constructs its outputs and creates an Apply node with said inputs and outputs; the Apply node constructor (or Apply.__new__) then calls the repurposed Op.infer_shape, which computes the concrete shapes of the outputs given the inputs and creates updated fixed-shape outputs.
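Put as illustrative pseudocode (this is not the real Apply implementation; it reuses the hypothetical FixedShapeTensorType and inferred_static_shapes sketched above):

```python
from aesara.graph.basic import Apply


class ShapePropagatingApply(Apply):
    """Hypothetical Apply that promotes outputs to fixed-shape types."""

    def __init__(self, op, inputs, outputs):
        super().__init__(op, inputs, outputs)
        if hasattr(op, "infer_shape"):
            # Reuse the repurposed Op.infer_shape to look for fully static shapes
            for i, static_shape in enumerate(inferred_static_shapes(None, self)):
                if static_shape is not None:
                    old_out = self.outputs[i]
                    # Rebuild the output with a fixed-shape type and re-attach it
                    new_out = FixedShapeTensorType(old_out.type.dtype, static_shape)()
                    new_out.owner, new_out.index = self, i
                    self.outputs[i] = new_out
```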

Issues

Computation

Such a change would move the somewhat elective shape inference process, which currently happens during optimization, into the model construction process. Since shape inference isn't a particularly computationally intensive thing, this isn't a real concern. Plus, we can always make shape propagation configurable, so there's no real need for a complete trade-off.

Alternatively, we might be able to make shape propagation a lazily computed process. This could be facilitated by changes to TensorVariable itself. For instance, instead of TensorVariable.shape returning the output of a Shape Op, it could return a fixed-length Sequence that, when accessed, would lazily compute the shape values (a sketch of this follows below).

This could be a huge simplification relative to the current approach, because shape entries would be distinct objects. Under the current approach, graphs can contain numerous distinct Shape Ops, and *Subtensor* Ops on those Shapes, that all refer to the same shapes. Such a change would reduce the amount of merge-like work needed during the shape optimization process. It might also remove the need for some/all of the convoluted *Subtensor* considerations in ShapeFeature.
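A sketch of such a lazily computed shape (class name hypothetical) could be a fixed-length sequence whose entries only touch the symbolic shape graph when they are actually indexed:

```python
from collections.abc import Sequence


class LazyShape(Sequence):
    """Hypothetical fixed-length shape whose entries are computed on first access."""

    def __init__(self, var):
        self._var = var
        self._entries = [None] * var.ndim

    def __len__(self):
        return self._var.ndim

    def __getitem__(self, i):
        if self._entries[i] is None:
            # Fall back to the usual symbolic shape graph only when needed;
            # a fixed-shape type could instead return a plain int here.
            self._entries[i] = self._var.shape[i]
        return self._entries[i]
```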

Regardless, these considerations are somewhat tangential to the present issue.

Unusual Ops

One issue with this approach is that an Op may for some reason want to hold on to the outputs it generates, and, if such an Op assumes that said outputs will be used down the line, it would be mistaken under these changes. This is almost exclusively an Op-specific implementation issue; one that can—however—be easily remedied by first creating the Apply node and then using its refined outputs.

In-place Apply updates

FunctionGraph.replace still operates by changing values in Apply.inputs. This would circumvent the proposed changes.

Under certain assumptions, this might not be a problem, though. For instance, if fixed-shape TensorVariables have already been produced, and their shapes have already been computed, then moving them around shouldn't matter, because, by their very type alone, no shape computations are needed (i.e. they carry their static shape information with them at all times).

Otherwise, if there's a shape conflict that arises from a rewrite, it's really the fault of the rewrite operation.

The real concern arises when a fixed-shape TensorVariable replaces a non-fixed-shape TensorVariable input, or vice versa, because we might need to update something downstream. The type information itself would imply that such a substitution isn't valid, at least not without assuming that the non-fixed-shape variables actually have the same fixed shape (e.g. the fixed-shape types are refinement types and we're talking about something like substitutability).

There might not be many/any scenarios in which this is a real concern, especially when/if all of the fixed-shape information has already been determined within the graph containing the in-place updated Apply node. Regardless, this is perhaps the main concern right now.
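To make the substitutability concern concrete, a rewrite-time check might look roughly like the following (a hypothetical helper, assuming the FixedShapeTensorType sketched earlier): replacing a variable with one carrying at least as much static shape information is safe, while the reverse direction needs extra care downstream.

```python
def replacement_is_type_safe(old_var, new_var):
    """Hypothetical check: can ``new_var`` stand in for ``old_var``?"""
    old_type, new_type = old_var.type, new_var.type
    if old_type.dtype != new_type.dtype:
        return False
    old_shape = getattr(old_type, "shape", None)  # None for non-fixed-shape types
    new_shape = getattr(new_type, "shape", None)
    if old_shape is None:
        # The old type carried no static shape, so a fixed-shape
        # replacement only refines it (a refinement-type-style substitution).
        return True
    # The old type was fixed-shape: the replacement must carry the same shape.
    return new_shape == old_shape
```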

@brandonwillard added the enhancement, important, and refactor labels May 21, 2021
@brandonwillard added this to the Clean up Theano milestone May 21, 2021
@brandonwillard changed the title from "Create a fixed-length/shape TensorType" to "Create a fixed-length/shape TensorType and TensorVariable" May 21, 2021
@brandonwillard removed this from the Clean up after Theano milestone Nov 25, 2021
@brandonwillard linked a pull request Dec 29, 2021 that will close this issue
@twiecki
Contributor

twiecki commented Feb 5, 2022

Is this fixed?

@brandonwillard
Member Author

> Is this fixed?

This was implemented in the linked PR.

@ricardoV94
Contributor

@twiecki There are some follow-up PRs needed to make use of this across the codebase: #732 / #797
