
Address basic design and operational issues of ShapeFeature #749

Open
brandonwillard opened this issue Jan 13, 2022 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed important shape inference

Comments


brandonwillard commented Jan 13, 2022

ShapeFeature has a few big redundancies in relation to the standard rewrite functionality and framework, and its logic is unnecessarily complicated for the functionality it provides. Its design also produces some avoidable limitations from which we're currently suffering—aside from ongoing maintenance headaches.

These issues can be observed together within a single motivating example:

ShapeFeature keeps a dict of graph Variables mapped to their inferred shapes (i.e. the outputs of Op.infer_shape). These shape graphs are not canonicalized until they're put in a graph (and a canonicalization pass is made). This complicates the use of results from ShapeFeature by, for example, requiring independent and wasteful canonicalization rewrites in order to reasonably compare or derive information from the shapes it tracks.

These issues are also relevant to #732.

Illustration of the Problem

A concrete example is ShapeFeature.same_shape. In #711, we had to add an explicit canonicalization step in order to make this functionality even remotely applicable outside of extremely trivial comparisons. These kinds of isolated rewrites are highly undesirable because their results are not reusable and they will likely be repeated in multiple different places. (They're also the kind of rewrites we want to avoid in #732.)

More specifically, if ShapeFeature.same_shape is called, a canonicalization of two Variables' shapes is performed. Those results are discarded after ShapeFeature.same_shape is finished. Later, if ShapeFeature.same_shape is called again with one of the same Variable arguments, the same canonicalization is performed and discarded. Similarly, if one of those Variables is added to the "main" FunctionGraph being rewritten, the same canonicalization passes will be applied to the same subgraph once again.
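The waste described above can be illustrated with a toy sketch (pure-Python stand-ins, not Aesara's actual API: here a "shape graph" is just a tuple and "canonicalization" is a placeholder normalization pass):

```python
from functools import lru_cache

# Toy stand-ins for illustration only: shape "graphs" are tuples of
# symbolic terms, and canonicalize() stands in for an expensive
# canonicalization rewrite pass.
def canonicalize(shape_graph):
    return tuple(sorted(shape_graph))

def same_shape_wasteful(shape_of, x, y):
    # Mirrors the current ShapeFeature.same_shape behavior:
    # canonicalize on every call, then discard the results.
    return canonicalize(shape_of[x]) == canonicalize(shape_of[y])

@lru_cache(maxsize=None)
def _canonical(shape_graph):
    return canonicalize(shape_graph)

def same_shape_cached(shape_of, x, y):
    # Canonical forms are computed once per shape graph and reused
    # across calls.
    return _canonical(shape_of[x]) == _canonical(shape_of[y])

shape_of = {"x": ("b", "a"), "y": ("a", "b")}
assert same_shape_wasteful(shape_of, "x", "y")
assert same_shape_cached(shape_of, "x", "y")
```

The cached version only models the per-call redundancy; it doesn't address the third copy of the work described above, where the same subgraph is canonicalized yet again once it lands in the "main" FunctionGraph.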

It's important to realize that the context we're discussing technically involves two FunctionGraphs: the "main" FunctionGraph that is utilizing the ShapeFeature, and an "ephemeral" FunctionGraph used or created by—say—aesara.graph.opt_utils.optimize_graph to perform a one-off canonicalization of a graph. This is exactly the situation in ShapeFeature.same_shape.

When one remembers that all rewriting needs to be done in the context of a FunctionGraph, the main problem(s) arising from the concerns of this issue—and some big design problem(s) of ShapeFeature itself—become a little more apparent.

Possible Solution(s)

All the solutions I can think of right now involve the need for ShapeFeature to stay in sync with the FunctionGraph it serves. Because ShapeFeature is curating its own set of graphs outside of its FunctionGraph, those graphs are ignored by all the canonicalization passes that would automatically solve our problem.

Also, notice that ShapeFeature has a lot of logic for performing updates on those graphs in response to updates from FunctionGraph. This situation—and the fact that its implementation is very opaque and unnecessarily difficult to follow—leads me to believe that the entire approach/idea of ShapeFeature might be fundamentally flawed.

Consider the (effective) functionality it currently provides:

  1. replacing Shape Ops with the tuple results of Op.infer_shape (when available) or generic Shape_i Ops,
  2. caching those results, and
  3. providing the ability to compare the shapes of arbitrary Variables in the FunctionGraph (i.e. ShapeFeature.same_shape).

Its docstring shows that some of its previously intended functionality was fundamentally ill-suited for a Feature: e.g. performing lifting rewrites and removing at.fill/seconds from a graph. Those two have already been obviated by standard rewrites in Aesara, and we should do the same for the remaining functionality.

As far as the remaining functionality is concerned, 1. can be easily replaced by a few simple local_optimizer rewrites, and 3. is already trivial to implement as a stand-alone function. It's 2. that might need some special considerations.
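For 1., the rewrite could look roughly like the following toy sketch (pure-Python stand-ins; Node, AddOp, and the string-valued ops are hypothetical and not Aesara's actual local_optimizer machinery):

```python
# Toy model of item 1. as a local rewrite: when a node is a Shape op
# and its input's op can infer its shape, return the inferred shape
# graph as a replacement; otherwise decline (return None).
class Node:
    def __init__(self, op, inputs):
        self.op, self.inputs = op, inputs

def local_shape_to_infer_shape(node):
    """Return a replacement for ``shape(x)`` when ``x``'s op can infer it."""
    if node.op != "shape":
        return None
    (x,) = node.inputs
    infer = getattr(x.op, "infer_shape", None) if isinstance(x, Node) else None
    if infer is None:
        return None  # fall back to generic Shape_i-style graphs
    return infer(x)

class AddOp:
    @staticmethod
    def infer_shape(node):
        # Elemwise addition: output shape equals the first input's shape.
        return ("shape_of", node.inputs[0])

z = Node(AddOp(), ["a", "b"])
assert local_shape_to_infer_shape(Node("shape", [z])) == ("shape_of", "a")
```

The point is only that this dispatch logic is small and fits the standard local-rewrite pattern, so it doesn't need to live inside a Feature.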

My first impression is that this kind of caching is the sole purpose of a ShapeFeature (if any). Unlike the current ShapeFeature, there should be no complicated synching required between the graphs in its cache and the FunctionGraph, and all rewrites applied to the FunctionGraph should seamlessly apply to the graphs in the cache. The latter is the only thing that may require new functionality, and that would possibly involve changes to nothing other than FunctionGraph.

If we were to add all the shape graphs tracked by a ShapeFeature as additional outputs in its FunctionGraph, every rewrite to FunctionGraph would be applied to those graphs and our main issue would be resolved. Of course we would need to prune/remove those additional outputs when/if their respective Variables are removed from the graph—otherwise we would be doing unnecessary work from there on—but that's definitely a suitable responsibility for a ShapeFeature.

While adding and removing new outputs to a FunctionGraph is technically supported right now, we wouldn't want to take this approach because it would alter the results expected by users of FunctionGraphs. In other words, there would be new, irrelevant, and unexpected FunctionGraph.outputs entries that other code would need to handle.

Instead, we could consider adding some type of "hidden" or "internal" outputs functionality to FunctionGraph. It would only need to preserve the original outputs in FunctionGraph.outputs and perhaps a few other interface-level results.
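A minimal sketch of that "hidden outputs" idea, using a toy class rather than Aesara's actual FunctionGraph (all names here are hypothetical):

```python
# Toy model: shape graphs live in the same graph object so that rewrite
# passes reach them, but the user-facing ``outputs`` list never gains
# extra entries.
class ToyFunctionGraph:
    def __init__(self, outputs):
        self.outputs = list(outputs)  # interface-level results only
        self._hidden_outputs = []     # e.g. tracked shape graphs

    @property
    def all_outputs(self):
        # Rewrite passes would traverse user outputs *and* hidden ones.
        return self.outputs + self._hidden_outputs

    def add_hidden_output(self, var):
        self._hidden_outputs.append(var)

    def rewrite(self, fn):
        # Any rewrite applied to the graph reaches hidden outputs too,
        # keeping cached shape graphs canonicalized for free.
        self.outputs = [fn(v) for v in self.outputs]
        self._hidden_outputs = [fn(v) for v in self._hidden_outputs]

fg = ToyFunctionGraph(outputs=["y"])
fg.add_hidden_output("shape_of_y")
fg.rewrite(str.upper)
assert fg.outputs == ["Y"]                   # no extra user-facing entries
assert fg._hidden_outputs == ["SHAPE_OF_Y"]  # shape graph rewritten in sync
```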

@brandonwillard

@kc611, tell me what you think about the proposed FeatureGraph-based solution.


kc611 commented Jan 14, 2022

tell me what you think about the proposed FeatureGraph-based solution.

I assume you mean FunctionGraph, right?

As of now, I don't see any reason why it won't work; we'd just have to make sure those outputs don't show up at the interface level. I guess the biggest issue with directly adding these nodes to the normal FunctionGraph will be (quickly) identifying which nodes/variables are 'normal' FunctionGraph nodes/variables and which are 'shape-based', because we'd essentially be putting these nodes (and variables) alongside the normal ones in FunctionGraph.variables and FunctionGraph.apply_nodes, which in turn drive all the FunctionGraph.clients logic.

One question, however: usually a shape variable (representing the shape of some variable) will have just one or two Ops between it and another shape variable in the graph. These nodes will essentially be representing outputs, since a variable's shape will usually depend on other variables within the same graph, and that dependence is usually represented by a single operation. Hence they'll need to be present in the graph as long as the variable is and can't get 'optimized out'; won't that mean the shape graph will essentially have very few to no optimizations applied to it? If so, wouldn't it be simpler to keep a mapping from a variable to its shape (or shape graph) and evaluate it on demand using the shapes already present within the Feature, i.e. try making the shape graphs smaller by representing them in terms of an operation done on the shapes of other variables instead of calculating the shapes from the bottom up for every variable?


brandonwillard commented Jan 15, 2022

I guess the biggest issue with directly adding these nodes to the normal FunctionGraph will be (quickly) identifying which nodes/variables are 'normal' FunctionGraph nodes/variables and which are 'shape-based', because we'd essentially be putting these nodes (and variables) alongside the normal ones in FunctionGraph.variables and FunctionGraph.apply_nodes, which in turn drive all the FunctionGraph.clients logic.

Exactly; that would be the real work behind this idea. I think the whole idea ultimately boils down to merging or nesting distinct FunctionGraphs. Perhaps approaching it directly that way would be better or easier.

One question, however: usually a shape variable (representing the shape of some variable) will have just one or two Ops between it and another shape variable in the graph. These nodes will essentially be representing outputs, since a variable's shape will usually depend on other variables within the same graph, and that dependence is usually represented by a single operation. Hence they'll need to be present in the graph as long as the variable is and can't get 'optimized out'; won't that mean the shape graph will essentially have very few to no optimizations applied to it?

I think I'll need an example to understand this situation clearly; however, there shouldn't be any rewrites that completely remove a tracked shape graph. A tracked shape graph should only be removed when the underlying Variable is removed, because then the shape graph is completely unnecessary. This removal could possibly be done in a very simple way by leveraging—and perhaps extending a little—the existing FunctionGraph.clients mechanisms (e.g. making the hidden shape graphs "virtual" clients of their respective Variables).

Also, we shouldn't eagerly track every Variable's shape graph, only the ones that are used within the main graph or ones that are referenced by a rewrite via the Feature. In other words, we should do the tracking lazily.
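The lazy tracking could be as simple as a build-on-first-access cache, sketched here with hypothetical names (not an actual Aesara interface):

```python
# Hypothetical sketch of lazy shape tracking: a shape graph is built
# and registered only on first request, not eagerly for every Variable.
class LazyShapeCache:
    def __init__(self, build_shape_graph):
        self._build = build_shape_graph
        self._cache = {}
        self.builds = 0  # instrumentation for the example

    def shape_of(self, var):
        if var not in self._cache:
            # First request: build the shape graph (this is also where
            # it would be registered as a hidden output) and cache it.
            self.builds += 1
            self._cache[var] = self._build(var)
        return self._cache[var]

cache = LazyShapeCache(lambda v: ("shape", v))
cache.shape_of("x")
cache.shape_of("x")  # second lookup hits the cache; nothing is rebuilt
assert cache.builds == 1
```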

If so, wouldn't it be simpler to keep a mapping from a variable to its shape (or shape graph) and evaluate it on demand using the shapes already present within the Feature, i.e. try making the shape graphs smaller by representing them in terms of an operation done on the shapes of other variables instead of calculating the shapes from the bottom up for every variable?

Yes, we will always need to keep such a mapping. The only difference is that the mapping will now map Variables in the FunctionGraph to their shape graphs which also exist as "hidden" outputs in the same FunctionGraph.

Just to be clear, the "hidden" outputs thing isn't important; it's just a way to illustrate how we could track shapes that are seamlessly integrated into the FunctionGraphs they serve. In this scenario, more nodes can be merged, many replacements/updates wouldn't need to be handled by the Feature anymore, and canonicalizations would be consistently and automatically applied to the shape graphs.
