# Refactor/replace/remove `aesara.tensor.basic.get_scalar_constant_value` (#96)
`aesara.tensor.get_scalar_constant_value` is a utility function that contains unreasonably long conditional statements and introduces more unnecessary cross-module—and ultimately circular—dependencies (e.g. `aesara.tensor.subtensor.Subtensor.get_constant_idx` calls `aesara.tensor.get_scalar_constant_value` and vice versa).

Let's fix this situation by

- moving this utility function into its own module and/or reimplementing it using some form of dispatch (e.g. single-dispatch, if possible). With the latter, the `Op`-specific parts of `get_scalar_constant_value` can be defined within each `Op`'s own module and utilized only when relevant (i.e. when/if the `Op` itself is imported), or
- using the existing constant folding capabilities—perhaps with some additional features (e.g. more fine-grained folding conditions).
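To make the single-dispatch option concrete, here is a minimal sketch of how it could look. The handler name `op_constant_value` and the overall structure are hypothetical, not an existing Aesara API, and the import paths follow recent Aesara and may differ by version:

```python
from functools import singledispatch

from aesara.graph.basic import Constant, Variable
from aesara.graph.op import Op
from aesara.tensor.exceptions import NotScalarConstantError


@singledispatch
def op_constant_value(op: Op, var: Variable):
    # Default case: we don't know how to look through this `Op`.
    raise NotScalarConstantError(f"Cannot extract a constant through {op}")


def get_scalar_constant_value(v: Variable):
    """Extract a constant by dispatching on the `Op` that produced `v`."""
    if isinstance(v, Constant):
        return v.data
    if v.owner is None:
        raise NotScalarConstantError(f"{v} is not constant")
    # Dispatch on the `Op` subclass that produced `v`.
    return op_constant_value(v.owner.op, v)


# Each `Op`'s module would register its own handler when imported, e.g.:
#
# @op_constant_value.register
# def _(op: DimShuffle, var):
#     return get_scalar_constant_value(var.owner.inputs[0])
```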
## Comments

Here's some more info on the situation with `get_scalar_constant_value`. The function is supposed to return the constant value underlying a variable by digging through a fixed set of `Op`s. For example, the following provides some simple cases in which the current implementation only partially works:

```python
import theano.tensor as tt

test_array = tt.as_tensor([[1], [4]])

test_val = test_array.shape[-1]  # i.e. 1
tt.get_scalar_constant_value(test_val)  # works

test_val = test_array.shape[0]  # i.e. 2
tt.get_scalar_constant_value(test_val)  # doesn't work

test_val = test_array.shape[:-1]  # i.e. [2]
tt.get_scalar_constant_value(test_val)  # doesn't work
```

Now, we could fix these specific cases, but the entire idea behind `get_scalar_constant_value` is already covered by constant folding. Here's a simple demonstration for the last example above:

```python
import theano

test_val = test_array.shape[:-1]
fgraph = theano.gof.fg.FunctionGraph(theano.gof.graph.inputs([test_val]), [test_val], clone=True)
res = tt.opt.topo_constant_folding.optimize(fgraph)
```

```python
>>> tt_dprint(fgraph)
TensorConstant{(1,) of 2} [id A]
```

The only downsides to this approach (that I can think of right now) involve the overhead for cloning and possibly some cases of overly cautious constant folding. First off, graph cloning should be very cheap, and, if it isn't, then we need to fix that ASAP—regardless of this issue.
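For reference, here is the same clone-and-fold idea expressed with current Aesara names (matching the imports used later in this thread). This is a sketch; the printed output is what one would expect, not a verified transcript:

```python
import aesara
import aesara.tensor as at
from aesara.graph.basic import graph_inputs
from aesara.graph.fg import FunctionGraph
from aesara.tensor.basic_opt import topo_constant_folding

test_array = at.as_tensor([[1], [4]])
test_val = test_array.shape[0]  # the scalar case `get_scalar_constant_value` misses

# Clone the subgraph so folding doesn't mutate the user's graph.
fgraph = FunctionGraph(list(graph_inputs([test_val])), [test_val], clone=True)
topo_constant_folding.optimize(fgraph)
aesara.dprint(fgraph.outputs[0])
# expected: TensorConstant{2}
```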
---

Took a stab at this and came up with this:

```python
def get_constant_value(v, scalar=True, *args, **kwargs):
    """Return the constant value underlying variable `v`.

    If `v` is the output of dimshuffles, fills, allocs, rebroadcasts,
    cast, OutputGuard, DeepCopyOp, ScalarFromTensor, ScalarOp, Elemwise,
    and some patterns with Subtensor, this function digs through them.

    If `v` is not some view of constant data, then raise a
    `NotConstantError`.  If `scalar` is true and `v` is not a scalar,
    raise a `NotScalarConstantError`.

    Parameters
    ----------
    v: Variable
        Variable to be evaluated
    scalar: bool
        Specify if the returned value should be a scalar

    """
    if v is None:
        raise NotConstantError()

    v = as_tensor_variable(v)

    if not isinstance(v, Constant) and v.owner is not None:
        from aesara.graph.opt_utils import optimize_graph

        v_fgraph_clone = FunctionGraph([*graph_inputs([v])], [v], clone=True)
        optimize_graph(v_fgraph_clone)
        v = v_fgraph_clone.outputs[0]

    if not isinstance(v, Constant):
        raise NotConstantError()
    elif scalar and v.ndim != 0:
        raise NotScalarConstantError()
    else:
        unique_value = get_unique_value(v)
        if unique_value is not None:
            v = unique_value
        else:
            v = v.data

    return v
```

The issue with using 'only' `topo_constant_folding` is that it can't fold patterns that first need other rewrites to run. For instance:

```python
from aesara.tensor.basic_opt import topo_constant_folding
from aesara.graph.fg import FunctionGraph
from aesara.tensor.type import iscalar
import aesara.tensor.basic as at
import aesara
from aesara.graph.basic import graph_inputs


def get_constant_value(v):
    v = at.as_tensor_variable(v)
    v_fgraph_clone = FunctionGraph([*graph_inputs([v])], [v], clone=True)
    topo_constant_folding.optimize(v_fgraph_clone)
    v = v_fgraph_clone.outputs[0]
    return v


b = iscalar()
a = at.stack([b, 2, 3])

aesara.dprint(get_constant_value(a[1]))
# Isn't able to convert it into TensorConstant{2} by itself
# Subtensor{int64} [id A] ''
# |MakeVector{dtype='int32'} [id B] ''
# | |<TensorType(int32, scalar)> [id C]
# | |TensorConstant{2} [id D]
# | |TensorConstant{3} [id E]
# |ScalarConstant{1} [id F]
```

I do agree that calling the entire set of optimizations (i.e. `optimize_graph`) for every such lookup could get expensive, though.
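Continuing the snippet above: running the broader rewrite set via `optimize_graph` (the helper the earlier `get_constant_value` draft imports) should let the `Subtensor`-of-`MakeVector` simplification fire first, after which folding succeeds. A sketch; the output comment is my expectation:

```python
from aesara.graph.opt_utils import optimize_graph

fgraph = FunctionGraph(list(graph_inputs([a[1]])), [a[1]], clone=True)
# Applies the default rewrites (canonicalizations + constant folding) in place.
optimize_graph(fgraph)
aesara.dprint(fgraph.outputs[0])
# expected: TensorConstant{2}
```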
---

That's a great example, and it could also be telling us that perhaps we need to extend/improve our notion (and implementation) of constant folding. I say this because there's a weird dependency chain in this situation: the non-constant folding rewrite that simplifies that `Subtensor`-of-`MakeVector` pattern mostly serves to expose a constant that constant folding can then pick up.

Instead, it might make more sense to consider such a rewrite as a type of constant folding and somehow incorporate it into that framework; then there's no drawn-out dependency chain.

Aside from that consideration, I think you have the correct overall approach, but that we need to start working out a way to make these rewrites-within-rewrites more efficient. This is also a rising issue for our future rewriting work. If we could continue to use in-place updates in these situations, that would be great, but we really can't circumvent the rewrite framework like that without incurring some problems/challenges.
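To illustrate what treating such a rewrite "as a type of constant folding" might mean, here is a hypothetical folding rule (function name and placement made up for illustration) that handles the `Subtensor`-of-`MakeVector` case directly, folding the indexed element even though the `MakeVector` as a whole isn't constant:

```python
from aesara.graph.basic import Constant
from aesara.tensor.basic import MakeVector
from aesara.tensor.subtensor import Subtensor


def try_fold_subtensor_of_make_vector(node):
    """Return the constant element for `MakeVector(...)[i]`, else None."""
    if not isinstance(node.op, Subtensor) or len(node.inputs) != 2:
        return None
    vec, idx = node.inputs
    if vec.owner is None or not isinstance(vec.owner.op, MakeVector):
        return None
    if not isinstance(idx, Constant):
        return None
    elem = vec.owner.inputs[int(idx.data)]
    # Foldable only when the selected element is itself constant.
    return elem if isinstance(elem, Constant) else None
```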
---

Actually, I'm starting to wonder how much we even need `aesara.tensor.basic.get_scalar_constant_value`.

We've updated and added a lot of missing canonicalizations recently, and most—if not all—of the steps performed by `get_scalar_constant_value` should now be covered by existing rewrites. If you think of it that way, using `get_scalar_constant_value` inside a rewrite is largely redundant: by the time the relevant canonicalizations have run, any constants it would dig out are already exposed as `Constant`s.

The only instances where that's not true is when a rewrite uses `get_scalar_constant_value` on graphs that haven't been canonicalized yet.
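A schematic before/after of that argument inside a rewrite body (the `node` handling is hypothetical); once canonicalization has exposed the constant, the lookup becomes a plain `isinstance` check:

```python
from aesara.graph.basic import Constant

# Before: dig through the graph manually on every visit, e.g.
#   val = get_scalar_constant_value(node.inputs[1])

# After canonicalization, a constant input simply *is* a `Constant`:
def input_scalar_constant(node, i):
    inp = node.inputs[i]
    # Assumes a tensor constant, whose `data` is an ndarray.
    if isinstance(inp, Constant) and inp.data.ndim == 0:
        return inp.data  # the scalar value, no graph traversal needed
    return None
```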
---

We can just keep the above `get_constant_value` implementation around for the cases that need it, then.
---

Yeah, that's likely the best course of action.
---

@brandonwillard Can you describe the cache-based memoization approach you proposed in the meeting for this particular functionality? I remember it being something like being able to call certain constant folding optimizations at proper "checkpoint"s during graph optimizations.
---

I was really just thinking of using something like simple memoization of these constant folding results. The question is really about how we can effectively use caching in these cases given that the graphs involved are mutated in place during rewriting.
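One shape this could take (purely a sketch; the class and its invalidation policy are hypothetical): memoize folding results per variable and use a `FunctionGraph` `Feature` to drop entries whenever the graph is mutated underneath the cache:

```python
from aesara.graph.features import Feature


class ConstantFoldingCache(Feature):
    """Cache folded values, invalidated on graph mutation (hypothetical)."""

    def __init__(self):
        self.cache = {}  # Variable -> folded constant value

    def on_change_input(self, fgraph, node, i, r, new_r, reason=None):
        # Coarse invalidation: any in-place edit may stale any entry.
        # A real implementation would only drop entries that depend on `r`.
        self.cache.clear()

    def on_prune(self, fgraph, node, reason=None):
        self.cache.clear()
```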