statistical approach of decision when behavior is in learning process #160
hyunjimoon started this conversation in people; relating
Replies: 0 comments
//////////////////////////////////////////////////////////////////////
// Decisions, Decisions, Decisions
//////////////////////////////////////////////////////////////////////
// Consider a space of actions A. How do we construct a strategy that
// identifies an optimal decision?
// We need to quantify optimality with a utility function,
U : A -> R.
// We can then just take the action with the highest utility
a^dagger = argmax_{a \in A} U(a).
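With a fully known utility function this argmax is a direct computation. A minimal sketch in Python, where the action space and utility values are hypothetical illustrations:

```python
# Hypothetical finite action space A and a known utility U : A -> R.
actions = ["hold", "buy", "sell"]
utility = {"hold": 0.0, "buy": 1.2, "sell": -0.5}

# a^dagger = argmax_{a in A} U(a)
a_dagger = max(actions, key=lambda a: utility[a])
print(a_dagger)  # buy
```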
// In most cases utility functions will depend on behaviors we don't
// know. If those behaviors are modeled by parameters theta \in Theta
// then we have a model-based utility function
U : A times Theta -> R.
// In this case we can no longer unambiguously rank the actions by their
// utilities because changing theta can change the rankings.
// To make a decision we need to infer the unknown behaviors from some
// data tilde{y}. We could use a point estimate
a^dagger(tilde{y}) = argmax_{a \in A} U(a, hat{theta}(tilde{y}))
// or even Bayesian inference
a^dagger(tilde{y}) = argmax_{a \in A} \int dtheta pi(theta | tilde{y}) U(a, theta).
// Processes like these define data-dependent decision-making strategies
a^dagger: Y -> A.
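Both strategies can be sketched on a toy problem. Everything here is a hypothetical illustration: theta is a success probability, the action "go" pays theta - 0.5 while "stop" pays 0, the data are Bernoulli, and a Beta(1, 1) prior gives a conjugate posterior for the Bayesian strategy:

```python
import random

random.seed(1)

# Hypothetical model-based utility U(a, theta): "go" pays theta - 0.5,
# "stop" pays 0, so the ranking of actions depends on theta.
def utility(a, theta):
    return theta - 0.5 if a == "go" else 0.0

actions = ["go", "stop"]
y = [1, 1, 0, 1, 1, 0, 1, 1]  # hypothetical observed Bernoulli data

# Point-estimate strategy: plug in hat{theta}(y), here the sample mean.
theta_hat = sum(y) / len(y)
a_point = max(actions, key=lambda a: utility(a, theta_hat))

# Bayesian strategy: average the utility over posterior draws.
# With a Beta(1, 1) prior the posterior is Beta(1 + sum(y), 1 + n - sum(y)).
draws = [random.betavariate(1 + sum(y), 1 + len(y) - sum(y))
         for _ in range(4000)]

def expected_utility(a):
    return sum(utility(a, t) for t in draws) / len(draws)

a_bayes = max(actions, key=expected_utility)
```

Here both strategies define a map from data y to an action, which is exactly the data-dependent strategy a^dagger: Y -> A above.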
// How well can we expect a decision-making strategy to work before we
// collect any data and hence before we can learn about theta?
// We don't know theta and we don't know what we will learn about theta
// because we don't know y. We need to consider what y we might see,
// what we would learn from that y, and then how good the resulting
// decision would be. In other words we need to simulate.
// In a Bayesian analysis this is straightforward:
tilde{theta} ~ pi(theta)
tilde{y} ~ pi(y | tilde{theta})
a^dagger(tilde{y})
U(a^dagger(tilde{y}), tilde{theta})
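The four-step simulation above can be run as a Monte Carlo loop. This sketch reuses the hypothetical go/stop utility and a point-estimate strategy, with a Uniform(0, 1) prior on theta and Bernoulli observations standing in for pi(theta) and pi(y | theta):

```python
import random

random.seed(0)

# Hypothetical utility: "go" pays theta - 0.5, "stop" pays 0.
def utility(a, theta):
    return theta - 0.5 if a == "go" else 0.0

def strategy(y):
    # Point-estimate strategy: act on the sample mean of the Bernoulli data.
    theta_hat = sum(y) / len(y)
    return max(["go", "stop"], key=lambda a: utility(a, theta_hat))

n, sims = 20, 2000
utilities = []
for _ in range(sims):
    theta_sim = random.random()                                    # tilde{theta} ~ pi(theta)
    y_sim = [int(random.random() < theta_sim) for _ in range(n)]   # tilde{y} ~ pi(y | tilde{theta})
    a = strategy(y_sim)                                            # a^dagger(tilde{y})
    utilities.append(utility(a, theta_sim))                        # U(a^dagger(tilde{y}), tilde{theta})

expected = sum(utilities) / sims
```

The average realized utility summarizes how well the strategy can be expected to work before any real data are collected.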
// Note that this "calibration" procedure, also known as "fucking around
// and finding out", depends on
// 1. The choice of Bayesian model pi(y, theta) = pi(y | theta) pi(theta).
// 2. The choice of decision-making strategy a^dagger: Y -> A.
// 3. The choice of model-based utility function U : A times Theta -> R.
// If we know what kind of overall performance we want then we can tune
// any of these choices. Changing (1) is known as experimental design.
// Changing (2) is tuning the strategy. Changing (3) is setting realistic
// expectations.