What
Currently, the synthetic control functionality is constrained to a single treated unit. One treated unit is the minimum needed for a working synthetic control solution, and this already offers non-trivial functionality: we have docs with a generic example using simulated data, and another on the effects of Brexit (where the UK is the only treated unit).
However, there are many situations where you will have more than one treated unit. This can happen in many domains, but it is especially common in marketing, in geolift tests. We have a docs page on geolift with a single treated geo, and another on multi-cell geolift analysis with multiple treated geos. The latter currently walks through a pooled analysis, where we simply take the average of the outcome variable across the treated geos and then model it as a single-treated-unit synthetic control. The alternative is to treat the geos as unpooled, in which case we simply run multiple independent single-treated-unit synthetic control analyses.
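To make the two existing approaches concrete, here is a minimal sketch of how the modeling targets differ. It assumes a wide dataframe with one column per geo; the column names (`c1`..`c3`, `t1`, `t2`) and the random data are purely illustrative and are not taken from the docs pages.

```python
import numpy as np
import pandas as pd

# Illustrative wide data: rows are time points, columns are geos (names are made up)
rng = np.random.default_rng(1)
df = pd.DataFrame(
    rng.normal(size=(10, 5)),
    columns=["c1", "c2", "c3", "t1", "t2"],
)
controls = ["c1", "c2", "c3"]
treated = ["t1", "t2"]

# Pooled: average the treated geos into one series, then fit a single
# synthetic control against the control geos.
pooled_target = df[treated].mean(axis=1)

# Unpooled: keep one target series per treated geo and fit an independent
# single-treated-unit synthetic control for each.
unpooled_targets = {geo: df[geo] for geo in treated}
```

In the pooled case there is one set of control weights; in the unpooled case there is one set per treated geo, with nothing shared between them.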
Why
This issue proposes that we add the ability to model multiple treated units (or geos). This has a number of motivations:
it is a more general solution
it would allow a single modeling approach to geo testing (or any other situation with multiple treated units)
it would retain the full flexibility of the pooled and unpooled analysis approaches, and would also enable something new: partially pooled analysis, where information is shared across weights.
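Changes
Changes to the WeightedSumFitter class
This PyMC model class would need to be changed so that we have a weight matrix, rather than a weight vector (see CausalPy/causalpy/pymc_models.py, lines 254 to 271 in 4227edf).
So rather than dims="coeffs" (where coeffs correspond to control units), it would be dims=("control_units", "treated_units"). This would give us an unpooled set of weights over the control units for each of the treated units. A later step could then implement partial pooling over these weights (across the treated_units dimension).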
The WeightedSumFitter.build_model method would also change to reflect the fact that the raw data would no longer be long form, so the incoming data (currently a design matrix X) would now be a 2D matrix, probably of shape ("time", "unit").
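The shape change can be sketched in plain NumPy. This is not the PyMC model itself, just a stand-in for the linear algebra: the sizes, variable names, and the Dirichlet draws (used here only to get weights that are non-negative and sum to one per treated unit) are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, n_control, n_treated = 50, 4, 3  # illustrative sizes

# Control-unit outcomes with dims ("time", "control_units")
X = rng.normal(size=(n_time, n_control))

# One weight vector per treated unit. rng.dirichlet returns an array of shape
# (n_treated, n_control), so transpose to get dims ("control_units", "treated_units").
W = rng.dirichlet(np.ones(n_control), size=n_treated).T

# Synthetic counterfactual for every treated unit at once,
# with dims ("time", "treated_units")
mu = X @ W
```

With a single treated unit, `W` has one column and this reduces to the current weight-vector case.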
Changes to the SyntheticControl class
SyntheticControl would no longer inherit from the PrePostFit class, so all the logic currently in PrePostFit.__init__ would move to the new SyntheticControl.__init__. This would leave InterruptedTimeSeries as the only class that inherits from PrePostFit, so there would be an opportunity to collapse that class hierarchy, but that is a peripheral issue. The core point is that SyntheticControl would change a lot.
The incoming dataframe would still be split into pre- and post-treatment periods.
Remove the formula argument and no longer use a design matrix approach (with patsy). This would result in quite a lot of change to the logic in SyntheticControl.__init__.
Update the _bayesian_plot method.
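As a rough sketch of what the formula-free data handling might look like, the function below splits a wide dataframe into pre and post periods and pulls out control and treated columns directly. The function name `split_wide_data` and the `control_units` / `treated_units` arguments are hypothetical; nothing here is an existing CausalPy API.

```python
import numpy as np
import pandas as pd

def split_wide_data(data, treatment_time, control_units, treated_units):
    """Hypothetical helper: split wide data by time, then by unit role."""
    pre = data[data.index < treatment_time]
    post = data[data.index >= treatment_time]
    return {
        # Control outcomes, shape (time, control_units)
        "pre_X": pre[control_units].to_numpy(),
        "post_X": post[control_units].to_numpy(),
        # Treated outcomes, shape (time, treated_units)
        "pre_y": pre[treated_units].to_numpy(),
        "post_y": post[treated_units].to_numpy(),
    }

# Illustrative data: 10 time points, 3 control geos, 2 treated geos
rng = np.random.default_rng(2)
data = pd.DataFrame(
    rng.normal(size=(10, 5)),
    columns=["c1", "c2", "c3", "t1", "t2"],
)
parts = split_wide_data(
    data,
    treatment_time=7,
    control_units=["c1", "c2", "c3"],
    treated_units=["t1", "t2"],
)
```

Selecting columns by name replaces what the patsy formula does today, and the treated outcomes naturally become a 2D array rather than a single column.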
Changes to tests
Update all the integration tests to deal with the changed API
Add new tests to cover the new multiple treated unit case
Changes to docs
We'd have to update the docs to use the new API.
We would also want to update the existing multi-cell geolift analysis docs.