ExpAn - Statistical analysis of A/B tests in Python

The functions in the library are standalone and can be imported and used from within any project and from the command line.

Loading data from various sources is out of scope for this library.

Overview

Assumptions used in analysis

Sample-size estimation:
- Treatment does not affect variance
- Variance in treatment and control is identical
- Mean of delta is normally distributed
Welch t-test:
- Mean of means is t-distributed (or normally distributed)
In general:
- Sample represents underlying population
- Entities are independent
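The two calculations these assumptions underpin can be sketched with the Python standard library alone. This is a minimal illustration, not ExpAn's implementation: the function names are hypothetical, and the sample-size formula is the textbook normal-approximation formula for a two-sided comparison of two means that share one standard deviation sigma (matching the equal-variance assumptions above).

```python
import math
from statistics import NormalDist, mean, variance


def sample_size_per_variant(sigma, min_detectable_delta, alpha=0.05, power=0.8):
    """Entities needed per variant to detect a difference in means (hypothetical helper).

    Normal approximation; assumes, as listed above, that the treatment does
    not change the variance, so both groups share the same sigma.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sigma / min_detectable_delta) ** 2
    return math.ceil(n)


def welch_t(treatment, control):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (hypothetical helper)."""
    v_t = variance(treatment) / len(treatment)  # squared standard error of the treatment mean
    v_c = variance(control) / len(control)      # squared standard error of the control mean
    t = (mean(treatment) - mean(control)) / math.sqrt(v_t + v_c)
    df = (v_t + v_c) ** 2 / (
        v_t ** 2 / (len(treatment) - 1) + v_c ** 2 / (len(control) - 1)
    )
    return t, df


print(sample_size_per_variant(sigma=10, min_detectable_delta=1))
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

The Welch statistic is compared against a t distribution with df degrees of freedom (for example via scipy.stats.t.sf); for large samples NormalDist().cdf gives a close approximation, which is why the assumption above allows either distribution.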

Core Analysis Module

Main user stories:

As a Data Scientist, I want to perform all the basic analysis routines that are typical of the analysis of an A/B test (a.k.a. Between-Subject Randomised Controlled Trial) while retaining access to the raw data, so that I can also perform custom analyses to answer the questions of stakeholders with little effort.

As an analyst from a different department, I want to be able to bring my own data, and easily be able to use this library to perform analysis: in other words, as long as data is in a format compatible with expan.ExperimentData (documented below), importing it into the library and then performing analyses on it should be almost trivial.

Input Data (`expan.ExperimentData`)

Data to be analysed is loaded into the ExperimentData class.
Features and KPIs are stored separately, but are exposed as a single DataFrame, 'metrics', by dynamically joining the two.
An ExperimentData object therefore contains:
- 2 pandas DataFrame objects (kpis and features)
- a dictionary for metadata
- a property (metrics) which dynamically returns another DataFrame
- a set of functions and properties that simplify access to the data
Analysis functionality is provided by a subclass of this (expan.Experiment).

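A stripped-down sketch of that layout, assuming pandas and using plain columns (rather than an index) for variant and entity. ExperimentDataSketch and variant_names are hypothetical names for illustration; the real expan.ExperimentData has its own constructor and helpers.

```python
import pandas as pd


class ExperimentDataSketch:
    """Hypothetical, minimal stand-in for expan.ExperimentData.

    KPIs and features live in separate DataFrames; `metrics` joins them
    on demand instead of storing the combined table.
    """

    def __init__(self, kpis, features, metadata):
        self.kpis = kpis
        self.features = features
        self.metadata = dict(metadata)

    @property
    def metrics(self):
        # Recomputed on each access: every KPI row gets its entity's features.
        return self.kpis.merge(self.features, on=["variant", "entity"], how="left")

    @property
    def variant_names(self):
        # Example of a convenience accessor (hypothetical).
        return set(self.kpis["variant"])


kpis = pd.DataFrame({"variant": ["A", "B"], "entity": ["ec0231efh", "f387534e2"], "orders": [1, 0]})
features = pd.DataFrame({"variant": ["A", "B"], "entity": ["ec0231efh", "f387534e2"], "age": [32, 65]})
data = ExperimentDataSketch(kpis, features, {"baseline_variant": "A"})
print(data.metrics)
```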
The tables below define the data structure of the individual parts within an ExperimentData object.

In the tables, underlined column names refer to indices, bold marks any column or row name, and square brackets indicate an optional field (e.g. [experiment_id]).

Metadata

This is a dictionary of information describing the experiment to be analysed.

key	example value	explanation
experiment	"Generic Website Improvement"	Name of the experiment, as known to stakeholders. Can be anything meaningful to you.
[experiment_id]	"a9a9e987a9f99d3_2015-01-01T12:00:00.123"	This uniquely identifies the experiment. Could be a concatenation of the experiment name and the experiment start timestamp.
sources	["our_mysql","website_logs"]	Names of the data sources used in the preparation of this data.
baseline_variant	"No Change"	The variant against which all others will be measured.
[retrieval_time]	2015-10-21H18:28CEST	Time at which the data was fetched from the original sources (perhaps this should be a list with one entry per source).
[primary_KPI]	"orders"	The KPI used as the Overall Evaluation Criterion (OEC).

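The same metadata written as a Python dict, with a small hypothetical check that the keys shown without square brackets are present. Treating the unbracketed keys as required is an assumption drawn from the square-bracket convention described above, not documented ExpAn behaviour.

```python
# Keys without square brackets in the table above (treated here as required).
REQUIRED_KEYS = {"experiment", "sources", "baseline_variant"}

metadata = {
    "experiment": "Generic Website Improvement",
    "experiment_id": "a9a9e987a9f99d3_2015-01-01T12:00:00.123",
    "sources": ["our_mysql", "website_logs"],
    "baseline_variant": "No Change",
    "primary_KPI": "orders",
}


def missing_metadata_keys(md):
    """Return the required keys absent from a metadata dict (hypothetical helper)."""
    return REQUIRED_KEYS.difference(md)


print(missing_metadata_keys(metadata))  # set()
```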
KPIs

variant	entity	[time_since_treatment]	number of orders	PCII
A	ec0231efh	0	1	23.23
A	ec0231efh	1	2	250.32
B	f387534e2	0	0	-

Features

variant	entity	treatment start time	age	PCII_365
A	ec0231efh	2015-02-23H12:00CEST	32	932.92
B	f387534e2	2015-02-23H12:00CEST	65	23.44
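To illustrate why KPIs and features are kept apart, the example rows above can be loaded into two pandas DataFrames and joined. This sketch assumes pandas; column names are lightly adapted (snake_case, treatment start time omitted) for brevity. Because features do not vary over time, each entity's single feature row is repeated for every time_since_treatment value of its KPIs only at join time.

```python
import pandas as pd

# KPI rows from the table above: one row per entity and time point.
kpis = pd.DataFrame(
    {
        "variant": ["A", "A", "B"],
        "entity": ["ec0231efh", "ec0231efh", "f387534e2"],
        "time_since_treatment": [0, 1, 0],
        "number_of_orders": [1, 2, 0],
        "PCII": [23.23, 250.32, float("nan")],  # '-' in the table
    }
)

# Feature rows: one row per entity, independent of time.
features = pd.DataFrame(
    {
        "variant": ["A", "B"],
        "entity": ["ec0231efh", "f387534e2"],
        "age": [32, 65],
        "PCII_365": [932.92, 23.44],
    }
)

# validate="many_to_one" raises a MergeError if an entity appears twice in `features`.
metrics = kpis.merge(features, on=["variant", "entity"], how="left", validate="many_to_one")
print(metrics)
```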

Output Data (`expan.Results`)

The Results object is based on a single pandas DataFrame object. Currently it has-a DataFrame, but could in the future be implemented so that it is-a DataFrame.

Similar to the input data, Results have metadata (a dictionary) and a DataFrame.

Metadata

This is a dictionary describing the results, some of which is derived directly from the metadata of the input data, and some is additional.

key	example value	explanation
experiment	"Generic Website Improvement 2015-01-01"	See ExperimentData metadata.
experiment_id	"a9a9e987a9f99d3_2015-01-01T12:00:00.123"	See ExperimentData metadata.
retrieval_time	(see ExperimentData metadata)	Retrieval time of the data sources.
analysis_time	2015-10-21H18:28CEST	Time at which the analysis was performed. Not yet implemented.
baseline_variant	"No Change"	Variant against which all results were computed.
primary_KPI	"PCII"	The KPI used as the OEC (Overall Evaluation Criterion).
metric_units	orders: orders; sample_size: customers; net_sales: €; PCII: €; PCII_per_customer: €; Age: years	The underlying unit of each metric. Not yet implemented; this could probably be combined with a full metric object.
cost_of_treatment	{'A': 1, 'B': 1.5}	Cost of treatment per variant, as a dict, used to offset the uplift.
expan_version	1.0.1	The version of expan that was used to compute the results.
[analysts]	["joe.bloggs@zalando.de"]	Identification of the data scientists running the analysis; an email address is probably best. Optional; if present, a list.

Binning

The binning objects are stored as a dictionary of 'Binning' objects in the Results structure, indicating how the subgroups were created.

The bin associated with a subgroup in the results dataframe is referenced by the string label.

subgroup_metric	binning	label_format_str (to be deprecated)	label_example (not stored in results)
Age	<Binning object created on Age data> (bin0: 0-19, bin1: 20-30, bin2: 30-99)	'{lo},{hi}'	20-30
CLV	<Binning object created on CLV data>	'{standard}'	[102.0,144.5)

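A sketch of how numeric values could be mapped to half-open bin labels in the '[lo,hi)' style of the CLV example, using the standard library's bisect module. The helpers make_labels and assign_bins and the bin edges in the usage line are hypothetical; this is not the expan Binning class.

```python
from bisect import bisect_right


def make_labels(edges):
    """Half-open interval labels such as '[102.0,144.5)' for consecutive edges."""
    return [f"[{lo},{hi})" for lo, hi in zip(edges[:-1], edges[1:])]


def assign_bins(values, edges):
    """Map each value to the label of the bin [lo, hi) that contains it.

    `edges` must be sorted; values outside [edges[0], edges[-1]) map to None.
    """
    labels = make_labels(edges)
    out = []
    for v in values:
        i = bisect_right(edges, v) - 1  # index of the last edge <= v
        out.append(labels[i] if 0 <= i < len(labels) else None)
    return out


print(assign_bins([120.0, 10.0], [60.0, 102.0, 144.5, 200.0]))  # ['[102.0,144.5)', None]
```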
Result Data Frame (`.df`)

Some statistics, such as uplift_rel, will probably be derived: calculated on the fly by properties rather than stored in the dataframe.

index						variant columns			subgroup columns (think about this)	comments (not in data)
metric	subgroup_metric	subgroup	time_since_treatment	statistic	pctile	“Bamboozle”	“Spektakulatrix”	"No Change"	subgroup_bin_index	comments (not in data)
PCII	Age	20-30	0	uplift	nan	3.2	3.5	0	0	the mean of the difference between variant and baseline (variant-baseline)
				uplift_rel	nan	16%	17.5%	0%		the uplift as a proportion of the baseline ((variant-baseline)/baseline). NB: probably won't be stored in the dataframe itself because it can be derived (so probably implemented as a property of the Results class)
				sample_size	nan	10000	5000	1000		sample size of each variant
				uplift_pctile	2.5	-0.3	1.2	nan		percentiles of the difference between variant and baseline (so 95% confidence intervals are represented by the 2.5 and 97.5 percentiles)
				uplift_pctile	97.5	7.8	7.4	nan
				uplift_pctile	4.3	0	0	nan		any percentile can be represented, including some special ones, like those associated with 0 uplift or uplift of exactly treatment cost.
				prob_uplift_over_0	nan	0.043		nan		the probability of the uplift being over 0 could be represented explicitly like this; this is equivalent to the uplift_pctile row whose value is 0. (The discussion of this is currently internal to Zalando.)
				prob_uplift_over_cost
				variant_mean	nan	23.2	23.5	20		simply the mean of the variant, including baseline
				pre_treatment_diff	nan	2.63	-1.23	0		feature check result for numerical variables
				pre_treatment_diff_pctile	2.5	-2.54	-2.46	-1.53		feature check result for numerical variables
				pre_treatment_diff_pctile	97.5	5.34	0.64	1.56		feature check result for numerical variables
				chi_square_p	nan	0.63	0.25	0.93		feature check result for categorical variables
			10	uplift	nan	5.2	22.1	0
				sample_size	nan	10000	5000	1000
				uplift_pctile	2.5	0.9	10.0	nan
				uplift_pctile	97.5	7.8	30.0	nan
				variant_mean	nan	27.2	44.1	22
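A sketch of the first rows of this table as a pandas DataFrame with a six-level row index, together with a minimal Results-like wrapper that derives uplift_rel on the fly from uplift and the baseline's variant_mean, as the comment on uplift_rel suggests. ResultsSketch is a hypothetical name; it follows the has-a DataFrame design described earlier and is not the expan.Results API.

```python
import numpy as np
import pandas as pd

# First rows of the table above: PCII, subgroup Age 20-30, time 0.
index = pd.MultiIndex.from_tuples(
    [
        ("PCII", "Age", "20-30", 0, "uplift", np.nan),
        ("PCII", "Age", "20-30", 0, "sample_size", np.nan),
        ("PCII", "Age", "20-30", 0, "uplift_pctile", 2.5),
        ("PCII", "Age", "20-30", 0, "uplift_pctile", 97.5),
        ("PCII", "Age", "20-30", 0, "variant_mean", np.nan),
    ],
    names=["metric", "subgroup_metric", "subgroup",
           "time_since_treatment", "statistic", "pctile"],
)
df = pd.DataFrame(
    {
        "Bamboozle": [3.2, 10000, -0.3, 7.8, 23.2],
        "Spektakulatrix": [3.5, 5000, 1.2, 7.4, 23.5],
        "No Change": [0.0, 1000, np.nan, np.nan, 20.0],
    },
    index=index,
)


class ResultsSketch:
    """Hypothetical Results-like wrapper: it *has* a DataFrame plus metadata."""

    def __init__(self, df, metadata):
        self.df = df
        self.metadata = metadata

    @property
    def uplift_rel(self):
        # (variant - baseline) / baseline, i.e. uplift divided by the
        # baseline's mean. The NaN 'pctile' level is dropped first so that
        # index alignment never has to match NaN labels.
        uplift = self.df.xs("uplift", level="statistic").droplevel("pctile")
        means = self.df.xs("variant_mean", level="statistic").droplevel("pctile")
        baseline_mean = means[self.metadata["baseline_variant"]]
        return uplift.div(baseline_mean, axis=0)


results = ResultsSketch(df, {"baseline_variant": "No Change"})
print(results.uplift_rel)  # relative uplift per variant (16% and 17.5% in the table)
```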

'time_since_treatment' is currently only included if a trend analysis was done.
'-' is used as a sentinel for NaN values in the index levels 'metric', 'subgroup_metric' and 'subgroup', because all-NaN index levels cause serious problems with pandas reindexing and similar operations.
- Alternatively, index levels that are entirely NaN could be dropped, as is already done for the time level.
Variants are stored in the first level of the columns.
- Storing baseline_variant as a piece of metadata means we do not need a column for it, and we will most likely have no use case for combining results with different baseline variants. However, we store the baseline variant in the data as an explicit column, because this allows the same structure to be used for plotting the variants directly against each other, and for storing the absolute (within-variant) values as well as the uplift information in the same format.

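The feature-check rows in the table (pre_treatment_diff for numerical features, chi_square_p for categorical ones) could be dispatched on the metadata['is_categorical'] dictionary proposed in the Questions section below. A standard-library sketch follows; the function names are hypothetical, and it returns the Pearson chi-square statistic rather than a p-value, which would additionally need the chi-square distribution (for example scipy.stats.chi2.sf, or scipy.stats.chi2_contingency for the whole test).

```python
from collections import Counter
from statistics import mean

metadata = {"is_categorical": {"age": False, "gender": True}}


def chi_square_statistic(a, b):
    """Pearson chi-square statistic for a 2 x k table of category counts.

    Assumes both groups are non-empty.
    """
    ca, cb = Counter(a), Counter(b)
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    stat = 0.0
    for cat in set(ca) | set(cb):
        col_total = ca[cat] + cb[cat]
        for observed, row_total in ((ca[cat], n_a), (cb[cat], n_b)):
            expected = row_total * col_total / n
            stat += (observed - expected) ** 2 / expected
    return stat


def feature_check(name, variant_values, baseline_values, metadata):
    """Return (statistic_name, value) for one feature, variant vs baseline (hypothetical)."""
    if metadata["is_categorical"][name]:
        return "chi_square", chi_square_statistic(variant_values, baseline_values)
    # Numerical feature: difference of means (variant - baseline).
    return "pre_treatment_diff", mean(variant_values) - mean(baseline_values)


print(feature_check("age", [30, 40], [20, 30], metadata))
print(feature_check("gender", ["f", "m", "f"], ["m", "m", "f"], metadata))
```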
Questions

Open

Answered

[Core] Should the feature and kpi data frames be combined?
- No, we will store them separately and combine them with a join when needed. This join can be cached, but keeping them separate is more efficient, especially for time-dependent analysis where features do not change.
[Core] Should features and kpis be class objects (metric class as in Default Analyzer)?
- Attributes specific to individual metrics can be captured with dictionaries in the metadata, where the metric name is the key to the dictionary (e.g. metadata['is_categorical'] = {'orders': False, 'gender': True}).
[Core] Should the metadata and the core data frame be combined into one unified structure?
- No. Metadata is global to the whole dataframe, it does not apply to individual elements in it. Also, it should be very easily understood and able to be manipulated: analysts should be able to store extra stuff in there as they like.
[General] Connection to statistical monitoring module?
- That should be handled by the Analysis Service.

Glossary

Name	Definition	Example
Metric	Metric is the generic term covering KPI and Feature. It describes anything that can be measured at the per-entity level.
KPI	Key Performance Indicator. Used here to describe the data measured after the start of the treatment. It is used to identify the variables which are expected to be influenced by the treatment.	PCII accumulated after treatment start is a typical KPI for customer based experiments.
Feature	A feature is data which is not expected to be influenced by the treatment. That includes but is not limited to all data that is known on an entity at the start of a treatment.	Age or gender are typical features in customer based experiments.
