The following notes are based on the blog post "Common Probability Distributions: The Data Scientist’s Crib Sheet" by Sean Owen (@sean_r_owen) on the Cloudera Engineering Blog.
Bernoulli:
- models a single "coin toss" with only two possible outcomes (heads/tails, success/failure)
- $p$ is the probability of success and $1 - p$ the probability of failure
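A minimal sketch of a Bernoulli variable in plain Python (standard library only). The helper names `bernoulli_pmf` and `bernoulli_sample` are hypothetical and chosen for illustration; they are not taken from the blog post.

```python
import random


def bernoulli_pmf(k: int, p: float) -> float:
    """P(X = k) for a Bernoulli(p) variable: 1 = success (heads), 0 = failure (tails)."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0  # any other outcome is impossible


def bernoulli_sample(p: float, rng: random.Random) -> int:
    """One coin toss: rng.random() is uniform on [0, 1), so it falls below p with probability p."""
    return 1 if rng.random() < p else 0


rng = random.Random(42)  # fixed seed so the run is reproducible
tosses = [bernoulli_sample(0.5, rng) for _ in range(10_000)]
print(sum(tosses) / len(tosses))  # fraction of heads, close to p = 0.5
```

The standard library has no dedicated Bernoulli type; if SciPy is available, `scipy.stats.bernoulli` covers the same ground.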
Uniform:
- e.g. rolling a fair die → all outcomes are equally likely
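A sketch of the fair-die case as a discrete uniform distribution over the faces 1 to 6, using `fractions.Fraction` so the probabilities stay exact. `uniform_pmf` is a hypothetical helper name.

```python
from fractions import Fraction


def uniform_pmf(k: int, low: int = 1, high: int = 6) -> Fraction:
    """P(X = k) for a discrete uniform over {low, ..., high}; defaults model a fair six-sided die."""
    if low <= k <= high:
        return Fraction(1, high - low + 1)  # every outcome gets the same share
    return Fraction(0)


pmf = {face: uniform_pmf(face) for face in range(1, 7)}
print(pmf[3])             # 1/6
print(sum(pmf.values()))  # 1
print(sum(face * prob for face, prob in pmf.items()))  # mean = 7/2
```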
Binomial: → "with replacement"
- multiple independent Bernoulli experiments, e.g. multiple coin tosses → "With $n$ tosses, how many times does it come up heads?" or "Drawing $n$ balls (black/white) with replacement from an urn: how many are black?"
- $n$ is the number of trials, $p$ the probability of success
- each trial is independent and has the same probability of success
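A sketch of the binomial probability mass function using `math.comb` (Python 3.8+), together with a simulation that literally sums $n$ Bernoulli trials, to show that the two views agree. `binomial_pmf` and `simulate_binomial` are hypothetical helper names.

```python
import random
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    if not 0 <= k <= n:
        return 0.0
    # C(n, k) orderings of the k successes, each with probability p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_binomial(n: int, p: float, rng: random.Random) -> int:
    """Count successes in n independent Bernoulli(p) trials (e.g. heads in n tosses)."""
    return sum(1 for _ in range(n) if rng.random() < p)


# Exactly 5 heads in 10 fair tosses: C(10, 5) / 2**10 = 252 / 1024
print(binomial_pmf(5, 10, 0.5))  # 0.24609375

rng = random.Random(0)
runs = [simulate_binomial(10, 0.5, rng) for _ in range(20_000)]
print(runs.count(5) / len(runs))  # empirical frequency, close to 0.246
```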
Hypergeometric: → "without replacement"
- drawing $n$ balls from an urn without replacement → the draws are not independent (each draw changes the composition of the urn)
- should come to mind when picking out a significant subset of a population as a sample
- if the number of balls is large relative to the number of draws $n$ → binomial and hypergeometric become similar (the chances change less with each draw)
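A sketch comparing the two urn models: the hypergeometric probability of drawing 2 black balls in 5 draws without replacement from a half-black urn, against the binomial probability for the same draws with replacement, for growing urn sizes. The function names and the chosen numbers (5 draws, half the balls black) are illustrative assumptions.

```python
from math import comb


def hypergeometric_pmf(k: int, population: int, black: int, draws: int) -> float:
    """P(exactly k black balls when drawing `draws` balls without replacement
    from an urn holding `population` balls, `black` of which are black)."""
    if k < 0 or draws - k < 0:
        return 0.0
    # comb() returns 0 when asked for more items than exist, covering impossible k
    return comb(black, k) * comb(population - black, draws - k) / comb(population, draws)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Same experiment with replacement: every draw is black with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# 2 black balls in 5 draws from a half-black urn, for growing urn sizes
with_replacement = binomial_pmf(2, 5, 0.5)
for population in (10, 100, 10_000):
    without = hypergeometric_pmf(2, population, population // 2, 5)
    print(population, round(without, 4), round(with_replacement, 4))
```

For an urn of 10 balls the two differ noticeably ($100/252 \approx 0.397$ versus $10/32 = 0.3125$); the gap shrinks as the urn grows.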
Poisson:
- counts events over a time interval given a continuous rate of events occurring → split the interval into ever smaller time slices, in which the probability of an event becomes infinitesimal → the limit of the binomial over these slices is the Poisson distribution
- defined by $\lambda$, the average rate, NOT by $n$ and $p$ (in the simple analogy $\lambda = np$)
- queue processes: packets arriving at a router, customers calling, ...
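A sketch of the limit described above: split an interval into $n$ slices, give each slice an event probability of $\lambda / n$ (so $np = \lambda$), and watch the binomial probability of 2 events approach the Poisson one as $n$ grows. The rate $\lambda = 3$ and the helper names are illustrative assumptions.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events in an interval with average rate lam events per interval)."""
    return lam**k * exp(-lam) / factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(k successes in n independent Bernoulli(p) trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


lam = 3.0  # illustrative rate, e.g. 3 packets per millisecond on average
for n in (10, 100, 10_000):
    # n time slices, each holding an event with probability lam / n, so n * p = lam
    approx = binomial_pmf(2, n, lam / n)
    print(n, round(approx, 5), round(poisson_pmf(2, lam), 5))
```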