
Commit

rendered changes, see previous commit message
cxzhang4 committed Mar 15, 2024
1 parent b57acfb commit 4cc1f81
Showing 1 changed file with 2 additions and 2 deletions.
@@ -1,7 +1,7 @@
{
"hash": "d3920c82582f70e466f5dd4c621aeb0c",
"hash": "abb2e136fdc6cb94a623033e95d60e15",
"result": {
"markdown": "---\ntitle: \"Probability integral transform\"\ndescription: \"The probability integral transform states that, for a continuous random variable $X$, the distribution of $Y = F_X(X)$ is $U(0, 1)$. I give some intuition for this statement.\"\nauthor: \"Carson Zhang\"\ndate: \"12/04/2023\"\ndraft: false\n---\n\n\nThe probability integral transform states that, for a continuous random variable $X$, the distribution of $Y = F_X(X)$ is $U(0, 1)$. This result underlies inverse transform sampling. It illustrates why p-values are uniformly distributed under the null hypothesis. It is central to how copulas can model joint distributions. But why does this make sense?\n\nSuppose we have a random variable $X$ from an arbitrary probability distribution.\n\nHere, $X \\sim \\text{Beta}(\\alpha = 0.9, \\beta = 3.4)$\n\n\n::: {.cell}\n\n```{.r .cell-code}\nalpha_x = 0.9\nbeta_x = 3.4\nx_seq <- seq(0, 1, length = 100)\nx_density <- dbeta(x_seq, alpha_x, beta_x)\n\nplot(x_seq, x_density, type = \"l\", lty = 1,\n xlab = \"X\", ylab = \"Density\", main = \"Density of X\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\nWhat does $Y = F_X(x)$ look like?\n\nLet's try to draw the pdf of $Y$ one section at a time.\n\nFirst, suppose we select the top 3% of the distribution. [(This comprises the values between the $0.97$ and $1$ $p$-quantiles of this distribution.)](https://en.wikipedia.org/wiki/Quantile)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nquantile_0.97 <- qbeta(0.97, alpha_x, beta_x)\nquantile_1 <- 1\n\nplot(x_seq, x_density, type = \"l\", lty = 1,\n xlab = \"X\", ylab = \"Density\", main = \"Density of X\")\nabline(v = c(quantile_0.97, quantile_1), col = \"orange\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\nThe orange lines bound the top 3%.\n\nFor now, since we don't know what the density of $Y = F_X(X)$ looks like, let's say it's an arbitrary curve.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\nHowever, recall that we selected the top 3% of the probability mass, so within the orange interval, the area under the curve must be $0.03$, and therefore the value of the pdf must be $1$ on average within the orange interval.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n\nNow, think about the region between the $0.97$ and $0.98$ $p$-quantiles of the distribution By definition, this comprises 1% of the probability mass ($0.98 - 0.97 = 0.01$), so we need to adjust our curve to satisfy this condition.\n\nHowever, we note that all intervals have this same property (even arbitrarily small intervals): **the width of each interval is equal to its corresponding probability mass.** So, the pdf of $Y$ needs to have mean $1$ over any sub-interval of $[0, 1]$.\n\nIt is natural for me to suspect the pdf of $Y$ to be a horizontal line at $1$: this is the only function I can think of that guarantees this property.\n\nFor extra reading and formalism, we use the above insight to illustrate the theorem.\n\n**Theorem (Probability Integral Transform):** $Y = F_X(X) \\sim \\text{Uniform}(0, 1)$.\n\n**Derivation of the pdf $f_Y(Y)$**[^1]: A \"proof\" that the pdf of $Y = F_X(X)$ is $1$, starting from the insight given above.\n\nLet $a, b$ be real numbers such that $0 \\leq a < b \\leq 1$.\n\nBy the argument above, we have $F_Y(b) - F_Y(a) = 
b - a$.\n\n(Note that we can rewrite this as $F(b) = b \\text{ and } F(a) = a$, i.e. $F$ is the identity.)\n\nWe have:\n\n$$\n\\begin{align}\n b - a &= F_Y(b) - F_Y(a)\\\\\n &= F_Y(Y) \\Big|_a^b && \\text{(standard antiderivative notation)}\\\\\n &= \\int_a^b f_Y(y)dy && \\text{(definition of a probability density function)}\\\\\n &= \\int_a^b 1dy && \\text{(a function with the identity as its antiderivative)}\\\\\n\\end{align}\n$$\n\nSo, we have $f_Y(y) = 1$, and therefore, $Y = F_X(x)$ has the standard uniform distribution.\n\n**Proof**: the standard proof of the PIT found on the [Wikipedia page](https://en.wikipedia.org/wiki/Probability_integral_transform).\n\n$$\n\\begin{align}\n F_Y(y) &= P(Y \\leq y)\\\\\n &= P(F_X(X) \\leq y) && \\text{(substituted the definition of } Y)\\\\\n &= P(X \\leq F_X^{-1}(y)) && \\text{(applied } F_X^{-1} \\text{ to both sides)}\\\\\n &= F_X(F_X^{-1}(y)) && \\text{(the definition of a CDF)}\\\\\n &= y\n\\end{align}\n$$\n\nTherefore, $Y \\sim U(0, 1)$.\n\n## P-value distribution under $H_0$ [^2]\n\nThe p-value of a test statistic $T(X)$ for a one-sided test where the alternative \"is greater than\" is\n$P_{H_0}(T \\geq t(x))$.\n\nDefine $P_{greater} := \\Pr_{H_0}(T \\geq t(x)) = 1 - F_{T; H_0}(T)$.\n\n\n$$\n\\begin{align}\nF_{P_{\\text{greater}}} &= \\Pr(P_{greater} \\leq p)\\\\\n &= \\Pr((1 - F_{T; H_0}(T)) \\leq p)\\\\\n &= \\Pr(-F_{T; H_0}(T) \\leq (p - 1))\\\\\n &= \\Pr(F_{T; H_0}(T) \\geq (1 - p))\\\\\n &= 1 - \\Pr(F_{T; H_0}(T) \\leq (1 - p))\\\\\n &= 1 - \\Pr(T \\leq F_{T; H_0}^{-1}(1 - p))\\\\\n &= 1 - F_{T; H_0}(F_{T; H_0}^{-1}(1 - p))\\\\\n &= 1 - (1 - p)\\\\\n &= p\\\\\n &= F_{U(0, 1)}\n\\end{align}\n$$\n\nTherefore, we have shown that one-sided p-values are uniformly distributed under the null hypothesis.[^3]\n\n## Acknowledgements\n\nThank you to Meimingwei Li, Raphael Rehms, Prof. Michael Schomaker, and J.P. Weideman for their helpful input.\n\n[^1]: This is not necessary: once we know $F_Y(Y)$, we know the distribution of $Y$. I'm also not convinced that this is a rigorous derivation. I still found it instructive to work through these steps.\n\n[^2]: Notation and \"less than\" proof from Raphael Rehms's exercise and solution from the Statistical Methods in Epidemiology course.\n\n[^3]: [This holds only for divergence p-values, not decision p-values](https://arxiv.org/abs/2301.02478). My understanding is that divergence p-values are exactly one-sided p-values. Thanks to Prof. Schomaker for this insight.",
"markdown": "---\ntitle: \"Probability integral transform\"\ndescription: \"The probability integral transform states that, for a continuous random variable $X$, the distribution of $Y = F_X(X)$ is $U(0, 1)$. I give some intuition for this statement.\"\nauthor: \"Carson Zhang\"\ndate: \"12/04/2023\"\ndraft: false\n---\n\n\nThe probability integral transform states that, for a continuous random variable $X$, the distribution of $Y = F_X(X)$ is $U(0, 1)$. This result underlies inverse transform sampling. It illustrates why p-values are uniformly distributed under the null hypothesis. It is central to how copulas can model joint distributions. But why does this make sense?\n\nSuppose we have a random variable $X$ from an arbitrary probability distribution.\n\nHere, $X \\sim \\text{Beta}(\\alpha = 0.9, \\beta = 3.4)$\n\n\n::: {.cell}\n\n```{.r .cell-code}\nalpha_x = 0.9\nbeta_x = 3.4\nx_seq <- seq(0, 1, length = 100)\nx_density <- dbeta(x_seq, alpha_x, beta_x)\n\nplot(x_seq, x_density, type = \"l\", lty = 1,\n xlab = \"X\", ylab = \"Density\", main = \"Density of X\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-1-1.png){width=672}\n:::\n:::\n\n\nWhat does $Y = F_X(x)$ look like?\n\nLet's try to draw the pdf of $Y$ one section at a time.\n\nFirst, suppose we select the top 3% of the distribution. [(This comprises the values between the $0.97$ and $1$ $p$-quantiles of this distribution.)](https://en.wikipedia.org/wiki/Quantile)\n\n\n::: {.cell}\n\n```{.r .cell-code}\nquantile_0.97 <- qbeta(0.97, alpha_x, beta_x)\nquantile_1 <- 1\n\nplot(x_seq, x_density, type = \"l\", lty = 1,\n xlab = \"X\", ylab = \"Density\", main = \"Density of X\")\nabline(v = c(quantile_0.97, quantile_1), col = \"orange\")\n```\n\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-2-1.png){width=672}\n:::\n:::\n\n\nThe orange lines bound the top 3%.\n\nFor now, since we don't know what the density of $Y = F_X(X)$ looks like, let's say it's an arbitrary curve.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-3-1.png){width=672}\n:::\n:::\n\n\nHowever, recall that we selected the top 3% of the probability mass, so within the orange interval, the area under the curve must be $0.03$, and therefore the value of the pdf must be $1$ on average within the orange interval.\n\n\n::: {.cell}\n::: {.cell-output-display}\n![](index_files/figure-html/unnamed-chunk-4-1.png){width=672}\n:::\n:::\n\n\nNow, think about the region between the $0.97$ and $0.98$ $p$-quantiles of the distribution By definition, this comprises 1% of the probability mass ($0.98 - 0.97 = 0.01$), so we need to adjust our curve to satisfy this condition.\n\nHowever, we note that all intervals have this same property (even arbitrarily small intervals): **the width of each interval is equal to its corresponding probability mass.** So, the pdf of $Y$ needs to have mean $1$ over any sub-interval of $[0, 1]$.\n\nIt is natural for me to suspect the pdf of $Y$ to be a horizontal line at $1$: this is the only function I can think of that guarantees this property.\n\nFor extra reading and formalism, we use the above insight to illustrate the theorem.\n\n**Theorem (Probability Integral Transform):** $Y = F_X(X) \\sim \\text{Uniform}(0, 1)$.\n\n**Derivation of the pdf $f_Y(Y)$**[^1]: A \"proof\" that the pdf of $Y = F_X(X)$ is $1$, starting from the insight given above.\n\nLet $a, b$ be real numbers such that $0 \\leq a < b \\leq 1$.\n\nBy the argument above, we have $F_Y(b) - F_Y(a) = 
b - a$.\n\n(Note that we can rewrite this as $F(b) = b \\text{ and } F(a) = a$, i.e. $F$ is the identity.)\n\nWe have:\n\n$$\n\\begin{align}\n b - a &= F_Y(b) - F_Y(a)\\\\\n &= F_Y(Y) \\Big|_a^b && \\text{(standard antiderivative notation)}\\\\\n &= \\int_a^b f_Y(y)dy && \\text{(definition of a probability density function)}\\\\\n &= \\int_a^b 1dy && \\text{(a function with the identity as its antiderivative)}\\\\\n\\end{align}\n$$\n\nSo, we have $f_Y(y) = 1$, and therefore, $Y = F_X(x)$ has the standard uniform distribution.\n\n**Proof**: the standard proof of the PIT found on the [Wikipedia page](https://en.wikipedia.org/wiki/Probability_integral_transform).\n\n$$\n\\begin{align}\n F_Y(y) &= P(Y \\leq y)\\\\\n &= P(F_X(X) \\leq y) && \\text{(substituted the definition of } Y)\\\\\n &= P(X \\leq F_X^{-1}(y)) && \\text{(applied } F_X^{-1} \\text{ to both sides)}\\\\\n &= F_X(F_X^{-1}(y)) && \\text{(the definition of a CDF)}\\\\\n &= y\n\\end{align}\n$$\n\nTherefore, $Y \\sim U(0, 1)$.\n\n## P-value distribution under $H_0$ [^2]\n\nThe p-value of a test statistic $T(X)$ for a one-sided test where the alternative \"is greater than\" is\n$P_{H_0}(T \\geq t(x))$.\n\nDefine $P_{greater} := \\Pr_{H_0}(T \\geq t(x)) = 1 - F_{T; H_0}(T)$.\n\n\n$$\n\\begin{align}\nF_{P_{\\text{greater}}} &= \\Pr(P_{greater} \\leq p) && \\text{(definition of a CDF)}\\\\\n &= \\Pr((1 - F_{T; H_0}(T)) \\leq p)\\\\\n &= \\Pr(-F_{T; H_0}(T) \\leq (p - 1))\\\\\n &= \\Pr(F_{T; H_0}(T) \\geq (1 - p))\\\\\n &= 1 - \\Pr(F_{T; H_0}(T) \\leq (1 - p))\\\\\n &= 1 - \\Pr(T \\leq F_{T; H_0}^{-1}(1 - p)) && \\text{(applied } F_X^{-1} \\text{ to both sides)}\\\\\n &= 1 - F_{T; H_0}(F_{T; H_0}^{-1}(1 - p)) && \\text{(definition of a CDF)}\\\\\n &= 1 - (1 - p)\\\\\n &= p\\\\\n &= F_{U(0, 1)}\n\\end{align}\n$$\n\nTherefore, we have shown that one-sided p-values are uniformly distributed under the null hypothesis.[^3]\n\n## Acknowledgements\n\nThank you to Meimingwei Li, Raphael Rehms, Prof. Michael Schomaker, and J.P. Weideman for their helpful input.\n\n[^1]: This is not necessary: once we know $F_Y(Y)$, we know the distribution of $Y$. I'm also not convinced that this is a rigorous derivation. I still found it instructive to work through these steps.\n\n[^2]: Notation and \"less than\" proof from Raphael Rehms's exercise and solution from the Statistical Methods in Epidemiology course.\n\n[^3]: [This holds only for divergence p-values, not decision p-values](https://arxiv.org/abs/2301.02478). My understanding is that divergence p-values are exactly one-sided p-values. Thanks to Prof. Schomaker for this insight.",
"supporting": [
"index_files"
],
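The p-value section derives that $P_{\text{greater}} = 1 - F_{T; H_0}(T)$ has a $U(0, 1)$ distribution under $H_0$. The sketch below checks this numerically for an assumed example test, a one-sided z-test of $H_0\colon \mu = 0$ with known unit variance and $n = 30$; none of these specifics come from the post.

```r
# Simulate one-sided p-values under the null hypothesis.
set.seed(2)
n_sims <- 1e4
n <- 30

p_greater <- replicate(n_sims, {
  x <- rnorm(n, mean = 0, sd = 1)   # data generated under H0
  t_stat <- sqrt(n) * mean(x)       # T ~ N(0, 1) under H0
  1 - pnorm(t_stat)                 # P_greater = 1 - F_{T; H0}(T)
})

hist(p_greater, breaks = 20, freq = FALSE,
     main = "One-sided p-values under H0", xlab = "p-value")
abline(h = 1, col = "orange")       # density of U(0, 1)
```

The histogram should again be approximately flat at height $1$.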
