Skip to content

maximum entropy PS #77

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
BERENZ opened this issue Apr 17, 2025 · 0 comments
Open

maximum entropy PS #77

BERENZ opened this issue Apr 17, 2025 · 0 comments

Comments

@BERENZ
Copy link
Contributor

BERENZ commented Apr 17, 2025

Initial code from @bartosz-bogulas on the maximum entropy propensity score as described in Chapter 11 in Kim, J. K., & Shao, J. (2021). Statistical methods for handling incomplete data. Chapman and Hall/CRC link.

#' Maximum Entropy Propensity Score Estimator
#'
#' @description
#' Fits a maximum entropy propensity score estimator for non-probability surveys
#' based on a density ratio model. The estimator is obtained by solving the
#' moment condition:
#' \deqn{\sum_{i=1}^{N} \delta_i\exp(\phi_0+\phi_1^\top x_i)(x_i-\hat{\bar{x}}_0)=0,}
#' where
#' \deqn{\hat{\bar{x}}_0=\frac{1}{\hat{N}_0}\Biggl(\sum_{i \in A} d_i\,x_i-\sum_{i=1}^{N}\delta_i\,x_i\Biggr), \quad \hat{N}_0=\sum_{i \in A} d_i-N_1,}
#' and \(N_1=\sum_{i=1}^{N} \delta_i\).
#'
#' The intercept \(\phi_0\) is determined by
#' \deqn{\phi_0=\log(N_1)-\log\Bigl\{\sum_{i=1}^{N}\delta_i\exp(\phi_1^\top x_i)\Bigr\}.}
#'
#' The final propensity score estimator for the population mean of \(y\) is given by
#' \deqn{\hat{\theta}_{PS2}=\frac{1}{N}\sum_{i=1}^{N}\delta_i\left\{1+\frac{\hat{N}_0}{N_1}\exp(\phi_0+\phi_1^\top x_i)\right\}y_i.}
#'
#' @param X Matrix of covariates.
#' @param delta Binary indicator vector; 1 indicates membership in the non-probability sample.
#' @param d Numeric vector of design weights for the probability sample.
#' @param y Numeric vector of outcome values.
#' @param start_phi1 Numeric vector of starting values for \(\phi_1\) (default is a zero vector).
#' @param maxit Maximum number of iterations for the solver (default 100).
#' @param tol Convergence tolerance for the solver (default 1e-6).
#' @param nleqslv_method Character; method for \code{nleqslv} optimization (default "Broyden").
#' @param nleqslv_global Character; global strategy for \code{nleqslv} (default "dbldog").
#' @param nleqslv_xscalm Character; scaling method for \code{nleqslv} (default "auto").
#'
#' @return A list with components:
#' \itemize{
#'   \item \code{phi}: A list with \code{phi0} and \code{phi1} (the estimated parameters).
#'   \item \code{theta_PS2}: The final propensity score estimator for the population mean of \(y\).
#'   \item \code{diagnostics}: The output from the \code{nleqslv} solver.
#' }
#'
#' @importFrom nleqslv nleqslv
#'
#' @noRd
est_method_maxent <- function(X, delta, d, y,
                              start_phi1 = rep(0, ncol(X)),
                              maxit = 100,
                              tol = 1e-6,
                              nleqslv_method = "Broyden",
                              nleqslv_global = "dbldog",
                              nleqslv_xscalm = "auto") {
  
  # Convert X to a matrix and determine the number of covariates
  X <- as.matrix(X)
  p <- ncol(X)
  
  # Compute Horvitz-Thompson totals
  N_total <- sum(d)         
  N1 <- sum(delta)          
  N0_hat <- sum(d) - N1     
  
  # Compute the target covariate mean for the probability sample
  x_bar0 <- (colSums(d * X) - colSums(delta * X)) / N0_hat
  
  # Define the moment function
  moment_fun <- function(phi1) {
    eta <- as.vector(X %*% phi1)
    phi0 <- log(N1) - log(sum(delta * exp(eta)))
    # Compute moment vector over covariates
    moments <- colSums(delta * exp(phi0 + eta) * (X - matrix(rep(x_bar0, each = nrow(X)), nrow = nrow(X))))
    moments
  }
  
  # Solve for theta_1 using nleqslv
  sol <- nleqslv::nleqslv(x = start_phi1, fn = moment_fun,
                          method = nleqslv_method,
                          global = nleqslv_global,
                          xscalm = nleqslv_xscalm,
                          control = list(maxit = maxit, ftol = tol))

  phi1_hat <- sol$x
  eta_hat <- as.vector(X %*% phi1_hat)
  phi0_hat <- log(N1) - log(sum(delta * exp(eta_hat)))
  
  # Compute the final propensity weight component
  ps_weight <- 1 + (N0_hat / N1) * exp(phi0_hat + eta_hat)
  
  # Compute the final propensity score estimator for the population mean of y
  theta_PS2 <- sum(delta * ps_weight * y) / N_total
  
  # Result
  result <- list(
    estimator = "ipw",                
    estimator_method = "maxent",      
    phi = list(phi0 = phi0_hat, phi1 = phi1_hat),
    theta_PS2 = theta_PS2,            
    diagnostics = sol,                
    output = data.frame(mean = theta_PS2, SE = NA), 
    confidence_interval = data.frame(lower_bound = NA, upper_bound = NA),
    svydesign = NULL,                 
    control = list(control_inference = list(vars_selection = "false", var_method = "analytic")),
    pop_size_fixed = FALSE,           
    y = list(y),                      
    call = match.call()               
  )
  
  # We set the class so that print.nonprob and print.nonprob_method will be used
  class(result) <- "nonprob"
  
  return(result)
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant