This document provides a brief tutorial on using the
twangContinous
package to estimate causal effects for
continuous exposure variables using generalized propensity scores (GPSs)
estimated via generalized boosted models (GBMs).
Investigators often find themselves interested in examining the causal effect of a continuous exposure on an outcome. For example, it may be of interest to estimate the causal effect of trauma symptoms on substance use frequency among individuals with substance use disorder (SUD) or it might be of interest to understand the impact of income on measures of well-being. This package was designed to help investigators address such questions using state-of-the-art causal inference methods.
To do so, we first introduce some notation. Let A represent the continuous exposure or treatment variable, Y the outcome, and X the pretreatment covariates. Define Yi(a) as the potential outcome for individual i if they receive treatment level A = a. Further assume that Yi(a) is well-defined for a ∈ 𝒜 where 𝒜 is a continuous domain. Our goal is to estimate E[Y(a)], the mean effect of exposure A = a on the outcome.
In this setting, we define the generalized propensity score (GPS) as the conditional density function for A given X (i.e., fA|X(a|x)). As with all propensity score analyses, several assumptions are needed before proceeding. First, the ignorability assumption requires that f(a|Y(a), x) = f(a|x) where f(a|⋅) is the conditional probability density of A. Thus, instead of conditioning on a potentially high-dimensional X, it is sufficient to condition on the GPS (Imai & van Dyk, 2004). There is also a positivity assumption, such that all individuals have a non-zero probability of having received all possible values of the continuous exposure. Finally, the stable unit treatment value assumption (SUTVA) for continuous exposure settings (Hirano & Imbens, 2004; Imai & van Dyk, 2004) implies that there is no interference between units. Notably, propensity scores for continuous exposures are a fairly straightforward extension of binary exposures although they have not been nearly as popular in applications.
Once estimated, the stabilized inverse of the GPS may be used to weight the outcome model where for the ith subject, the weight is given as $$ w_i = \frac{f_A(A_i) }{f_{A|X}(A_i|X_i)} $$ for i = 1, …, N individuals (Robins & Hernan, 2000). As in the case for binary exposures, the GPS can be estimated with a parametric model. For example, Robins & Hernan (2000) used a linear regression model to estimate GPSs. In this case, with a sample of N observations, one regresses A on X using normal linear regression and then obtains predicted values for A, namely $\widehat{A_i}$ (representing the mean relationship between A and the given pretreatment covariates included in X), and σ̂ (the residual variance from the regression model). Then, one would plug these estimates into the normal density to obtain the needed GPS weights, $$ \frac{1}{\sqrt{2 \pi} \widehat{\sigma}} exp \left\{\frac{(A_i - \widehat{A_i})^2}{2\widehat{\sigma}^2} \right\}. $$
However, as with binary exposures, a non-parametric model, such as
GBM, is more flexible and potentially advantageous in that
non-linearities and interactions among potential confounders up to a
level prespecified by the user (e.g., all 3-way interactions) may be
automatically included (McCaffrey, Ridgeway, & Morral, 2004).
Additionally, GBM allows highly correlated confounders (i.e.,
predictors) and reduces bias and variance of causal effect estimates in
comparison to parametric models (Lee, Lessler, & Stuart, 2010). Zhu,
Coffman, & Ghosh (2015) showed that GBMs perform well for estimating
the GPS for a continuous exposure just as they do for binary exposures.
Their work directly used the gbm
R package but, with this
package, we have now integrated the function within the
twang
R package to take advantage of the useful diagnostic
features of the original twang
package (e.g., balance
assessment plots and tables).
Prior to use, GBM requires specification of a stopping criterion. For continuous exposures, we use the average or maximum absolute Pearson correlation between the exposure and each confounder to summarize balance performance at each iteration of the GBM fit, with the optimal iteration chosen as the one that minimizes the average or maximum absolute Pearson correlation. Ideally, we hope the optimal iteration is such that the Pearson correlations are all 0 given that the goal when using GPS weights is that the exposure should be unrelated to the covariates in the weighted sample. Thus, balance can be assessed by computing Pearson correlations between each of the potential confounders and the exposure. Zhu et al. (2015) showed that 0.1 is a reasonable cutoff for this metric.
If you have not already done so, install from CRAN by typing .
Alternatively, you can install development versions directly from GitHub
by first installing and loading the devtools
package and
using the following code.
twangContinuous
relies on other R packages, especially
gbm
. You may have to run install.packages()
for these as well if they are not already installed. You will only need
to do this step once. In the future running
update.packages()
regularly will ensure that you have the
latest versions of the packages, including bug fixes and new
features.
To start using twangContinuous
, first load the package.
You will have to do this step once for each R session that you run. We
also set the seed of R’s pseudo random number generator so that the
results are exactly replicable.
We will use the synthetic data set included with the package and call
it dat
to illustrate the use of the package. In this data
set, tss_0
is the continuous exposure and represents a
count of trauma symptoms and sfs8p_3
is the outcome
variable and measures substance use frequency at 3-month follow-up. The
following baseline covariates are included in the propensity score
model: substance use frequency scale, sfs8p_0
, treatment
group, treat
, whether they are in recovery,
recov_0
, whether they primarily use opioids
vs. alcohol/marijuana vs. other drugs, subsgrps_n
, the
substance problem scale (past month), sp_sm_0
, and
substance abuse treatment index, sati_0
. These covariates
are potential confounders of tss_0
and sfs8p_3
and were generated to mimic the confounders in the real dataset. Load
the data using the following code:
R can read data from many other sources. The manual R Data Import/Export describes those options in detail.
The function ps.cont()
is the main function in
twangContinuous
. Enter the formula using the standard R
formula notation (e.g., tss_0 ~ sfs8p_0 + sati_0 + treat
)
where the continuous exposure variable is on the left hand side of
'~'
and the confounders are on the right hand side. Next,
use the data
argument to specify the name of the data
set.
The function includes several optional arguments:
n.trees
, interaction.depth
, and
shrinkage
are parameters for the gbm
model.
n.trees
- the maximum number of iterations that
gbm
will run. ps.cont()
will issue a warning
if the estimated optimal number of iterations is too close to the bound
selected in this argument because it indicates that balance may improve
if more complex models (i.e., those with more trees) are considered. The
user should increase n.trees
or increase
shrinkage
(see below) if this warning appears. The default
number of trees is set to 10000.
interaction.depth
- controls the level of interactions
allowed in the GBM. The default is 3 (i.e., up to three-way interactions
among the covariates are allowed) and we typically use the default in
our analyses.
shrinkage
- controls the amount of shrinkage. Small
values such as 0.005 or 0.001 yield smooth fits but require greater
values of n.trees
to achieve adequate fits. Computational
time increases inversely with shrinkage
argument. The
default value is set to 0.01.
Two other optional arguments to ps.cont()
are
verbose
and sampw
.
sampw
- optional argument and by default is set to NULL.
If there are survey weights, they can be included using the
sampw
argument.
verbose
- controls the amount of information printed to
the console. By default it is set to FALSE.
In the example below, because the values of verbose
,
shrinkage
, and interaction.depth
are set at
the default values, it would not be necessary to include these arguments
but for the sake of illustration, they are included. We set
n.trees
to 500 for the purposes of reducing computational
time in compiling this tutorial. (We previously ran the model and know
that 500 trees are enough.)
test.mod <- ps.cont(tss_0 ~ sfs8p_0 + sati_0 + sp_sm_0
+ recov_0 + subsgrps_n + treat,
data=dat,
n.trees = 500,
shrinkage = 0.01,
interaction.depth = 3,
verbose = FALSE)
For those familiar with the ps()
function in
twang
, unlike the ps()
function,
ps.cont()
does not have arguments for estimand
or perm.test.iters
.
The only stop method currently available is the average absolute
weighted correlation, and thus, it is set as the default. At this point
in development, only the average absolute weighted correlation has been
thoroughly tested as a stopping criteria (see Zhu et al., 2015). The
stop method specifies a set (or sets) of rules and measures for
assessing the balance on the pretreatment covariates. The
ps.cont()
function selects the optimal number of GBM
iterations to minimize the correlation between the continuous exposure
variable and the pretreatment covariates. To summarize the correlations
across all of the covariates, we first take the absolute value of the
correlations and then take the average across all of the covariates.
Future developments will allow the user to specify the maximum absolute
weighted correlation or the root mean squared weighted correlation using
the argument stop.method
(but at present this cannot be
changed).
Having fit the propensity score model, the analyst should perform
various diagnostic checks prior to estimating the causal effect. The
first of these diagnostic checks makes sure that the specified value of
n.trees
allowed GBM to explore sufficiently complicated
models. We can do this using the plot()
function with the
argument plots="optimize"
.
The plot gives the balance measures as a function of the number of
iterations in the GBM algorithm, with higher iterations corresponding to
more complicated fitted models. If it appears that additional iterations
would be likely to result in lower values of the balance statistic,
n.trees
should be increased. However, after a point,
additional complexity typically makes the balance worse. The average
absolute correlation was minimized at iteration 347 (the exact iteration
is known from using the summary()
function that will be
discussed shortly).
Once the GPSs are estimated, the twangContinuous
package
automatically generates the inverse probability (or propensity) weights.
In the original unweighted data, the continuous exposure is correlated
to some degree with each of the confounders but in the weighted data,
these correlations should be close to zero, indicating that the
continuous exposure and the confounders are unrelated and therefore the
relationships between the continuous exposure and the confounders have
been broken. If this relationship is broken, then the confounders are no
longer confounders (recall the definition of a confounder is a variable
that affects, or is related to, both the exposure and the outcome).
Having saved the ps.cont
object, the analyst can use the
summary()
function to obtain information about the
effective sample size (ESS) of the weighted data (the ess
column, see details below) and the maximum absolute correlation
(max.wcor
column), mean or average absolute correlation
(mean.wcor
column), and the root mean square of the
absolute correlations (rms.wcor
column) in the unweighted
and weighted data (unw
and AAC
rows,
respectively). Finally, the iter
column specifies the
iteration at which the average absolute correlation
(mean.wcor
) was minimized as described above in the
discussion of stop methods.
## n ess max.wcor mean.wcor rms.wcor iter
## unw 4000 4000.000 0.21633979 0.1034782 0.11887909 NA
## AAC 4000 3559.661 0.04680874 0.0192826 0.02492979 347
In general, weighted means can have greater sampling variance than unweighted means from a sample of equal size. The ESS of the weighted data captures this increase in variance as
The ESS is approximately the number of observations from a simple random sample that yields an estimate with sampling variation equal to the sampling variation obtained with the weighted observations. The ESS is an accurate measure of the relative size of the variance of means when the weights are fixed or they are uncorrelated with outcomes. Otherwise the ESS underestimates the effective sample size (Little & Vartivarian, 2004). With propensity score weights, it is rare that weights are uncorrelated with outcomes. Hence the ESS typically gives a lower bound, but it still serves as a useful measure for choosing among alternative models and assessing the overall quality of a model, even if it provides a possibly conservative picture of the loss in precision due to weighting.
The ess
column in the summary table shows that although
the original (i.e., unweighted) data had 4000 cases, the propensity
score weighting effectively uses only 3560 cases. While this may seem
like a large loss of sample size, this is a byproduct of the degree of
confounding in the original data.
The function bal.table
produces a table that shows how
well the weights reduced the correlations between the exposure and each
covariate.
## unw wcor
## sfs8p_0 0.115 -0.006
## sati_0 0.139 -0.011
## sp_sm_0 0.216 0.001
## recov_0 -0.112 -0.039
## subsgrps_n 0.035 -0.047
## treatA 0.053 0.016
## treatB -0.053 -0.016
The unw
column gives the unweighted correlation and the
wcor
column gives the weighted correlation between the
exposure and each covariate. Thus, the table shows that the correlation
between sati_0
and tss_0
in the unweighted
data was 0.139. After applying the propensity score weights, this
correlation has been reduced to -0.011. Ideally, all of the correlations
in the weighted data should be below 0.1 (i.e., a small correlation
according to Cohen’s (1988) convention [see also Zhu et al., 2015]) in
absolute value and the closer to zero, the better. Thus, the above table
indicates that we have achieved good balance on the covariates.
To visualize how the correlations have decreased (in absolute value)
in the weighted sample, use the plot()
function with the
argument plots="es"
. Each line represents a covariate. A
horizontal line 0.1 has been included for reference as we would like to
see all the weighted correlations below this line.
A separate R package, the survey
package, is useful for
performing the outcome analysis using weights and properly accounting
for the weights when computing standard error estimates. It is not a
part of the standard R installation so you may need to install it from
CRAN. Once installed, it can be loaded using:
The get.weights()
function from the
twangContinuous
package extracts the propensity score
weights from a ps.cont
object. Those weights may then be
saved into the dataset and used as case weights in a
svydesign
object.
The svydesign
function from the survey
package creates an object that stores the dataset along with design
information needed for analyses. See help(svydesign)
for
more details on setting up svydesign
objects.
We are finally ready to estimate the causal effect of
tss_0
on sfs8p_3
.
outcome.model <- svyglm(sfs8p_3 ~ tss_0, design = design.ps,
family = gaussian())
summary(outcome.model)
##
## Call:
## svyglm(formula = sfs8p_3 ~ tss_0, design = design.ps, family = gaussian())
##
## Survey design:
## svydesign(ids = ~1, weights = ~wts, data = dat)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.595461 0.232069 28.420 <2e-16 ***
## tss_0 0.002616 0.062416 0.042 0.967
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 111.6769)
##
## Number of Fisher Scoring iterations: 2
The results indicate that for each additional trauma symptom, substance use frequency increases by 0.003, which is not statistically significant, t = 0.042, p = .967.
This tutorial and the twangContinuous
package were
supported by funding from grant 1R01DA034065 from the National Institute
on Drug Abuse. The overarching goal of the grant is to develop
statistical methods and tools that will provide addiction health
services researchers and others with the tools and training they need to
study the effectiveness of treatments using observational data. The work
is an extension of the Toolkit for Weighting and Analysis of
Nonequivalent Groups, or TWANG, which contains a set of functions to
support causal modeling of observational data through the estimation and
evaluation of propensity score weights. The TWANG package was first
developed in 2004 by RAND researchers for the R statistical computing
language and environment and has since been expanded to include tools
for SAS, Stata, and Shiny. For more information about TWANG and other
causal tools being developed, see https://www.rand.org/statistics/twang.
RAND Social and Economic Well-Being is a division of the RAND Corporation that seeks to actively improve the health and social and economic well-being of populations and communities throughout the world. This research was conducted in the Social and Behavioral Policy Program within RAND Social and Economic Well-Being. The program focuses on such topics as risk factors and prevention programs, social safety net programs and other social supports, poverty, aging, disability, child and youth health and well-being, and quality of life, as well as other policy concerns that are influenced by social and behavioral actions and systems that affect well-being. For more information, email [email protected].
We would like to thank Noah Greifer for code QA and Yajnaseni Chakraborti, Megan Schuler, Mary Ellen Slaughter, and Maria DeYoreo for feedback on this tutorial and beta testing the package.
Hirano, K., & Imbens, G. W. (2004). The propensity score with continuous treatments. Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives. Hoboken, NJ: John Wiley & Sons.
Imai, K., & van Dyk, D. A. (2004). Causal inference with general treatment regimes: Generalizing the propensity score. Journal of the American Statistical Association, 99, 854-866.
Lee, B. K., Lessler, J., & Stuart, E. A. (2010). Improving propensity score weighting using machine learning. Statistics in Medicine, 29(3), 337-346.
Little, R. J., & Vartivarian, S. (2004). Does weighting for nonresponse increase the variance of survey means? ASA Proceedings of the Joint Statistical Meetings, 3897-3904 American Statistical Association (Alexandria, VA)
McCaffrey, D. F., Ridgeway, G., & Morral, A. R. (2004). Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychological Methods, 9(4), 403-425.
Robins, J. M., Hernan, M. A., & Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology, 11(5), 550-560.
Zhu, Y., Coffman, D. L., & Ghosh, D. (2015). A boosting algorithm for estimating generalized propensity scores with continuous treatments. Journal of Causal Inference, 3(1), 25-40.