Calculate optimism and bias-corrected scores via bootstrap resampling
Source: R/boot_optimism.R
Estimate bias-corrected scores via calculation of bootstrap optimism (standard or .632).
Can also produce estimates for assessing the stability of a prediction model's predictions.
This function is called by validate.
Usage
boot_optimism(
  data,
  outcome,
  model_fun,
  pred_fun,
  score_fun,
  method = c("boot", ".632"),
  B = 200,
  ...
)
Arguments
- data
the data used in developing the model. Should contain all variables considered (i.e., even those excluded by variable selection in the development sample)
- outcome
character denoting the column name of the outcome in data.
- model_fun
a function that takes at least one argument, data. This function should implement the entire model development procedure (i.e., hyperparameter tuning, variable selection, imputation). Additional arguments can be provided via the ... argument. This function should return an object that works with pred_fun.
- pred_fun
a function that takes at least two arguments, model and data. This function should return a numeric vector of predicted probabilities of the outcome with the same length as the number of rows in data, so it is important to take into account how missing data is treated (e.g., predict.glm omits predictions for rows with missing values).
- score_fun
a function to calculate the metrics of interest. If this is not specified, score_binary is used.
- method
"boot" or ".632". The former estimates bootstrap optimism for each score and subtracts it from the apparent scores (simple bootstrap estimates are also produced as a by-product). The latter estimates ".632" optimism as described in Harrell (2015). See the validate details.
- B
number of bootstrap resamples to run (should be at least 200)
- ...
additional arguments for model_fun, pred_fun, and/or score_fun.
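The requirement that pred_fun return one prediction per row of data can be met by handling missing values inside both functions. The following is a minimal base-R sketch, assuming simple mean imputation of a single predictor; the names model_fun_imp, pred_fun_imp, and toy are hypothetical and not part of the package.

```r
# Hypothetical example: impute missing x1 with the development-sample mean,
# and store that mean so pred_fun can apply the same imputation.
model_fun_imp <- function(data, ...) {
  x1_mean <- mean(data$x1, na.rm = TRUE)
  data$x1[is.na(data$x1)] <- x1_mean
  fit <- glm(y ~ x1 + x2, data = data, family = "binomial")
  list(fit = fit, x1_mean = x1_mean)
}

pred_fun_imp <- function(model, data, ...) {
  # Apply the imputation learned during model development
  data$x1[is.na(data$x1)] <- model$x1_mean
  p <- predict(model$fit, newdata = data, type = "response")
  # Guard: one non-missing prediction per row of data
  stopifnot(length(p) == nrow(data), !anyNA(p))
  unname(p)
}

# Toy data (assumed for illustration) with some missing predictor values
set.seed(1)
n <- 200
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-1 + toy$x1 + 0.5 * toy$x2))
toy$x1[sample(n, 10)] <- NA

m <- model_fun_imp(toy)
p <- pred_fun_imp(m, toy)
length(p) == nrow(toy)
```

Because the imputation happens inside model_fun_imp, it would be repeated in every bootstrap resample, which is what "the entire model development procedure" asks for.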
Value
a list of class internal_boot containing:
- apparent
scores calculated on the original data using the original model.
- optimism
estimates of optimism for each score (the average difference between each bootstrap model's score on its bootstrap sample and its score on the original sample), which can be subtracted from the 'apparent' performance calculated using the original model on the original data.
- corrected
'bias-corrected' scores (apparent - optimism).
- simple
if method = "boot", estimates of scores derived from the 'simple bootstrap': the average of each score calculated from the bootstrap models evaluated on the original data. NULL if method = ".632".
- stability
if method = "boot", an N x B matrix, where N is the number of observations in data and B is the number of bootstrap samples. Each column contains the predicted probabilities of the outcome from one bootstrap model evaluated on the original data. NULL if method = ".632".
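To make the returned quantities concrete, the sketch below computes them by hand in base R for a single score (the Brier score). It illustrates the definitions given above and is not the package's implementation; the toy data and the helpers fit_fun, pred, and brier are assumptions made for this example.

```r
# Toy data (assumed for illustration)
set.seed(1)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-1 + toy$x1 + 0.5 * toy$x2))

# Hypothetical helpers: model development, prediction, and one score
fit_fun <- function(d) glm(y ~ x1 + x2, data = d, family = "binomial")
pred <- function(fit, d) unname(predict(fit, newdata = d, type = "response"))
brier <- function(y, p) mean((y - p)^2)

# Apparent score: original model evaluated on the original data
fit0 <- fit_fun(toy)
apparent <- brier(toy$y, pred(fit0, toy))

B <- 50  # kept small for speed; the documentation recommends at least 200
score_boot <- score_orig <- numeric(B)
stability <- matrix(NA_real_, nrow = n, ncol = B)

for (b in seq_len(B)) {
  d_boot <- toy[sample(n, replace = TRUE), ]
  fit_b <- fit_fun(d_boot)
  # Bootstrap model scored on its own bootstrap sample
  score_boot[b] <- brier(d_boot$y, pred(fit_b, d_boot))
  # Bootstrap model's predictions for the original data (one column per model)
  stability[, b] <- pred(fit_b, toy)
  # Bootstrap model scored on the original data
  score_orig[b] <- brier(toy$y, stability[, b])
}

optimism <- mean(score_boot - score_orig)  # average bootstrap-vs-original gap
corrected <- apparent - optimism           # bias-corrected score
simple <- mean(score_orig)                 # 'simple bootstrap' estimate

round(c(apparent = apparent, optimism = optimism,
        corrected = corrected, simple = simple), 4)
dim(stability)  # N rows, B columns
```

For a score where lower is better, such as the Brier score, optimism defined this way is typically negative, so the corrected score is larger (worse) than the apparent one.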
References
Steyerberg, E. W., Harrell Jr, F. E., Borsboom, G. J., Eijkemans, M. J. C., Vergouwe, Y., & Habbema, J. D. F. (2001). Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology, 54(8), 774-781.
Harrell Jr, F. E. (2015). Regression Modeling Strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. New York: Springer Science, LLC.
Examples
library(pminternal)
set.seed(456)
# simulate data with two predictors that interact
dat <- pmcalibration::sim_dat(N = 1000, a1 = -2, a3 = -.3)
mean(dat$y)
#> [1] 0.186
dat$LP <- NULL # remove linear predictor
# fit a (misspecified) logistic regression model
#m1 <- glm(y ~ x1 + x2, data=dat, family="binomial")
model_fun <- function(data, ...){
  glm(y ~ x1 + x2, data=data, family="binomial")
}
pred_fun <- function(model, data, ...){
  predict(model, newdata=data, type="response")
}
boot_optimism(data=dat, outcome="y", model_fun=model_fun, pred_fun=pred_fun,
              method="boot", B=20) # B set to 20 for example but should be >= 200
#>                C   Brier Intercept Slope   Eavg    E50    E90  Emax   ECI
#> Apparent  0.7964  0.1262   6.6e-15 1.000 0.0195 0.0162 0.0328 0.101 0.062
#> Optimism  0.0049 -0.0018   1.0e-02 0.016 0.0044 0.0051 0.0076 0.020 0.048
#> Corrected 0.7915  0.1280  -1.0e-02 0.984 0.0151 0.0111 0.0251 0.081 0.014
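
The example above uses method = "boot". For comparison, the idea behind ".632" optimism can be sketched in base R as follows. This is the classic unweighted form, in which each bootstrap model is scored only on the observations left out of its bootstrap sample; the package's method = ".632" may differ in detail (for example, by weighting resamples), so treat this as an illustration under that assumption. The toy data and the helpers fit_fun and brier are hypothetical.

```r
# Toy data (assumed for illustration)
set.seed(2)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-1 + toy$x1 + 0.5 * toy$x2))

# Hypothetical helpers: model development and one score (Brier)
fit_fun <- function(d) glm(y ~ x1 + x2, data = d, family = "binomial")
brier <- function(fit, d) {
  mean((d$y - predict(fit, newdata = d, type = "response"))^2)
}

# Apparent score: original model on the original data
apparent <- brier(fit_fun(toy), toy)

B <- 50  # kept small for speed
oob_score <- numeric(B)
for (b in seq_len(B)) {
  idx <- sample(n, replace = TRUE)
  oob <- setdiff(seq_len(n), idx)           # rows not drawn into the resample
  fit_b <- fit_fun(toy[idx, ])
  oob_score[b] <- brier(fit_b, toy[oob, ])  # score on omitted rows only
}

# Unweighted .632 optimism and the corresponding corrected score
optimism_632 <- 0.632 * (apparent - mean(oob_score))
corrected_632 <- apparent - optimism_632

round(c(apparent = apparent, optimism = optimism_632,
        corrected = corrected_632), 4)
```

Subtracting this optimism from the apparent score is equivalent to the weighted average 0.368 x apparent + 0.632 x mean out-of-sample score.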