Get bias-corrected performance measures via bootstrapping or cross-validation
Source: R/validate.R
Performs internal validation of a prediction model development procedure via bootstrapping or cross-validation. Many model types are supported via the insight and marginaleffects packages, or users can supply user-defined functions that implement the model development procedure and retrieve predictions. Bias-corrected scores and estimates of optimism (where applicable) are provided.
Usage
validate(
  fit,
  method = c("boot_optimism", "boot_simple", ".632", "cv_optimism", "cv_average"),
  data,
  outcome,
  model_fun,
  pred_fun,
  score_fun,
  B,
  ...
)
Arguments
- fit
a model object. If fit is given, the insight package is used to extract data, outcome, and the original model call. It is therefore important that fit be supported by insight and implement the entire model development process (see Harrell 2015). A fit given after selection of variables by some method will not give accurate bias-correction. Model predictions are obtained via marginaleffects::get_predict with type = "response", so fit should be compatible with this function. If fit is provided, the arguments data, outcome, model_fun, and pred_fun are all ignored.
- method
bias-correction method. Valid options are "boot_optimism", "boot_simple", ".632", "cv_optimism", or "cv_average". See details.
- data
a data.frame containing the data used to fit the development model
- outcome
character denoting the column name of the outcome in data
- model_fun
for models that cannot be supplied via fit, this should be a function that takes one named argument, 'data' (the function should include ... among its arguments). It should implement the entire model development procedure (hyperparameter tuning, variable selection, imputation, etc.) and return an object that can be used by pred_fun (see the sketch following this list). Additional arguments can be supplied via ...
- pred_fun
for models that cannot be supplied via fit, this should be a function that takes two named arguments, 'model' and 'data' (the function should include ... among its arguments). 'model' is an object returned by model_fun. The function should return a vector of predicted risk probabilities of the same length as the number of rows in data. Additional arguments can be supplied via ...
- score_fun
function used to produce performance measures from predicted risks and the observed binary outcome. Should take two named arguments, 'y' and 'p' (the function should include ... among its arguments), and return a named vector of scores. If unspecified, score_binary is used, which should be suitable for most purposes.
- B
number of bootstrap replicates or cross-validation folds. If unspecified, B is set to 200 for method = "boot_*"/".632" and to 10 for method = "cv_*".
- ...
additional arguments for user-defined functions. Arguments for producing calibration curves can be set via 'calib_args', which should be a named list (see cal_plot and score_binary). For method = "boot_optimism", "boot_simple", or ".632", users can specify a cores argument (e.g., cores = 4) to run bootstrap samples in parallel.
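To illustrate the user-defined interface, the following is a minimal sketch (not the package's own code) assuming a data.frame dat with a binary outcome column named "y"; the names my_model, my_pred, my_score, and val are placeholders.

# entire model development procedure (tuning, selection, imputation, ...)
my_model <- function(data, ...) {
  glm(y ~ ., data = data, family = "binomial")
}
# predicted risks, one per row of data
my_pred <- function(model, data, ...) {
  predict(model, newdata = data, type = "response")
}
# named vector of performance measures
my_score <- function(y, p, ...) {
  c(brier = mean((y - p)^2))
}
val <- validate(method = "boot_optimism", data = dat, outcome = "y",
                model_fun = my_model, pred_fun = my_pred,
                score_fun = my_score, B = 200)

In practice score_fun can usually be left unspecified, in which case score_binary supplies a standard set of discrimination and calibration measures.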
Value
an object of class internal_validate containing apparent and bias-corrected estimates of performance scores. If method = "boot_*" it also contains results pertaining to stability of predictions across bootstrapped models (see Riley and Collins, 2023).
Details
Internal validation can provide bias-corrected estimates of performance (e.g., C-statistic/AUC) for a model development procedure (i.e., expected performance if the same procedure were applied to another sample of the same size from the same population; see references). There are several approaches to producing bias-corrected estimates (see below). It is important that the fit or model_fun provided implement the entire model development procedure, including any hyperparameter tuning and/or variable selection.
Note that validate does very little to check for missing values. If fit is supplied, insight::get_data will extract the data used to fit the model and usually this will result in complete cases being used. User-defined model and predict functions can be specified to handle missing values among predictor variables. Currently, any user-supplied data will have rows with missing outcome values removed.
method
- boot_optimism
(default) estimates optimism for each score and subtracts it from the apparent score (the score calculated with the original/development model evaluated on the original sample). A new model is fit, using the same procedure, to each bootstrap resample. Scores are calculated when applying the boot model to the boot sample (\(S_{boot}\)) and to the original sample (\(S_{orig}\)), and the difference gives an estimate of optimism for a given resample (\(S_{boot} - S_{orig}\)). The average optimism across the B resamples is subtracted from the apparent score to produce the bias-corrected score.
- boot_simple
implements the simple bootstrap. B bootstrap models are fit and evaluated on the original data. The average score across the B replicates is the bias-corrected score.
- .632
implements Harrell's adaptation of Efron's .632 estimator for binary outcomes (see rms::predab.resample and rms::validate). In this case the estimate of optimism is \(0.632 \times (S_{app} - \text{mean}(S_{omit} \times w))\), where \(S_{app}\) is the apparent performance score, \(S_{omit}\) is the score estimated using the bootstrap model evaluated on the out-of-sample observations, and \(w\) are weights for the proportion of observations omitted (see Harrell 2015, p. 115).
- cv_optimism
estimates optimism via B-fold cross-validation. Optimism is the average difference in the performance measure between predictions made on the training vs. test (held-out fold) data. This is the approach implemented in rms::validate with method = "crossvalidation".
- cv_average
bias-corrected scores are the average of the scores obtained by evaluating the model developed on each fold on the held-out (test) data for that fold. This approach is described and compared to "boot_optimism" and ".632" in Steyerberg et al. (2001).
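For intuition, the optimism bootstrap ("boot_optimism") for a single score could be sketched as follows. This is an illustrative outline only, assuming a data.frame dat with binary outcome y; a plain glm stands in for the full development procedure and the Brier score for score_fun. validate performs this (for all scores) internally.

brier <- function(y, p) mean((y - p)^2)
fit_orig <- glm(y ~ ., data = dat, family = "binomial")
S_app <- brier(dat$y, predict(fit_orig, type = "response"))  # apparent score
B <- 200
optimism <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]            # bootstrap resample
  fit_b <- glm(y ~ ., data = boot, family = "binomial")       # refit on resample
  S_boot <- brier(boot$y, predict(fit_b, type = "response"))  # boot model on boot sample
  S_orig <- brier(dat$y, predict(fit_b, newdata = dat, type = "response"))  # boot model on original sample
  S_boot - S_orig                                             # optimism for this resample
})
S_corrected <- S_app - mean(optimism)                         # bias-corrected score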
References
Steyerberg, E. W., Harrell Jr, F. E., Borsboom, G. J., Eijkemans, M. J. C., Vergouwe, Y., & Habbema, J. D. F. (2001). Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. Journal of clinical epidemiology, 54(8), 774-781.
Harrell Jr F. E. (2015). Regression Modeling Strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. New York: Springer Science, LLC.
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382), 316-331.
Van Calster, B., Steyerberg, E. W., Wynants, L., and van Smeden, M. (2023). There is no such thing as a validated prediction model. BMC medicine, 21(1), 70.
Riley RD, Collins GS. (2023). Stability of clinical prediction models developed using statistical or machine learning methods. Biom J. doi:10.1002/bimj.202200302. Epub ahead of print.
Examples
library(pminternal)
set.seed(456)
# simulate data with two predictors that interact
dat <- pmcalibration::sim_dat(N = 2000, a1 = -2, a3 = -.3)
mean(dat$y)
#> [1] 0.1985
dat$LP <- NULL # remove linear predictor
# fit a (misspecified) logistic regression model
m1 <- glm(y ~ ., data=dat, family="binomial")
# internal validation of m1 via bootstrap optimism with 10 resamples
# B = 10 for example but should be >= 200 in practice
m1_iv <- validate(m1, method="boot_optimism", B=10)
#> It is recommended that B >= 200 for bootstrap validation
m1_iv
#> C Brier Intercept Slope Eavg E50 E90 Emax ECI
#> Apparent 0.7779 0.1335 0.000 1.00000 0.0076 0.0064 0.0115 0.058 0.011
#> Optimism 0.0016 -0.0011 -0.019 0.00083 0.0052 0.0038 0.0088 0.078 0.037
#> Corrected 0.7764 0.1346 0.019 0.99917 0.0024 0.0026 0.0027 -0.020 -0.026
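The other methods, and the parallel option documented under ..., use the same interface; for example (again with B kept small purely for illustration, output not shown):

# Harrell/Efron .632, with bootstrap resamples run on 2 cores
m1_632 <- validate(m1, method = ".632", B = 10, cores = 2)
# 10-fold cross-validation based optimism correction
m1_cv <- validate(m1, method = "cv_optimism", B = 10)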