Performs internal validation of a prediction model development procedure via bootstrapping or cross-validation. Many model types are supported via the insight and marginaleffects packages, or users can supply user-defined functions that implement the model development procedure and produce predictions. Bias-corrected scores and estimates of optimism (where applicable) are provided. See confint.internal_validate for calculation of confidence intervals.

Usage

validate(
  fit,
  method = c("boot_optimism", "boot_simple", ".632", "cv_optimism", "cv_average", "none"),
  data,
  outcome,
  model_fun,
  pred_fun,
  score_fun,
  B,
  ...
)

Arguments

fit

a model object. If fit is given, the insight package is used to extract the data, outcome, and original model call. It is therefore important that fit be supported by insight and that it implements the entire model development process (see Harrell 2015). A fit obtained after selection of variables by some method will not give accurate bias-correction. Model predictions are obtained via marginaleffects::get_predict with type = "response", so fit should be compatible with this function. If fit is provided, the arguments data, outcome, model_fun, and pred_fun are all ignored.

method

bias-correction method. Valid options are "boot_optimism", "boot_simple", ".632", "cv_optimism", "cv_average", or "none" (return apparent performance). See details.

data

a data.frame containing the data used to fit the development model

outcome

character denoting the column name of the outcome in data

model_fun

for models that cannot be supplied via fit, this should be a function that takes one named argument, 'data' (the function should include ... among its arguments). This function should implement the entire model development procedure (hyperparameter tuning, variable selection, imputation, etc.) and return an object that can be used by pred_fun. Additional arguments can be passed via the ... argument. Illustrative sketches of model_fun, pred_fun, and score_fun are given after the argument descriptions.

pred_fun

for models that cannot be supplied via fit, this should be a function that takes two named arguments, 'model' and 'data' (the function should include ... among its arguments). 'model' is an object returned by model_fun. The function should return a vector of predicted risk probabilities of the same length as the number of rows in data. Additional arguments can be passed via the ... argument.

score_fun

function used to produce performance measures from predicted risks and the observed binary outcome. Should take two named arguments, 'y' and 'p' (the function should include ... among its arguments), and return a named vector of scores. If unspecified, score_binary is used, which should be suitable for most purposes.

B

number of bootstrap replicates or cross-validation folds. If unspecified, B is set to 200 for method = "boot_*" or ".632", and to 10 for method = "cv_*".

...

additional arguments for user-defined functions. Arguments for producing calibration curves can be set via 'calib_args', which should be a named list (see cal_plot and score_binary). For method = "boot_optimism", "boot_simple", or ".632", users can specify a cores argument (e.g., cores = 4) to run bootstrap samples in parallel.
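
For illustration, hedged sketches of a user-defined model_fun, pred_fun, and score_fun are given below. The logistic regression formula, the variable names, the placeholder data, and the choice of scores are assumptions made for this sketch, not requirements of the package.

# sketch of a model_fun: refits a fixed logistic regression on each (re)sample
# (a real model_fun should repeat any tuning/selection/imputation steps)
lr_fun <- function(data, ...) {
  glm(y ~ x1 + x2, data = data, family = "binomial")
}

# sketch of a pred_fun: returns predicted risk probabilities for 'data'
lr_pred <- function(model, data, ...) {
  predict(model, newdata = data, type = "response")
}

# sketch of a score_fun: returns a named vector of scores
# (Brier score and log loss are shown purely as examples)
lr_score <- function(y, p, ...) {
  c(brier   = mean((y - p)^2),
    logloss = -mean(y * log(p) + (1 - y) * log(1 - p)))
}

# placeholder data with a binary outcome y and two predictors (for the sketch)
dat <- data.frame(y = rbinom(500, 1, 0.2), x1 = rnorm(500), x2 = rnorm(500))

# these could then be passed to validate; cores is passed through ...
val <- validate(data = dat, outcome = "y",
                model_fun = lr_fun, pred_fun = lr_pred, score_fun = lr_score,
                method = "boot_optimism", B = 200, cores = 2)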

Value

an object of class internal_validate containing apparent and bias-corrected estimates of performance scores. If method = "boot_*" it also contains results pertaining to stability of predictions across bootstrapped models (see Riley and Collins, 2023).

Details

Internal validation can provide bias-corrected estimates of performance (e.g., C-statistic/AUC) for a model development procedure (i.e., expected performance if the same procedure were applied to another sample of the same size from the same population; see references). There are several approaches to producing bias-corrected estimates (see below). It is important that the fit or model_fun provided implement the entire model development procedure, including any hyperparameter tuning and/or variable selection.

Note that validate does very little to check for missing values in predictors/features. If fit is supplied, insight::get_data will extract the data used to fit the model, and usually this will result in complete cases being used. User-defined model and predict functions can be specified to handle missing values among predictor variables. Currently, any user-supplied data will have rows with missing outcome values removed.

method

boot_optimism

(default) estimates optimism for each score and subtracts it from the apparent score (the score calculated with the original/development model evaluated on the original sample). A new model is fit, via the same procedure, on each bootstrap resample. Scores are calculated when applying the boot model to the boot sample (\(S_{boot}\)) and to the original sample (\(S_{orig}\)), and the difference gives an estimate of optimism for a given resample (\(S_{boot} - S_{orig}\)). The average optimism across the B resamples is subtracted from the apparent score to produce the bias-corrected score.
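
As a toy numeric sketch of this correction (invented scores for B = 3 resamples; not the package internals):

# toy illustration of the optimism correction (invented values)
S_app  <- 0.78                      # apparent score (e.g., C-statistic)
S_boot <- c(0.80, 0.79, 0.81)       # boot models evaluated on their boot samples
S_orig <- c(0.77, 0.76, 0.78)       # same boot models evaluated on the original sample
optimism  <- mean(S_boot - S_orig)  # average optimism = 0.03
corrected <- S_app - optimism       # bias-corrected score = 0.75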

boot_simple

implements the simple bootstrap. B bootstrap models are fit and evaluated on the original data. The average score across the B replicates is the bias-corrected score.

.632

implements Harrell's adaptation of Efron's .632 estimator for binary outcomes (see rms::predab.resample and rms::validate). In this case the estimate of optimism is \(0.632 \times (S_{app} - \mathrm{mean}(S_{omit} \times w))\), where \(S_{app}\) is the apparent performance score, \(S_{omit}\) is the score estimated using the bootstrap model evaluated on the out-of-sample observations, and \(w\) are weights for the proportion of observations omitted (see Harrell 2015, p. 115).
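
A toy sketch of that formula (invented values; see Harrell 2015 for the definition of the weights):

# toy illustration of the .632 estimate of optimism (invented values)
S_app  <- 0.78                      # apparent score
S_omit <- c(0.74, 0.75, 0.73)       # boot models scored on the omitted observations
w      <- c(1.00, 1.00, 1.00)       # weights for proportion omitted (toy values)
optimism  <- 0.632 * (S_app - mean(S_omit * w))  # = 0.632 * (0.78 - 0.74) = 0.0253
corrected <- S_app - optimism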

cv_optimism

estimates optimism via B-fold cross-validation. Optimism is the average difference in the performance measure between predictions made on the training data and on the test (held-out fold) data. This is the approach implemented in rms::validate with method="crossvalidation".

cv_average

bias-corrected scores are the average of the scores obtained by evaluating the model developed in each cross-validation iteration on its test/held-out fold. This approach is described and compared to "boot_optimism" and ".632" in Steyerberg et al. (2001).

Calibration curves

To make calibration curves and calculate the associated estimates (ICI, ECI, etc.; see score_binary), validate uses the default arguments in cal_defaults. These arguments are passed to the pmcalibration package (see ?pmcalibration::pmcalibration for options). If a calibration plot (apparent vs bias-corrected calibration curves via cal_plot) is desired, the argument 'eval' should be provided. This should be the points at which to evaluate the calibration curve on each bootstrap resample or cross-validation fold. A good option would be calib_args = list(eval = seq(min(p), max(p), length.out=100)), where p are the predictions from the original model evaluated on the original data.
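
For example, continuing with the glm fit m1 from the Examples section below (a hedged sketch; the choice of 100 evaluation points follows the suggestion above and is otherwise arbitrary):

# evaluate calibration curves at 100 points spanning the original predictions
p <- predict(m1, type = "response")
m1_cal <- validate(m1, method = "boot_optimism", B = 200,
                   calib_args = list(eval = seq(min(p), max(p), length.out = 100)))
cal_plot(m1_cal)  # apparent vs bias-corrected calibration curves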

Number of resamples/folds is less than requested

If the model_fun produces an error, or if score_binary is supplied with constant predictions or outcomes (e.g., all(y == 0)), the returned scores will all be NA. These are omitted from the calculation of optimism or other bias-corrected estimates (cv_average, boot_simple), and the number of successful resamples/folds will be < B. validate collects errors and will produce a warning summarizing them. The number of successful samples is given in the 'n' column of the printed summary of an 'internal_validate' object. It is important to understand what is causing the loss of resamples/folds. Some potential sources (this list is not exhaustive): for rare events, resamples/folds may produce samples with zero outcomes, which is especially likely for 'cv_*' if B (the number of folds) is set high; factor/binary predictor variables with rare levels may cause problems, which could be addressed by specifying a model_fun that omits variables from the model formula when only one level is present (see the sketch below); or the issue may relate to the construction of calibration curves and may be addressed by choosing settings more carefully (see the section above).
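
As an illustration of the last suggestion (a hedged sketch; the outcome name "y" and the use of glm are assumptions), a model_fun could drop predictors with a single observed level before building the formula:

# sketch: drop predictors with only one observed level in this (re)sample
robust_fun <- function(data, ...) {
  preds <- setdiff(names(data), "y")
  keep  <- preds[sapply(data[preds], function(x) length(unique(x)) > 1)]
  glm(reformulate(keep, response = "y"), data = data, family = "binomial")
}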

References

Steyerberg, E. W., Harrell Jr, F. E., Borsboom, G. J., Eijkemans, M. J. C., Vergouwe, Y., & Habbema, J. D. F. (2001). Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. Journal of clinical epidemiology, 54(8), 774-781.

Harrell Jr, F. E. (2015). Regression Modeling Strategies: with applications to linear models, logistic and ordinal regression, and survival analysis. New York: Springer Science, LLC.

Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382), 316-331.

Van Calster, B., Steyerberg, E. W., Wynants, L., and van Smeden, M. (2023). There is no such thing as a validated prediction model. BMC medicine, 21(1), 70.

Riley, R. D., & Collins, G. S. (2023). Stability of clinical prediction models developed using statistical or machine learning methods. Biometrical Journal. doi:10.1002/bimj.202200302. Epub ahead of print.

Examples

library(pminternal)
set.seed(456)
# simulate data with two predictors that interact
dat <- pmcalibration::sim_dat(N = 2000, a1 = -2, a3 = -.3)
mean(dat$y)
#> [1] 0.1985
dat$LP <- NULL # remove linear predictor

# fit a (misspecified) logistic regression model
m1 <- glm(y ~ ., data=dat, family="binomial")

# internal validation of m1 via bootstrap optimism with 10 resamples
# B = 10 for example but should be >= 200 in practice
m1_iv <- validate(m1, method="boot_optimism", B=10)
#> It is recommended that B >= 200 for bootstrap validation
m1_iv
#>           Apparent Optimism Corrected  n
#> C           0.7779  0.00158    0.7764 10
#> Brier       0.1335 -0.00111    0.1346 10
#> Intercept   0.0000 -0.01917    0.0192 10
#> Slope       1.0000  0.00083    0.9992 10
#> Eavg        0.0076  0.00516    0.0024 10
#> E50         0.0064  0.00381    0.0026 10
#> E90         0.0115  0.00882    0.0027 10
#> Emax        0.0580  0.07771   -0.0197 10
#> ECI         0.0110  0.03656   -0.0256 10
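
# Confidence intervals for the bias-corrected scores can then be requested via
# the confint method (a hedged follow-up; see ?confint.internal_validate for
# the available interval types and defaults).
confint(m1_iv)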