Getting started with `pminternal`
Stephen Rhodes
2023-11-27
Source:vignettes/pminternal.Rmd
Introduction
In developing a clinical prediction model, measures of model performance are biased by the fact that we’re using the same data to fit (‘train’) the model as to evaluate it. Splitting the data into development and validation sets is inefficient. Bootstrapping or cross-validation can be used instead to estimate bias-corrected measures of model performance. This is known as ‘internal validation’ and addresses the question: what is the expected performance of a model developed in the same way in a sample selected from the same population? This is not to be confused with ‘external validation’, which assesses model performance in a different population.
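The bootstrap route to a bias-corrected estimate can be sketched in base R. This is an illustrative sketch on simulated data, not the code that `pminternal` runs: the helper names `c_stat` and `fit_fun`, the simulated predictors, and the choice of `B = 50` are all assumptions made for the example.

```r
# Sketch of optimism-corrected bootstrap validation using base R only.
# The data are simulated; nothing here comes from the gusto example.
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1 + 0.4 * dat$x2))

# c-statistic (AUC) via the rank-sum (Mann-Whitney) formula
c_stat <- function(y, p) {
  r <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# the model development procedure that is repeated in every resample
fit_fun <- function(d) glm(y ~ x1 + x2, data = d, family = binomial)

# apparent performance: fit and evaluate on the same data
fit <- fit_fun(dat)
apparent <- c_stat(dat$y, predict(fit, type = "response"))

B <- 50
optimism <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  fit_b <- fit_fun(boot)
  # performance on the bootstrap sample minus performance on the original data
  c_stat(boot$y, predict(fit_b, type = "response")) -
    c_stat(dat$y, predict(fit_b, newdata = dat, type = "response"))
})

corrected <- apparent - mean(optimism)
round(c(Apparent = apparent, Optimism = mean(optimism), Corrected = corrected), 3)
```

The key design point is that `fit_fun` is rerun from scratch on each resample, so every step of model development contributes to the optimism estimate.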
`pminternal` is inspired by the functions `validate` and `predab.resample` from the `rms` package. The aim is to provide a package that will work with any user-defined model development procedure (assuming it can be implemented in an R function). The package also implements more recently proposed ‘stability plots’. Currently only binary outcomes are supported, but a goal is to eventually extend to other outcomes (survival, ordinal).
Supplying a model via fit
`validate` only needs a single argument to run, `fit`. `fit` should be a fitted model that is compatible with `insight::get_data`, `insight::find_response`, `insight::get_call`, and `marginaleffects::get_predict`. Models supported by `insight` can be found by running `insight::supported_models()` (or run `is_model_supported(fit)`); models supported by `marginaleffects` are listed at https://marginaleffects.com/articles/supported_models.html. As we’re dealing with binary outcomes, not all models listed will be applicable.
The code below loads the GUSTO-I trial data, selects relevant variables, downsamples to reduce run time, fits a development model (a `glm`), and passes it to `validate`.
library(pminternal)
library(Hmisc)
#>
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#>
#> format.pval, units
getHdata("gusto")
gusto <- gusto[, c("sex", "age", "hyp", "htn", "hrt", "pmi", "ste", "day30")]
gusto$y <- gusto$day30; gusto$day30 <- NULL
set.seed(234)
gusto <- gusto[sample(1:nrow(gusto), size = 5000),]
mod <- glm(y ~ ., data = gusto, family = "binomial")
mod_iv <- validate(mod, B = 20)
#> It is recommended that B >= 200 for bootstrap validation
mod_iv
#> C Brier Intercept Slope Eavg E50 E90 Emax
#> Apparent 0.79959 0.05992 0.000 1.0000 0.00385 0.0023 0.0076 0.093
#> Optimism 0.00099 -0.00046 0.014 0.0082 0.00092 0.0012 0.0032 0.035
#> Corrected 0.79860 0.06038 -0.014 0.9918 0.00293 0.0011 0.0044 0.057
#> ECI
#> Apparent 0.00521
#> Optimism 0.00578
#> Corrected -0.00057
As this `validate` call was run with `method = "boot_optimism"`, we are able to assess model stability via the following calls. Note that these stability plots are not based on the estimates of optimism but rather on predictions from models developed on bootstrap-resampled data sets and evaluated on the original/development data. In that sense they are conceptually more closely related to the bias-corrected estimates obtained from `method = "boot_simple"`. In any case, both methods result in the data needed to make these plots (see also `classification_stability` and `dcurve_stability`).
# prediction stability plot with 95% 'stability interval'
prediction_stability(mod_iv, bounds = .95)
# calibration stability
# (using default calibration curve arguments: see pminternal:::cal_defaults())
calibration_stability(mod_iv)
# mean absolute prediction error (mape) stability
# mape = average difference between boot model predictions
# for original data and original model
mape <- mape_stability(mod_iv)
mape$average_mape
#> [1] 0.007446025
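To make the quantities behind these plots concrete, the sketch below builds the same kind of data by hand in base R. It assumes, following the comment above, that MAPE is the mean absolute difference between each bootstrap model's predictions for the original data and the original model's predictions. The simulated data and the names `fit_fun`, `p_boot`, `indiv_mape`, and `stab_int` are illustrative, not part of `pminternal`.

```r
# Sketch of prediction-stability quantities using base R and simulated data.
set.seed(2)
n <- 400
dat <- data.frame(x = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-0.5 + dat$x))

fit_fun <- function(d) glm(y ~ x, data = d, family = binomial)

# predictions from the original (development) model
p_orig <- predict(fit_fun(dat), type = "response")

B <- 30
# n x B matrix: column b holds predictions for the ORIGINAL data
# from the model refit on bootstrap resample b
p_boot <- replicate(B, {
  boot <- dat[sample(nrow(dat), replace = TRUE), ]
  predict(fit_fun(boot), newdata = dat, type = "response")
})

# mean absolute prediction error per individual, then averaged over individuals
indiv_mape <- rowMeans(abs(p_boot - p_orig))
average_mape <- mean(indiv_mape)

# 95% 'stability interval' for each individual's predicted risk
stab_int <- t(apply(p_boot, 1, quantile, probs = c(0.025, 0.975)))
```

Plotting the columns of `p_boot` against `p_orig` gives a prediction stability plot in the same spirit as the one drawn by `prediction_stability`.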
As a final part of this example, it is possible to get apparent and bias-corrected calibration curves. For this we need to set an additional argument specifying where to assess the calibration curve (i.e., points on the x-axis), as follows.
# find 100 equally spaced points
# between the lowest and highest risk prediction
p <- predict(mod, type="response")
p_range <- seq(min(p), max(p), length.out=100)
mod_iv2 <- validate(mod, B = 20, calib_args=list(eval=p_range))
#> It is recommended that B >= 200 for bootstrap validation
mod_iv2
#> C Brier Intercept Slope Eavg E50 E90 Emax ECI
#> Apparent 0.7996 0.05992 0.000 1.000 0.00385 0.00232 0.0076 0.093 0.00521
#> Optimism 0.0038 -0.00068 0.045 0.025 0.00047 0.00158 0.0027 0.030 0.00093
#> Corrected 0.7957 0.06060 -0.045 0.975 0.00338 0.00074 0.0049 0.062 0.00428
calp <- cal_plot(mod_iv2)
The plotting functions are fairly basic but all invisibly return the data needed to reproduce them as you like. For example, the plot below uses `ggplot2` and adds a histogram of the predicted risk probabilities (stored in `p`) to show their distribution.
head(calp)
#> predicted apparent bias_corrected
#> 1 0.001639574 0.001092496 0.000854699
#> 2 0.009714890 0.007787001 0.007759467
#> 3 0.017790205 0.015366875 0.019001038
#> 4 0.025865521 0.023634527 0.027067193
#> 5 0.033940837 0.032389166 0.034308507
#> 6 0.042016153 0.041466700 0.043179498
library(ggplot2)
ggplot(calp, aes(x=predicted)) +
  geom_abline(lty=2) +
  geom_line(aes(y=apparent, color="Apparent")) +
  geom_line(aes(y=bias_corrected, color="Bias-Corrected")) +
  geom_histogram(data = data.frame(p = p), aes(x=p, y=after_stat(density)*.01),
                 binwidth = .001, inherit.aes = FALSE, alpha=1/2) +
  labs(x="Predicted Risk", y="Estimated Risk", color=NULL)
Additional models that can be supplied via `fit` and that I have tested on this gusto example are given below. Please let me know if you run into trouble with a model class that you feel should work with `fit`. The chunk below is not evaluated, to save build time, so no output is printed.
### generalized boosted model with gbm
library(gbm)
# syntax y ~ . does not work with gbm
mod <- gbm(y ~ sex + age + hyp + htn + hrt + pmi + ste,
data = gusto, distribution = "bernoulli", interaction.depth = 2)
(gbm_iv <- validate(mod, B = 20))
### generalized additive model with mgcv
library(mgcv)
mod <- gam(y ~ sex + s(age) + hyp + htn + hrt + pmi + ste,
data = gusto, family = "binomial")
(gam_iv <- validate(mod, B = 20))
mod <- bam(y ~ sex + s(age, bs = "cr") + hyp + htn + hrt + pmi + ste,
data = gusto, family = "binomial")
(bam_iv <- validate(mod, B = 20))
### rms implementation of logistic regression
mod <- rms::lrm(y ~ ., data = gusto)
# not loading rms to avoid conflict with rms::validate...
(lrm_iv <- validate(mod, B = 20))
User-defined model development functions
It is important that what is being internally validated is the entire model development procedure, including any tuning of hyperparameters, variable selection, and so on. Often a `fit` object will not capture this (or will not be supported).
In the example below we work with a model that is not supported by `insight` or `marginaleffects`: logistic regression with lasso (L1) regularization. The functions we need to specify are `model_fun` and `pred_fun`.
- `model_fun` should take a single argument, `data`, and return an object that can be used to make predictions with `pred_fun`. `...` should also be added as an argument to allow for optional arguments passed to `validate` (see vignette("pminternal-examples") for more examples of user-defined functions that take optional arguments). `lasso_fun` formats data for `glmnet`, then selects the hyperparameter, `lambda` (which controls the degree of regularization), via 10-fold cross-validation, fits the final model with the ‘best’ value of `lambda`, and returns it.
- `pred_fun` should take two arguments, `model` and `data`, as well as the optional argument(s) `...`. `pred_fun` should work with the model object returned by `model_fun`. `glmnet` objects have their own `predict` method, so the function `lasso_predict` simply formats the data and returns the predictions. `predict.glmnet` returns a matrix, so we select the first column to return a vector of predicted risks.
#library(glmnet)
lasso_fun <- function(data, ...){
  y <- data$y
  x <- data[, c('sex', 'age', 'hyp', 'htn', 'hrt', 'pmi', 'ste')]
  x$sex <- as.numeric(x$sex == "male")
  x$pmi <- as.numeric(x$pmi == "yes")
  x <- as.matrix(x)
  # select lambda by 10-fold cross-validation
  cv <- glmnet::cv.glmnet(x=x, y=y, alpha=1, nfolds = 10, family="binomial")
  lambda <- cv$lambda.min
  # fit the final model at the selected lambda
  glmnet::glmnet(x=x, y=y, alpha = 1, lambda = lambda, family="binomial")
}
lasso_predict <- function(model, data, ...){
  x <- data[, c('sex', 'age', 'hyp', 'htn', 'hrt', 'pmi', 'ste')]
  x$sex <- as.numeric(x$sex == "male")
  x$pmi <- as.numeric(x$pmi == "yes")
  x <- as.matrix(x)
  # predictions are on the logit scale, so convert to probabilities
  plogis(glmnet::predict.glmnet(model, newx = x)[,1])
}
We recommend that you use `::` to refer to functions from particular packages if you want to run bootstrapping in parallel. For `cores = 1` (or no `cores` argument supplied) or cross-validation this will not be an issue and you can use `library`.
The code below tests these functions out on `gusto`.
lasso_app <- lasso_fun(gusto)
lasso_p <- lasso_predict(model = lasso_app, data = gusto)
They work as intended, so we can pass these functions to `validate` as follows. Here we are using cross-validation to estimate optimism. Note that the 10-fold cross-validation to select the best value of `lambda` (i.e., hyperparameter tuning) is repeated for each fold performed by `validate`.
# for calibration plot
eval <- seq(min(lasso_p), max(lasso_p), length.out=100)
iv_lasso <- validate(method = "cv_optimism", data = gusto,
outcome = "y", model_fun = lasso_fun,
pred_fun = lasso_predict, B = 10,
calib_args=list(eval=eval))
iv_lasso
#> C Brier Intercept Slope Eavg E50 E90 Emax
#> Apparent 0.7995 0.05990 0.036 1.017 0.0039 0.0028 0.0079 0.082
#> Optimism 0.0055 -0.00052 0.065 0.024 -0.0132 -0.0066 -0.0320 -0.143
#> Corrected 0.7940 0.06042 -0.029 0.993 0.0171 0.0094 0.0400 0.225
#> ECI
#> Apparent 0.0041
#> Optimism -0.1224
#> Corrected 0.1264
cal_plot(iv_lasso)
More examples of user-defined model functions (including elastic net and random forest) can be found in `vignette("validate-examples")`.
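As a further illustration of the `model_fun`/`pred_fun` contract, the sketch below wraps backward variable selection in a pair of functions using only base R. The names `step_fun` and `step_predict` and the simulated data frame `sim` are hypothetical and are not part of `pminternal`; the point is only that the selection step lives inside the model function so that it would be repeated on every resample.

```r
# Hypothetical model_fun / pred_fun pair: logistic regression with
# backward selection by AIC via stats::step().
step_fun <- function(data, ...) {
  full <- stats::glm(y ~ ., data = data, family = "binomial")
  stats::step(full, direction = "backward", trace = 0)
}

step_predict <- function(model, data, ...) {
  # returns a vector of predicted risks on the probability scale
  stats::predict(model, newdata = data, type = "response")
}

# simulated stand-in data (an assumption; this is not the gusto data)
set.seed(3)
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-1 + sim$x1 + 0.5 * sim$x2))

# check that the two functions work together before validating
step_app <- step_fun(sim)
step_p <- step_predict(model = step_app, data = sim)
```

Assuming the same interface as the lasso example above, such a pair would be supplied as `model_fun = step_fun` and `pred_fun = step_predict`, together with `data = sim` and `outcome = "y"`.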
User-defined score functions
The scores returned by `score_binary` should be enough for most clinical prediction model applications, but sometimes different measures may be desired. This can be achieved by specifying `score_fun`. This should take two arguments, `y` and `p`, and can take optional arguments. `score_fun` should return a named vector of scores calculated from `y` and `p`.
The function `sens_spec` takes an optional argument `threshold` that is used to calculate sensitivity and specificity. If `threshold` is not specified it is set to 0.5.
sens_spec <- function(y, p, ...){
  # this function supports an optional
  # arg: threshold (set to .5 if not specified)
  dots <- list(...)
  if ("threshold" %in% names(dots)){
    thresh <- dots[["threshold"]]
  } else{
    thresh <- .5
  }
  # make sure y is 1/0
  if (is.logical(y)) y <- as.numeric(y)
  # predicted 'class'
  pcla <- as.numeric(p > thresh)
  sens <- sum(y==1 & pcla==1)/sum(y==1)
  spec <- sum(y==0 & pcla==0)/sum(y==0)
  scores <- c(sens, spec)
  names(scores) <- c("Sensitivity", "Specificity")
  return(scores)
}
The call to `validate` below uses the `glm` fit from the beginning of this vignette and uses the `sens_spec` function to calculate bias-corrected sensitivity and specificity with a threshold of 0.2 (in this case assessing classification stability would be important).
validate(fit = mod, score_fun = sens_spec, threshold=.2,
method = "cv_optimism", B = 10)
#> Sensitivity Specificity
#> Apparent 0.3045 0.93990
#> Optimism 0.0074 0.00012
#> Corrected 0.2971 0.93978
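Other threshold-based measures can follow the same pattern. As a hedged sketch, the hypothetical function `ppv_npv` below (not part of `pminternal`) returns positive and negative predictive value, and is checked on a small made-up example before it would be passed as `score_fun`.

```r
# Hypothetical score_fun: positive and negative predictive value.
# Takes y and p plus an optional 'threshold' (0.5 if not specified).
ppv_npv <- function(y, p, ...) {
  dots <- list(...)
  thresh <- if ("threshold" %in% names(dots)) dots[["threshold"]] else 0.5
  # make sure y is 1/0
  if (is.logical(y)) y <- as.numeric(y)
  # predicted 'class'
  pcla <- as.numeric(p > thresh)
  ppv <- sum(y == 1 & pcla == 1) / sum(pcla == 1)
  npv <- sum(y == 0 & pcla == 0) / sum(pcla == 0)
  c(PPV = ppv, NPV = npv)
}

# toy data, made up for this check
y_toy <- c(1, 1, 0, 0, 0, 1)
p_toy <- c(0.9, 0.4, 0.3, 0.6, 0.1, 0.7)

res_default <- ppv_npv(y_toy, p_toy)
res_low <- ppv_npv(y_toy, p_toy, threshold = 0.2)
```

Note that either score is `NaN` when no predictions fall on the corresponding side of the threshold, which can happen in small resamples.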