Accelerated failure time model in XGBoost
Question 1: What is the AFT model?
Let T be ‘failure time’, the response variable, and
Under censoring, we can observe a bivariate vector
Suppose:
This function may be replaced by other strictly increasing functions.
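For reference, a common way to write the accelerated failure time model is sketched below. The symbols w (coefficient vector), x (feature vector), sigma (scale) and Z (a random variable with a known distribution such as normal, logistic or extreme value) are assumptions of this sketch and may differ from the notation used elsewhere in this article. In the gradient-boosted variant, the linear term is replaced by the output of a tree ensemble, written here as f(x).

```latex
% Linear AFT model: log failure time is a linear predictor plus scaled noise
\ln T = \langle \mathbf{w}, \mathbf{x} \rangle + \sigma Z

% Tree-based variant: the linear predictor is replaced by an ensemble output f(x)
\ln T = f(\mathbf{x}) + \sigma Z
```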
Question 2: What are the advantages of the AFT model?
The AFT model can predict failure risk, whereas Cox-PH does not directly yield a usable predicted risk; it instead gives the relative importance of features.
AFT supports all three censoring types (right, left, and interval).
AFT provides a better fit when the proportional hazards assumption does not hold.
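As a minimal sketch of how the three censoring types can be expressed, each observation can be written as a pair of lower and upper bounds on the failure time. The helper names `censoring_bounds` and `train_aft` are hypothetical and exist only for this illustration. The objective name `survival:aft`, the metric `aft-nloglik`, and the `label_lower_bound` / `label_upper_bound` fields follow the XGBoost documentation; the distribution and scale values are arbitrary example settings.

```python
import math


def censoring_bounds(kind, lower=None, upper=None):
    """Return (lower_bound, upper_bound) for one observation.

    kind is 'uncensored', 'right', 'left', or 'interval'.
    For 'uncensored', pass the observed time as `lower`.
    """
    if kind == "uncensored":
        return (lower, lower)        # exact time: both bounds coincide
    if kind == "right":
        return (lower, math.inf)     # event happened some time after `lower`
    if kind == "left":
        return (0.0, upper)          # event happened some time before `upper`
    if kind == "interval":
        return (lower, upper)        # event happened between the two bounds
    raise ValueError(f"unknown censoring kind: {kind}")


# Example parameter set; the distribution and scale are illustrative choices.
AFT_PARAMS = {
    "objective": "survival:aft",
    "eval_metric": "aft-nloglik",
    "aft_loss_distribution": "normal",
    "aft_loss_distribution_scale": 1.0,
}


def train_aft(features, bounds, num_boost_round=10):
    """Hypothetical helper: train an AFT booster from rows and bound pairs."""
    # Imported lazily so the helper above can be used without these packages.
    import numpy as np
    import xgboost as xgb

    dtrain = xgb.DMatrix(np.asarray(features, dtype=float))
    lower, upper = zip(*bounds)
    dtrain.set_float_info("label_lower_bound", np.asarray(lower, dtype=float))
    dtrain.set_float_info("label_upper_bound", np.asarray(upper, dtype=float))
    return xgb.train(AFT_PARAMS, dtrain, num_boost_round=num_boost_round)
```

A call such as `train_aft(X, [censoring_bounds("right", lower=2.0), ...])` would then fit the model, assuming `X` has one row per bound pair.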
Question 3: How does it work in XGBoost?
Likelihood function taking into account the three censoring types
XGBoost optimizes a twice-differentiable convex loss function
Now let’s define a loss function
As usual, we move to the log scale, so the goal becomes maximizing the log-likelihood.
Under censoring, we don’t know
where
Likelihood function incorporating the AFT model
AFT model:
Suppose
Therefore, the loss function of the AFT model is:
where
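To make the AFT loss concrete, here is a minimal sketch of the negative log-likelihood for a single observation, assuming a standard normal noise term. The function names, the argument `y_pred` (the model output on the log-time scale) and the default `sigma` are assumptions of this illustration. An uncensored observation contributes the density of the failure time, while a censored one contributes the probability mass between its lower and upper bounds.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def aft_neg_log_likelihood(y_lower, y_upper, y_pred, sigma=1.0):
    """Negative log-likelihood of one observation under a normal AFT model.

    y_lower, y_upper: bounds on the failure time (equal when uncensored).
    y_pred: model prediction of ln(T).
    """
    if y_lower == y_upper:
        # Uncensored: density of T = exp(y_pred + sigma * Z) at the observed time.
        z = (math.log(y_lower) - y_pred) / sigma
        return -math.log(norm_pdf(z) / (sigma * y_lower))

    # Censored: probability that T falls between the two bounds.
    if math.isinf(y_upper):
        cdf_upper = 1.0                      # right-censored: no upper limit
    else:
        cdf_upper = norm_cdf((math.log(y_upper) - y_pred) / sigma)
    if y_lower <= 0.0:
        cdf_lower = 0.0                      # left-censored: lower limit at zero
    else:
        cdf_lower = norm_cdf((math.log(y_lower) - y_pred) / sigma)
    return -math.log(cdf_upper - cdf_lower)
```

For example, `aft_neg_log_likelihood(1.0, math.inf, 0.0)` evaluates a right-censored observation whose bound sits exactly at the predicted median, so half of the probability mass lies above it.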
Gradient and hessian of the AFT loss
The gradient boosting algorithm in XGBoost uses the gradient and Hessian of the loss function, that is, its first and second partial derivatives with respect to the model's prediction.
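The sketch below illustrates these derivatives for the simplest case: an uncensored observation with normally distributed noise. The function names and the finite-difference check are assumptions added for illustration, not XGBoost internals. With z = (ln y - y_pred) / sigma, the loss is quadratic in the prediction, so the gradient is -z / sigma and the Hessian is the constant 1 / sigma^2.

```python
import math


def uncensored_loss(y, y_pred, sigma=1.0):
    """Negative log-likelihood of an uncensored time y under a normal AFT model."""
    z = (math.log(y) - y_pred) / sigma
    return 0.5 * z * z + 0.5 * math.log(2.0 * math.pi) + math.log(sigma * y)


def uncensored_grad_hess(y, y_pred, sigma=1.0):
    """Analytic first and second derivatives of the loss with respect to y_pred."""
    z = (math.log(y) - y_pred) / sigma
    grad = -z / sigma
    hess = 1.0 / (sigma * sigma)
    return grad, hess


def finite_difference(f, x, h=1e-3):
    """Central-difference estimates of f'(x) and f''(x), used as a sanity check."""
    grad = (f(x + h) - f(x - h)) / (2.0 * h)
    hess = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
    return grad, hess


# Compare the analytic derivatives with numerical estimates at one point.
y_obs, pred, scale = 5.0, 1.0, 1.2
analytic = uncensored_grad_hess(y_obs, pred, scale)
numeric = finite_difference(lambda p: uncensored_loss(y_obs, p, scale), pred)
```

The censored cases can be handled in the same way by differentiating the negative log of the difference of two CDF values, which gives expressions involving both the density and the CDF at each bound.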
Cite1: Survival regression with accelerated failure time model in XGBoost
Cite2: XGBoost Documentation