Accelerated failure time model in XGBoost

Question 1: What is the AFT model?

Let $T$ be the failure time (the response variable) and $x$ the corresponding covariate vector. If no observation were censored, we could simply regress $T$ on $x$.

Under censoring, we observe only the pair $(Y_i, \delta_i)$, where $Y_i = \min(T_i, C_i)$, and $\delta_i = 1$ if $Y_i = T_i$ and $0$ otherwise. $C_i$ is the censoring time, assumed to be independent of $T_i$ conditional on the covariates of individual $i$.

Suppose $T_i = \exp(\beta x_i + e_i)$, where $e_i$, $i = 1, \ldots, n$, are independent and identically distributed random variables.

The exponential function may be replaced by another strictly increasing function.
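As a rough illustration of the setup above, the following sketch simulates right-censored data from the model. The coefficient `beta`, the standard normal noise, the uniform covariate, and the exponential censoring times are all assumptions chosen for the example; none of them come from the model definition itself.

```python
import math
import random

def simulate_aft(n, beta=0.5, seed=0):
    """Simulate (x, y, delta) triples from T = exp(beta * x + e) with right censoring."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0)      # single covariate (assumed)
        e = rng.gauss(0.0, 1.0)        # i.i.d. noise term (assumed standard normal)
        t = math.exp(beta * x + e)     # true failure time T_i
        c = rng.expovariate(0.2)       # censoring time C_i, drawn independently of T_i
        y = min(t, c)                  # observed time Y_i = min(T_i, C_i)
        delta = 1 if t <= c else 0     # 1 if the failure itself was observed
        data.append((x, y, delta))
    return data

sample = simulate_aft(1000)
censored_fraction = 1 - sum(d for _, _, d in sample) / len(sample)
print(f"censored fraction: {censored_fraction:.2f}")
```

Only `y` and `delta` would be available to a model in practice; the true `t` is hidden whenever `delta` is 0.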

Question 2: What are the advantages of the AFT model?

  • The AFT model predicts the failure time directly, whereas Cox-PH does not directly yield a usable predicted risk; it gives only the relative importance of features.

  • AFT supports all three censoring types (right, left, and interval).

  • AFT provides a better fit when the proportional hazards assumption does not hold.

Question 3: How does it work in XGBoost?

Likelihood function accounting for the three censoring types

XGBoost optimizes a twice-differentiable convex loss function $\ell(y_i, \hat{y}_i)$ in its second-order method of gradient boosting.

Now let's define a loss function $\ell_{AFT}$ for the AFT model. Let $D = \{(x_i, y_i)\}_{i=1}^{n}$ denote the training data, and let $Y_1, \ldots, Y_n$ denote i.i.d. random variables with the same distribution as $Y$. The likelihood of $D$ is the product of the probability densities $f_Y$ at the individual data points:

$$L(D) = P[Y_1 = y_1, \ldots, Y_n = y_n] = \prod_{i=1}^{n} P[Y_i = y_i] = \prod_{i=1}^{n} f_Y(y_i)$$

As usual, we move to the log scale, so the goal becomes maximizing the log-likelihood:

$$\ln L(D) = \sum_{i=1}^{n} \ln P[Y_i = y_i] = \sum_{i=1}^{n} \ln f_Y(y_i)$$

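Under censoring, we don't know $y_i$ exactly for some individuals; we only know an interval that contains it. We therefore revise the likelihood so that a censored observation contributes the probability of its interval:

$$\ln L(D) = \underbrace{\sum \ln P[Y_i = y_i]}_{\text{uncensored label}} + \underbrace{\sum \ln P[\underline{y}_i \le Y_i \le \overline{y}_i]}_{\text{censored label with } y_i \in [\underline{y}_i, \overline{y}_i]} = \underbrace{\sum \ln f_Y(y_i)}_{\text{uncensored label}} + \underbrace{\sum \ln\left(F_Y(\overline{y}_i) - F_Y(\underline{y}_i)\right)}_{\text{censored label with } y_i \in [\underline{y}_i, \overline{y}_i]}$$

where the first sum runs over the uncensored observations and the second over the censored ones, $\underline{y}_i$ and $\overline{y}_i$ are the lower and upper bounds for $y_i$, respectively, and $F_Y$ is the cumulative distribution function (CDF). When $\overline{y}_i = +\infty$, the observation is right-censored; when $\underline{y}_i = 0$, it is left-censored.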
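The censored log-likelihood above can be sketched in a few lines of Python. The function names `censoring_type` and `log_likelihood` are hypothetical, and the exponential distribution with `RATE = 0.5` is only an assumed stand-in for the distribution of Y so that the example has a concrete PDF and CDF.

```python
import math

def censoring_type(lower, upper):
    """Classify an observation from its (lower, upper) bounds."""
    if lower == upper:
        return "uncensored"
    if math.isinf(upper):
        return "right"
    if lower == 0:
        return "left"
    return "interval"

def log_likelihood(bounds, pdf, cdf):
    """Sum ln f_Y(y) over uncensored points and ln(F_Y(upper) - F_Y(lower)) over censored ones."""
    total = 0.0
    for lower, upper in bounds:
        if lower == upper:
            total += math.log(pdf(lower))
        else:
            total += math.log(cdf(upper) - cdf(lower))
    return total

RATE = 0.5  # assumed rate of the illustrative exponential distribution

def exp_pdf(y):
    return RATE * math.exp(-RATE * y)

def exp_cdf(y):
    return 1.0 - math.exp(-RATE * y)  # evaluates to 1.0 when y is +inf

# One observation of each kind: uncensored, right-, left-, and interval-censored.
bounds = [(2.0, 2.0), (3.0, math.inf), (0.0, 1.5), (1.0, 4.0)]
for lower, upper in bounds:
    print(lower, upper, censoring_type(lower, upper))
print(log_likelihood(bounds, exp_pdf, exp_cdf))
```

Encoding every label as a pair of bounds means a single code path handles all censoring types; an uncensored label is simply the degenerate case where both bounds coincide.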

Likelihood function incorporating the AFT model

AFT model:

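$\ln y_i = \hat{y}_i + \sigma z_i$, where $\hat{y}_i = \mathcal{T}(x_i)$ is the model's prediction for individual $i$ (in XGBoost, the output of the tree ensemble) and $\sigma$ is a scale parameter. Note that $y_i$ is the time to event, and its randomness comes entirely from $z_i$, a realization of a random variable $Z$ with a known probability distribution.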

Suppose $Y = g(Z)$, where $g$ is a monotone increasing function. The PDF and CDF of $Y$ can then be expressed in terms of the PDF and CDF of $Z$:

$$f_Y(y) = f_Z\left(g^{-1}(y)\right) \frac{d}{dy} g^{-1}(y), \qquad F_Y(y) = F_Z\left(g^{-1}(y)\right)$$
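The step from this general identity to the AFT case can be made explicit. Reading the model equation as a transformation of Z gives g(z) = exp(ŷ + σz); the worked steps below are a sketch under that reading.

```latex
\begin{aligned}
g(z) &= \exp(\hat{y} + \sigma z)
  &&\Longrightarrow\quad g^{-1}(y) = \frac{\ln y - \hat{y}}{\sigma},
  \qquad \frac{d}{dy}\, g^{-1}(y) = \frac{1}{\sigma y} \\[4pt]
f_Y(y) &= f_Z\!\left(\frac{\ln y - \hat{y}}{\sigma}\right) \frac{1}{\sigma y},
  &&\phantom{\Longrightarrow}\quad F_Y(y) = F_Z\!\left(\frac{\ln y - \hat{y}}{\sigma}\right)
\end{aligned}
```

Substituting these two expressions into the censored log-likelihood and negating yields the two cases of the AFT loss.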

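Therefore, the loss function of the AFT model is the negative log-likelihood:

$$\ell_{AFT}(y, \hat{y}) = \begin{cases} -\ln\left[ f_Z(s(y)) \dfrac{1}{\sigma y} \right] & \text{if } y \text{ is not censored} \\[8pt] -\ln\left[ F_Z(s(\overline{y})) - F_Z(s(\underline{y})) \right] & \text{if } y \text{ is censored with } y \in [\underline{y}, \overline{y}] \end{cases}$$

where $s(y) = (\ln y - \hat{y}) / \sigma$.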
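To make the two cases of the loss concrete, here is a minimal sketch that evaluates it for a single observation. It assumes Z is standard normal (one possible choice of known distribution), and the function name `aft_loss` and its signature are hypothetical rather than part of any library.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def aft_loss(y_lower, y_upper, y_hat, sigma=1.0):
    """AFT loss for one observation with bounds [y_lower, y_upper].

    y_lower == y_upper means uncensored; y_upper == inf means right-censored;
    y_lower == 0 means left-censored. Z is assumed standard normal.
    """
    def s(y):
        if y == 0:
            return -math.inf           # ln(0) treated as -inf
        if math.isinf(y):
            return math.inf
        return (math.log(y) - y_hat) / sigma

    if y_lower == y_upper:
        # Uncensored: -ln[ f_Z(s(y)) / (sigma * y) ]
        return -math.log(norm_pdf(s(y_lower)) / (sigma * y_lower))
    # Censored: -ln[ F_Z(s(upper)) - F_Z(s(lower)) ]
    return -math.log(norm_cdf(s(y_upper)) - norm_cdf(s(y_lower)))

y_hat = 1.0
y = math.exp(y_hat)                    # an observation exactly at the predicted time
print(aft_loss(y, y, y_hat))           # uncensored
print(aft_loss(y, math.inf, y_hat))    # right-censored at y
print(aft_loss(0.0, y, y_hat))         # left-censored at y
```

This sketch does no numerical safeguarding: when the prediction is far from the interval, the CDF difference can underflow to zero and the logarithm fails, so a practical implementation would need to guard against that.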

Gradient and hessian of the AFT loss

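The gradient boosting algorithm in XGBoost uses the gradient and hessian of the loss function, which are the first and second partial derivatives of $\ell_{AFT}$ with respect to $\hat{y}$. The gradient and hessian of the AFT loss function are as follows:

$$\left. \frac{\partial \ell_{AFT}}{\partial \hat{y}} \right|_{y, \hat{y}} = \begin{cases} \dfrac{f'_Z(s(y))}{\sigma f_Z(s(y))} & \text{if } y \text{ is not censored} \\[10pt] \dfrac{f_Z(s(\overline{y})) - f_Z(s(\underline{y}))}{\sigma \left[ F_Z(s(\overline{y})) - F_Z(s(\underline{y})) \right]} & \text{if } y \text{ is censored with } y \in [\underline{y}, \overline{y}] \end{cases}$$

$$\left. \frac{\partial^2 \ell_{AFT}}{\partial \hat{y}^2} \right|_{y, \hat{y}} = \begin{cases} -\dfrac{f_Z(s(y)) f''_Z(s(y)) - f'_Z(s(y))^2}{\sigma^2 f_Z(s(y))^2} & \text{if } y \text{ is not censored} \\[10pt] \dfrac{-\left[ F_Z(s(\overline{y})) - F_Z(s(\underline{y})) \right] \left[ f'_Z(s(\overline{y})) - f'_Z(s(\underline{y})) \right] + \left[ f_Z(s(\overline{y})) - f_Z(s(\underline{y})) \right]^2}{\sigma^2 \left[ F_Z(s(\overline{y})) - F_Z(s(\underline{y})) \right]^2} & \text{if } y \text{ is censored} \end{cases}$$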
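One way to sanity-check these formulas is to compare them with finite differences of the loss. The sketch below does that under the assumption that Z is standard normal, so that f'(z) = -z f(z) and f''(z) = (z² - 1) f(z). All function names are hypothetical, and for simplicity it handles only finite, positive bounds (uncensored or interval-censored observations).

```python
import math

def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dpdf(z):
    return -z * pdf(z)                 # f'_Z for the standard normal

def d2pdf(z):
    return (z * z - 1.0) * pdf(z)      # f''_Z for the standard normal

def s(y, y_hat, sigma):
    return (math.log(y) - y_hat) / sigma

def aft_loss(y_lower, y_upper, y_hat, sigma):
    if y_lower == y_upper:
        return -math.log(pdf(s(y_lower, y_hat, sigma)) / (sigma * y_lower))
    return -math.log(cdf(s(y_upper, y_hat, sigma)) - cdf(s(y_lower, y_hat, sigma)))

def aft_grad_hess(y_lower, y_upper, y_hat, sigma):
    """Analytic gradient and hessian of the AFT loss with respect to y_hat."""
    if y_lower == y_upper:
        z = s(y_lower, y_hat, sigma)
        f, df, d2f = pdf(z), dpdf(z), d2pdf(z)
        grad = df / (sigma * f)
        hess = -(f * d2f - df ** 2) / (sigma ** 2 * f ** 2)
    else:
        zl, zu = s(y_lower, y_hat, sigma), s(y_upper, y_hat, sigma)
        cdf_diff = cdf(zu) - cdf(zl)
        pdf_diff = pdf(zu) - pdf(zl)
        dpdf_diff = dpdf(zu) - dpdf(zl)
        grad = pdf_diff / (sigma * cdf_diff)
        hess = (-cdf_diff * dpdf_diff + pdf_diff ** 2) / (sigma ** 2 * cdf_diff ** 2)
    return grad, hess

def numeric_grad_hess(y_lower, y_upper, y_hat, sigma, h=1e-4):
    """Central finite differences of the loss, used only as a cross-check."""
    lo = aft_loss(y_lower, y_upper, y_hat - h, sigma)
    mid = aft_loss(y_lower, y_upper, y_hat, sigma)
    hi = aft_loss(y_lower, y_upper, y_hat + h, sigma)
    return (hi - lo) / (2.0 * h), (hi - 2.0 * mid + lo) / (h * h)

for bounds in [(2.0, 2.0), (1.0, 4.0)]:  # uncensored, then interval-censored
    print(bounds, aft_grad_hess(*bounds, 0.3, 1.2), numeric_grad_hess(*bounds, 0.3, 1.2))
```

For the uncensored normal case the formulas collapse to a gradient of -s(y)/σ and a constant hessian of 1/σ², which is a quick way to confirm the signs.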

Cite1: Survival regression with accelerated failure time model in XGBoost

Cite2: XGBoost Documentation

Zhe Lu
Graduate student & Research assistant

A graduate student pursuing the knowledge of data science.