Research Article Volume 7 Issue 3
1Department of Mathematics and Statistics, University of North Carolina at Charlotte, USA
2Department of Mathematics, Minjiang University, China
Correspondence: Yuze Yuan, Department of Mathematics, Minjiang University, China
Received: April 28, 2018 | Published: May 15, 2018
Citation: Yuan Y, Jiang J, Chen L. On local polynomial estimation of hazard rates and their derivatives under left truncation and right censoring models. Biom Biostat Int J. 2018;7(3):199–204. DOI: 10.15406/bbij.2018.07.00209
Estimating the hazard rate function is an important problem in survival analysis. Several estimation approaches based on kernel smoothing exist; however, they suffer from boundary effects or require higher order kernels, which increases the mean squared error. We introduce local polynomial estimators of hazard rates and their derivatives for left truncation and right censoring models. The estimators enjoy favorable properties similar to those of local polynomial regression estimators. Asymptotic expressions for the mean squared errors (AMSEs) are obtained. Consistency and joint asymptotic normality of the local polynomial estimators are established, and a data-based local bandwidth selection rule is proposed.
Keywords: censoring, data-driven local bandwidth selection, hazard rate, local polynomial, truncation
Consider a subject in survival studies. The subject can enter the study only if its onset time, i.e. the time origin of its lifetime, precedes the beginning of the study. Each entered individual is then followed until a fixed time point. Such subjects are said to be left truncated and right censored. To be specific, let X be the lifetime with distribution function (df) F(x), T the random left truncation time with arbitrary df G(x), and C the random censoring time with arbitrary df L(x), where X is independent of (T,C). Then the cumulative hazard function of F(x) is Λ(x)=∫_0^x dF(t)/[1−F(t)]. Under the left truncation and right censoring model, one observes (Y,T,δ) if Y≥T, where Y=min(X,C) and δ=I(X≤C) is an indicator of the censoring status of X, taking value one if X≤C and zero otherwise. When Y<T, nothing is observed (see, for example, Gürler & Wang1). Let the distribution of Y be W(y) and assume that α=P(T≤Y)>0. Then ˉW=ˉFˉL, where, here and throughout the paper, for any df E, ˉE=1−E denotes the corresponding survival function.
The left truncation and right censoring model has been investigated by many authors; interesting work along this line can be found in Gross & Lai2 and Gürler & Wang,1 among others. Several authors have considered the estimation of hazard functions under the left truncation and right censoring model; see, for example, Uzunogullari & Wang3 and Wu & Wells.4
In the present investigation, we study local polynomial (LP for short) estimators of hazard functions and their derivatives based on the i.i.d. sample {(Yi,Ti,δi)}_{i=1}^n from (Y,T,δ). Under the left truncation and right censoring model, one observes only those i.i.d. triples (Yi,Ti,δi) for which Yi≥Ti. It is worth pointing out that our estimators inherit some favorable properties from local polynomial regression estimators; in particular, they can reduce the bias according to the degree of the polynomial without increasing the variance, and they automatically correct the left boundary effect. The pointwise asymptotic normality of our estimators enables one to find the asymptotically optimal variable bandwidth, and thereafter to develop a data-driven optimal local bandwidth selector by using the ideas of Fan & Gurler.5 We here present a simpler data-driven method for choosing the local bandwidth.
The outline of this paper is as follows. In Section 2, we introduce the LP estimators. In Section 3, we concentrate on the asymptotic properties of the proposed estimators, including pointwise strong consistency and joint asymptotic normality. In Section 4, we propose the data-driven local bandwidth selection rule. Technical proofs are given in the Appendix.
In order to introduce the LP estimators, we use the following notation from Gürler & Wang.1
Let aE=inf{t: E(t)>0} and bE=inf{t: E(t)=1} denote the left and right endpoints of the support of any df E(x), respectively. Then F(x) is identifiable if aG≤aW and bG≤bW (Woodroofe6 and Gürler & Wang1).
Therefore, we assume this condition holds. As in Gürler & Wang,1 for estimating the density function f(x) of X, we also assume aG<aW.
Following Gürler & Wang,1 we define W1(y)=P(Y≤y, δ=1|Y≥T) and its empirical counterpart W1n(y)=n^{-1}∑_{i=1}^n I(Yi≤y, δi=1). Then dW1(y)=α^{-1}P(T≤y≤C)dF(y) and
Λ(x)=∫_0^x dW1(y)/C(y),
where C(y)=P(T≤y≤Y | T≤Y), with empirical counterpart Cn(y)=n^{-1}∑_{i=1}^n I(Ti≤y≤Yi).
By Gurler & Wang,1 the Nelson-Aalen type estimator of Λ(x) is given by
Λn(x)=∫_0^x [Cn(y)]^{-1} dW1n(y)=∑_{i=1}^n I(Yi≤x, δi=1)/[nCn(Yi)]. (1)
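As a concrete illustration of (1), the following sketch evaluates the Nelson-Aalen type estimator on toy data, assuming the standard empirical risk-set proportion Cn(y)=n^{-1}∑_i I(Ti≤y≤Yi) (an assumption spelled out in the comment):

```python
import numpy as np

def nelson_aalen_ltrc(x, Y, T, delta):
    """Nelson-Aalen type estimator (1) for left-truncated, right-censored data.

    Assumes the empirical risk-set proportion
    Cn(y) = n^{-1} * sum_i I(Ti <= y <= Yi).
    """
    Y, T, delta = map(np.asarray, (Y, T, delta))
    n = len(Y)
    # risk-set proportion evaluated at each observed Yi
    Cn = np.array([np.mean((T <= y) & (y <= Y)) for y in Y])
    # sum over uncensored observations with Yi <= x
    keep = (Y <= x) & (delta == 1)
    return np.sum(1.0 / (n * Cn[keep]))
```

With neither truncation nor censoring (Ti=0, δi=1), the estimator reduces to the classical Nelson-Aalen sum of reciprocal risk-set sizes.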
Gürler & Wang1 considered the following kernel estimator for the density function f(x) of X and its derivatives, which is a convolution of the product-limit estimator Fn(x) with an appropriate kernel function Kν:
ˆf^{(ν)}(x)=b_n^{-(ν+1)}∫Kν((x−u)/b_n)dFn(u), (2)
where Kν is a higher order kernel. The method can also be adapted to estimating the hazard function λ(x) and its derivatives by using Λn(x) instead of Fn(x):
ˆλ^{(ν)}(x)=b_n^{-(ν+1)}∫Kν((x−u)/b_n)dΛn(u). (3)
The estimator extends that of Müller & Wang,7,8 where the right censoring model is considered. However, for estimating derivatives or reducing bias, the estimator needs higher order kernels, which can lead to a negative hazard rate estimate. Moreover, the practical advantage of higher order kernels can be quite small for moderate sample sizes, as demonstrated in Marron & Wand.9 When estimating at a point x near aG or bW, the effective support [x−bn, x+bn] of the kernel is not contained in [aG,bW], and most kernel estimators in density estimation and regression settings then encounter boundary effects. In particular, the estimator (3) suffers from boundary effects near the endpoints of the support of the hazard rate. In the presence of censoring, Müller & Wang8 addressed the problem by employing boundary kernels and a data-adaptive varying bandwidth selection procedure, and Hall & Wehrly10 used a geometric method for removing edge effects from kernel-type nonparametric regression estimators. These boundary correction methods may also be adapted to the estimation approach in (3). Here we introduce a simple and intuitive alternative. Our approach needs neither higher order kernels nor boundary kernels, yet automatically corrects the boundary effects. Our idea is similar to that of Jiang & Doksum,11 but their procedure cannot be directly applied to the current setting.
Following Jiang and Doksum,11 we consider the following optimization problem:
min_{a_j} ∫ (1/b_n) K((u−x)/b_n) [λ(u)−∑_{j=0}^p a_j(u−x)^j]^2 du. (4)
By Taylor expansion, the solution a*(x)≡(a*_0,⋯,a*_p)^T of the optimization problem estimates a(x)≡(λ(x), λ′(x),⋯,λ^{(p)}(x)/p!)^T. Since Λn(x) in (1) is the empirical estimator of Λ(x), we define the following generalized empirical hazard rate as the generalized derivative of Λn(x):
λn(x)=∑_{i=1}^n D(x−Yi)δi/[nCn(Yi)], (5)
where D(x) is the Dirac delta function, with the property that
∫g(u)D(u−x)du=g(x)
for any integrable function g(x). Then ∫_0^x λn(t)dt=Λn(x), which is why we call λn the generalized empirical hazard rate. Replacing λ(u) in (4) by λn(u), we obtain
ˆa(x)≡(ˆa0,⋯,ˆap)^T=argmin_{a_j} ∫ (1/b_n) K((u−x)/b_n) [λn(u)−∑_{j=0}^p a_j(u−x)^j]^2 du. (6)
Then ˆa(x) is the LP estimator of a(x). For p=1, Jones12 considered a locally linear estimator and established its link with generalized jackknife boundary correction, by applying the ideas of Lejeune & Sarda13 on local linear fitting to the empirical distribution Fn. Here we study local polynomial estimation of the hazard function and its derivatives, a(x), under the left truncation and right censoring model. Obviously, our method also covers the complete data case, which corresponds to no truncation and no censoring.
Setting the derivatives of the integral in (6) with respect to the aj's to zero, we obtain the LP estimator ˆa(x) as the solution to the linear equations: for l=0,…,p,
∑_{i=1}^n (1/b_n) K((Yi−x)/b_n)(Yi−x)^l δi/[nCn(Yi)] = ∑_{j=0}^p a_j ∫_{u≥0} (u−x)^{j+l} (1/b_n) K((u−x)/b_n) du. (7)
It follows that the LP estimator at any interior point x0∈(aG,bW) has the following closed form:
Bˆa(x0)=S^{-1}Sn(x0), (8)
where Sn(x0)=(Sn0(x0),⋯,Snp(x0))^T and
Snl(x0)=∑_{i=1}^n (1/b_n) K((Yi−x0)/b_n)((Yi−x0)/b_n)^l δi/[nCn(Yi)]. (9)
When p=0 and s0=1, ˆa0=∑_{i=1}^n (1/b_n) K((Yi−x0)/b_n) δi/[nCn(Yi)] = b_n^{-1}∫K((x0−u)/b_n)dΛn(u) for a symmetric kernel, which is the same as the kernel estimator of λ(x0) in (3). However, this equivalence does not hold at boundary points.
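The closed form (8)–(9) can be sketched numerically as follows. The Epanechnikov kernel and the empirical risk-set proportion Cn(y)=n^{-1}∑_i I(Ti≤y≤Yi) are illustrative assumptions here:

```python
import numpy as np

def lp_hazard_estimate(x0, Y, T, delta, bn, p=1):
    """Local polynomial estimator (8): B a_hat(x0) = S^{-1} Sn(x0).

    Uses the Epanechnikov kernel K(u) = 0.75(1 - u^2) on [-1, 1] and the
    empirical risk-set proportion Cn(y) = n^{-1} sum_i I(Ti <= y <= Yi)
    (illustrative assumptions).  Returns a_hat(x0), whose k-th entry
    estimates lambda^{(k)}(x0)/k!.
    """
    Y, T, delta = map(np.asarray, (Y, T, delta))
    n = len(Y)
    K = lambda u: 0.75 * (1 - u**2) * (np.abs(u) <= 1)
    # kernel moments s_l = int u^l K(u) du, on a fine grid
    u = np.linspace(-1, 1, 2001)
    s = [np.trapz(u**l * K(u), u) for l in range(2 * p + 1)]
    S = np.array([[s[i + j] for j in range(p + 1)] for i in range(p + 1)])
    # Sn(x0) as in (9)
    Cn = np.array([np.mean((T <= y) & (y <= Y)) for y in Y])
    t = (Y - x0) / bn
    Sn = np.array([np.sum(K(t) * t**l * delta / (n * Cn * bn))
                   for l in range(p + 1)])
    # undo the scaling B = diag(1, bn, ..., bn^p)
    B_inv = np.diag([bn ** -k for k in range(p + 1)])
    return B_inv @ np.linalg.solve(S, Sn)
```

For p=0 this reproduces the kernel hazard estimate of (3) at interior points, consistent with the remark above.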
We will show in the next section that the LP estimator shares the nice properties of the local polynomial regression estimator; in particular, it keeps its convergence rate up to the left boundary point, i.e. it automatically corrects the left boundary effect, in contrast with other hazard rate estimators.
In this section, we will establish the consistency and joint asymptotic normality of the local polynomial estimators. To this end, we introduce some regularity conditions. For a given point x0∈(aG,bW), the following notation and assumptions are needed.
(A1) The hazard rate function λ(x)=Λ′(x) has a continuous (p+1)th derivative at the point x0.
(A2) The sequence of bandwidths bn tends to zero such that nbn→+∞ as n→+∞. Let B=diag(1,bn,⋯,b_n^p).
(A3) C(x) is continuous at the point x0.
(A4) The kernel function K(x) is a continuous function of bounded variation with bounded support, [−1,1] say. Let sl=∫_{−1}^1 K(u)u^l du and vl=∫_{−1}^1 u^l K^2(u)du for l≥0, let cp=(s_{p+1},⋯,s_{2p+1})^T, and let S=(s_{i+j}) and V*=(v_{i+j}) (for 0≤i,j≤p) be (p+1)×(p+1) matrices.
Theorem 3.1 Under conditions (A1) – (A4),
B(ˆa(x0)−a(x0))→0 almost surely, as n→∞.
Theorem 3.2 Under conditions (A1) – (A4),
√(nbn)[B(ˆa(x0)−a(x0)) − (b_n^{p+1}λ^{(p+1)}(x0)/(p+1)!) S^{-1}cp (1+op(1))] →L N(0, S^{-1}V*S^{-1} λ(x0) C^{-1}(x0)). (10)
Remark 3.1
When estimating a hazard rate that is a polynomial of degree at most p on an interval, the finite sample bias of the LP estimators on the interval is exactly zero (see the proof of Theorem 3.2). This contrasts with the higher order kernel methods of Müller & Wang,7,8 for which zero bias holds only asymptotically.
Remark 3.2
When there is no truncation, take T=0 and G(x)=1 over the support of F(x); then C(x)=α^{-1}P(C≥x)ˉF(x)=ˉW(x), and the asymptotic normality at an interior point x0 is the same as in Müller & Wang.7,8 When there is no censoring, take the censoring variable C=+∞ and L(x)=0 over the support of F(x); then C(x)=P(T≤x≤X|Y≥T), and the asymptotic normality at an interior point x0 is the same as in Gürler & Wang.1
By Theorem 3.2, the asymptotic mean squared error (AMSE) of ˆak(x0) is
AMSEk(bn,x0)=b_n^{2(p+1−k)}[e_k^T S^{-1}cp λ^{(p+1)}(x0)/(p+1)!]^2 + (nb_n^{2k+1})^{-1} e_k^T S^{-1}V*S^{-1}e_k λ(x0)/C(x0), (11)
where ek=(0,⋯,1,⋯,0)^T has one in the (k+1)th component and zeros elsewhere. Therefore, the optimal local bandwidth for estimating the kth derivative of λ(x) at x0, in the sense of minimizing AMSEk(bn,x0), is
b_{k,opt}(x0)=n^{-1/(2p+3)}((2k+1)[(p+1)!]^2 e_k^T S^{-1}V*S^{-1}e_k λ(x0)/C(x0) / {2(p+1−k)[λ^{(p+1)}(x0)]^2 (e_k^T S^{-1}cp)^2})^{1/(2p+3)}. (12)
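The bandwidth formula (12) can be checked against a direct minimization of the AMSE (11). The sketch below does this for the Epanechnikov kernel with p=1, k=0; the plug-in values of n, λ(x0), λ″(x0) and C(x0) are purely hypothetical:

```python
import numpy as np

# Kernel moment constants for the Epanechnikov kernel, p = 1, k = 0.
K = lambda u: 0.75 * (1 - u**2)
u = np.linspace(-1, 1, 4001)
s = [np.trapz(u**l * K(u), u) for l in range(4)]      # s_0..s_3
v = [np.trapz(u**l * K(u)**2, u) for l in range(3)]   # v_0..v_2
S = np.array([[s[0], s[1]], [s[1], s[2]]])
Vstar = np.array([[v[0], v[1]], [v[1], v[2]]])
cp = np.array([s[2], s[3]])
e0 = np.array([1.0, 0.0])
Sinv = np.linalg.inv(S)

n, lam, lam2, C0 = 400, 0.5, 0.8, 0.6   # hypothetical plug-in values

A = (e0 @ Sinv @ cp * lam2 / 2) ** 2           # squared-bias constant; (p+1)! = 2
Bv = e0 @ Sinv @ Vstar @ Sinv @ e0 * lam / C0  # variance constant

def amse(b):
    # AMSE_0(b, x0) as in (11): A b^4 + Bv / (n b)
    return A * b**4 + Bv / (n * b)

# closed form (12) for k = 0, p = 1: ((2k+1) Bv / (2(p+1-k) A n))^{1/(2p+3)}
b_opt = (Bv / (4 * A * n)) ** 0.2
```

Minimizing `amse` over a fine grid of b recovers `b_opt`, confirming that the bias-variance trade-off in (11) yields the n^{-1/(2p+3)} rate.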
Theorem 3.3
Consider the left edge effect on the estimator. Assume that we estimate a(x) at xn=dbn in the left boundary region, for some constant d∈[0,1]. Then, similarly to (8), ˆa(xn) in (6) has the following closed form, for p=0,1,2,⋯,
Bˆa(xn)=S_d^{-1}Sn(xn), (13)
where Sd is defined as S but with si replaced by s_{i,d}=∫_{−d}^1 u^i K(u)du. Let v_{i,d}=∫_{−d}^1 u^i K^2(u)du. Then the joint asymptotic normality (10) continues to hold with cp, S and V* replaced by c_{p,d}, Sd and V*_d, respectively, where c_{p,d}=(s_{p+1,d},⋯,s_{2p+1,d})^T and V*_d=(v_{i+j,d}) is a (p+1)×(p+1) matrix.
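The truncated moments s_{i,d} and the boundary matrix Sd are easy to tabulate numerically. A minimal sketch (Epanechnikov kernel assumed for illustration; at d=1 it recovers the interior matrix S):

```python
import numpy as np

def boundary_matrix(d, p=1):
    """S_d of Theorem 3.3: the (p+1)x(p+1) matrix (s_{i+j,d}) built from the
    truncated moments s_{i,d} = int_{-d}^{1} u^i K(u) du, for the
    Epanechnikov kernel K(u) = 0.75(1 - u^2) (an illustrative choice)."""
    u = np.linspace(-d, 1.0, 4001)
    K = 0.75 * (1.0 - u**2)
    s = [np.trapz(u**i * K, u) for i in range(2 * p + 1)]
    return np.array([[s[i + j] for j in range(p + 1)] for i in range(p + 1)])
```

At the endpoint itself (d=0), half the kernel mass is lost (s_{0,0}=1/2 for the Epanechnikov kernel), which is exactly what the correction S_d^{-1} in (13) compensates for.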
This property of our estimator in Theorem 3.3 is similar to that of local polynomial regression estimation, and is not shared by other kernel estimators of hazard rates (Hess et al.14). The LP estimators are automatically boundary adaptive in the sense of Fan & Gurler.15 Note that the above property holds even for p=0, in contrast with the case of local polynomial regression.
Remark 3.3 For a finite sample, one may encounter right boundary effects when estimating near T. A good way of dealing with the problem is to use the following estimator, similar to the estimator in (13):
Bˆa(x0)=S_q^{-1}Sn(x0), (14)
where x0=T−qbn for q∈[0,1], and Sq is defined as S but with si replaced by s_{i,q}=∫_{−1}^q u^i K(u)du.
The proposed estimators depend on the bandwidth bn. It is important to develop a local bandwidth choice for estimating hazard functions and their derivatives, especially when one would like a data-driven approach in practice. For hazard rate estimation, Patil16 considered least squares cross-validation bandwidth selection, and González-Manteiga et al.17 studied smoothed bootstrap selection of the global bandwidth. Müller & Wang,7 Hess et al.14 and Jiang & Doksum11 studied local bandwidth choice for estimating hazard rates under right censoring. Here we extend the data-driven local bandwidth choice of Jiang & Doksum11 to the left truncation and right censoring model.
From the proof of Theorem 3.2, we see that the exact bias of the estimator ˆak(x0) is
Bk(bn,x0)=e_k^T(S^{-1}βn(x0)−Ba(x0)), (15)
where βn(x0)=(βn0(x0),⋯,βnp(x0))^T and βnk(x0)=∫K(t)t^k λ(x0+bnt)dt. The asymptotic variance of ˆak(x0) is
Vk(bn,x0)=(nbn)^{-1} e_k^T S^{-1}˜VS^{-1}e_k, (16)
where ˜V=(˜vij) and ˜vij=∫K^2(t)t^{i+j} λ(x0+bnt)/C(x0+bnt) dt. Then we propose to estimate the AMSE of ˆλ^{(k)}(x0)=k!ˆak(x0) via
"0362AMSEk(bn,x0)=ˆB2k(bn,x0)+ˆVk(bn,x0), (17)
where ˆBk(bn,x0) and ˆVk(bn,x0) are defined similarly to Bk(bn,x0) and Vk(bn,x0) but with λ(x) replaced by a pilot estimator and C(x) replaced by its empirical estimator Cn(x). We define
ˆb_{k,opt}(x0)=argmin_b ˆAMSEk(b,x0).
Then, for p=1, the following algorithm, similar to that of Jiang & Doksum,11 can be used for estimating λ(x).
Algorithm for estimating λ(x):
Step 1: Pilot estimators of λ(x): Choose a kernel, such as the Epanechnikov kernel, and an initial global bandwidth b0. The choice of the initial bandwidth depends on the specific case. Assuming the data are available on [0,T], a possible value for b0 is T/(8n_u^{1/5}), as recommended by Müller & Wang,8 where nu is the number of uncensored observations. The pilot estimators ˆλ(x) of λ(x) are obtained by using bn(x)≡b0 and our estimators (8), (13) and (14).
Step 2: Minimization of ˆAMSE(bn,x): Choose an equispaced grid of m1 points ˜xi, i=1,⋯,m1, between 0 and T. For each grid point ˜xi, compute ˆAMSE(bn,˜xi) in (17) with k=0 and obtain its minimizer ˜b(˜xi) over an interval such as [b0/4, 4b0]. The minimization of ˆAMSE(bn,˜xi) may be carried out via discretisation.
Step 3: Smoothing bandwidths: Choose another equispaced grid of m2 points xr, r=1,⋯,m2, over the interval [0,T] on which the final hazard estimator is desired. Run a local linear smoother employing the global bandwidth ˜b0=b0 or (3/2)b0:
ˆb(xr)=∑_{i=1}^{m1} wi ˜b(˜xi) / ∑_{i=1}^{m1} wi,
where wi=K((˜xi−xr)/˜b0)[Mn,2−Mn,1(˜xi−xr)] and Mn,j=∑_{i=1}^{m1} K((˜xi−xr)/˜b0)(˜xi−xr)^j, for j=1,2.
Step 4: Final hazard function estimators: Using (8), (13) and (14), obtain the estimators ˆλ(xr) by employing the bandwidth ˆb(xr), for r=1,⋯,m2.
The above algorithm can be repeated by using the estimators ˆλ(xr) in Step 4 as pilot estimators in Step 1 and running Step 2 – Step 4 again. The pilot estimators of λ(x) in Step 1 can also be obtained via maximum likelihood if one has a plausible parametric model in mind.18
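For concreteness, the Step 3 smoothing of the pointwise minimizers onto the output grid can be sketched as follows. The Epanechnikov kernel and the weights wi are as in the text; the grids and pilot bandwidth in the usage note are hypothetical:

```python
import numpy as np

def smooth_bandwidths(x_tilde, b_tilde, x_out, b0_tilde):
    """Step 3: local linear smoothing of the minimizers b_tilde(x_tilde_i)
    onto the grid x_out, using the weights
    w_i = K(.)[M_{n,2} - M_{n,1}(x_tilde_i - x_r)] from the text,
    with the Epanechnikov kernel."""
    K = lambda t: 0.75 * (1 - t**2) * (np.abs(t) <= 1)
    b_hat = np.empty(len(x_out))
    for r, xr in enumerate(x_out):
        d = x_tilde - xr
        Kd = K(d / b0_tilde)
        Mn1 = np.sum(Kd * d)        # M_{n,1}
        Mn2 = np.sum(Kd * d**2)     # M_{n,2}
        w = Kd * (Mn2 - Mn1 * d)
        b_hat[r] = np.sum(w * b_tilde) / np.sum(w)
    return b_hat
```

Because local linear weights reproduce linear functions exactly, a bandwidth curve that is linear over the smoothing window passes through unchanged, so the smoother only irons out grid-to-grid noise in the minimizers ˜b(˜xi).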
Proofs of Theorems 3.1 and 3.2
Using (1) and (9), we obtain that
Snl(x0)=∫(1/b_n)K((u−x0)/b_n)((u−x0)/b_n)^l dΛn(u)
=∫(1/b_n)K((u−x0)/b_n)((u−x0)/b_n)^l dΛ(u) + ∫(1/b_n)K((u−x0)/b_n)((u−x0)/b_n)^l d[Λn(u)−Λ(u)]
≡ βnl(x0)+γnl(x0). (18)
Let βn(x0)=(βn0,⋯,βnp)^T and γn(x0)=(γn0,⋯,γnp)^T. Then Sn(x0)=βn(x0)+γn(x0), and by (8),
B[ˆa(x0)−a(x0)]=[S^{-1}βn(x0)−Ba(x0)]+S^{-1}γn(x0)
≡ αn(x0)+S^{-1}γn(x0). (19)
We will show that αn(x0) contributes to the bias term, and S−1γn(x0) the variance term of our estimator.
By a Taylor expansion of λ around x0, under (A1),
βnl(x0)=∫K(t)t^l λ(x0+bnt)dt
=∑_{j=0}^p (b_n^j/j!) λ^{(j)}(x0) s_{l+j} + (b_n^{p+1}/(p+1)!) λ^{(p+1)}(x0) s_{l+p+1}(1+o(1)).
Then
αn(x0)=(b_n^{p+1}/(p+1)!) λ^{(p+1)}(x0) S^{-1}cp (1+o(1)). (20)
In particular, if λ(x) is a polynomial of degree up to p in a neighborhood of x0, then the exact bias αn(x0) of the LP estimator is zero.
To handle γn(x0), we use the almost sure i.i.d. representation
Λn(x)−Λ(x)=(1/n)∑_{i=1}^n ξ(Yi,Ti,δi,x)+rn(x), (21)
where, for x≥0 and any b<bW, sup_{0≤x≤b}|rn(x)|=O(log n/n) a.s., and
ξ(y,t,δ,x)=I(y≤x, δ=1)/C(y)−∫_0^x I(t≤u≤y)[C(u)]^{-2} dW1(u),
which satisfies Eξ(Yi,Ti,δi,x)=0 and Cov(ξ(Yi,Ti,δi,s), ξ(Yi,Ti,δi,t))=g(min(s,t)), where g(x)=∫_0^x [C(u)]^{-2} dW1(u). Let Kl(t)=K(t)t^l. Using the definition of γnl, (21) and integration by parts, we obtain the following almost sure representation of γnl(x0):
γnl(x0)=σnl(x0)+enl(x0), (22)
where
σnl(x0)=n^{-1}∑_{i=1}^n ∫(1/b_n)K((u−x0)/b_n)((u−x0)/b_n)^l dξ(Yi,Ti,δi,u)
=(nbn)^{-1}∑_{i=1}^n ∫ξ(Yi,Ti,δi,x0+bnt)dKl(t)
is the stochastic component of Snl(x0) and contributes to the variance of our estimator, while enl(x0) is the negligible approximation error, which satisfies
sup_{0≤x0≤b}|enl(x0)|=O(log n/n), a.s. (23)
for 0≤l≤p. Note that E(σnl(x0))=0 and
Cov(σnl(x0),σnm(x0))=(nbn)^{-1}∫∫(1/b_n) g(min(x0+bnu, x0+bnv)) dKl(u)dKm(v)
=(nb_n^2)^{-1}∫_{−∞}^{+∞}∫_{−∞}^{x0+bnv} Kl((u−x0)/b_n) dg(u)dKm(v)
=(nbn)^{-1} [λ(x0)/C(x0)] v_{l+m}(1+o(1)). (24)
Let σn(x0)=(σn0,⋯,σnp)^T and en(x0)=(en0,⋯,enp)^T. Then by (22) and (24),
√(nbn) S^{-1}γn(x0)=√(nbn) S^{-1}σn(x0)+√(nbn) S^{-1}en(x0) (25)
and Cov(σn(x0))=(nbn)^{-1}[λ(x0)/C(x0)]V*(1+o(1)). By the central limit theorem, we get
√(nbn) σn(x0) →L N(0, [λ(x0)/C(x0)]V*). (26)
By (23), we know
sup_{0≤x0≤b}|S^{-1}en(x0)|=O(log n/n), a.s. (27)
Then by (25), (26) and (27),
√(nbn) S^{-1}γn(x0) →L N(0, [λ(x0)/C(x0)] S^{-1}V*S^{-1}). (28)
Combination of (19), (20) and (28) completes the proof of Theorem 3.2.
Note that the σnl(x0), for l=0,⋯,p, are sums of i.i.d. terms. By the strong law of large numbers, σnl(x0)→E(σnl(x0))=0 almost surely, so σn(x0)→0 almost surely. Combined with (22) and (23), this yields γn(x0)→0 almost surely. Therefore, by (19) and (20),
B[ˆa(x0)−a(x0)]→0 almost surely. This proves Theorem 3.1.
Proof of Theorem 3.3 The result follows by the same argument as in part (i) of the proof of Theorem 3.2.
This work was partially supported by the Natural Science Foundation of Fujian Province of China (2016J01024) and by the Education and Scientific Research Projects for Young and Middle-aged Teachers of Fujian Province (JAT160383).
The authors declare that there are no conflicts of interest.
©2018 Yuan, et al. This is an open access article distributed under a license that permits unrestricted use, distribution, and building upon the work non-commercially.