# how coxph in R handles ties

x     = c(1,0, 1,0,0);
t     = c(4,4, 5,5,5);
event = c(1,1, 0,0,0);   # 2 events at t=4

library(survival)

coxph(formula, data=parent.frame(), weights, subset, na.action, init,
      control, method=c("efron","breslow","exact"), singular.ok=TRUE,
      robust=FALSE, model=FALSE, x=FALSE, y=TRUE, ... )

method   a character string specifying the method for tie handling.
         If there are no tied death times all the methods are equivalent.
         Nearly all Cox regression programs use the Breslow method by
         default, but not this one. The Efron approximation is used as the
         default here, as it is much more accurate when dealing with tied
         death times, and is as efficient computationally. The exact method
         computes the exact partial likelihood, which is equivalent to a
         conditional logistic model. If there are a large number of ties
         the computational time will be excessive.

summary( coxph(Surv(t,event) ~ x, method="exact"   ) )
summary( coxph(Surv(t,event) ~ x, method="breslow" ) )
summary( coxph(Surv(t,event) ~ x, method="efron"   ) )

# ----------------------------------------
# What coxph in R calls "Exact" method
# ----------------------------------------

Riskset at t=4                 HR's in Riskset at t=4

        event   no event
x=1       1        1           exp(beta)
x=0       1        2           1

L = prob of observing what we did, i.e., 2 deaths:
    1 where HR = exp(beta) = b say  (1 other, who survived, has HR = b)
    1 where HR = exp(0)    = 1      (2 others, who survived, have HR = 1)

20 ordered ways to pick the 2 events from among the 5
(the 10 above the diagonal are the same as the 10 below).
Each cell is the product of the 2 HR's:

     b b 1 1 1
  b  x - - - -      using b as shorthand for exp(beta)
  b  B x - - -      using B as shorthand for b-squared
  1  b b x - -
  1  b b 1 x -
  1  b b 1 1 x

so denominator of "L" is b.sq + 6b + 3
   numerator         is b.1 = b

lik.exact = function(b) b/(b*b + 6*b + 3)

# the likelihood functions are vectorized, so they can be called on b directly
b=seq(1,3,.1)    ; plot(b, lik.exact(b))
b=seq(1.5,2,.01) ; plot(b, lik.exact(b))

# ----------------------------------------
# All orderings
# ----------------------------------------

2 possible orderings of the 2 events at t=4:

  1. person with x = 0 followed by person with x = 1
  2. person with x = 1 followed by person with x = 0

either-or, so Prob(1 or 2) = Prob(1) + Prob(2)

Lik = Prob(x = 0 followed by x = 1) + Prob(x = 1 followed by x = 0)

using same shorthand as above ... (& risksets of sizes 5 and 4)

Lik = [ 1/(2*b+3) ] x [ b/(2*b+2) ]  +  [ b/(2*b+3) ] x [ 1/(b+3) ]

lik.all.orderings = function(b) (1/(2*b+3))*(b/(2*b+2)) + (b/(2*b+3))*(1/(b+3))

b=seq(1,3,.1)       ; plot(b, lik.all.orderings(b))
b=seq(1.55,1.7,.01) ; plot(b, lik.all.orderings(b))

# ----------------------------------------
# "Breslow" approximation to all orderings
# ----------------------------------------

Use the full riskset (size 5) as the denominator for both events:

b/(2*b+2) =(approx) b/(2*b+3)   and   1/(b+3) =(approx) 1/(2*b+3)

Approx.Lik = ( b x 1 ) / square of (2 x b + 3)

(each of the 2 orderings contributes this same term; the resulting
 factor of 2 is a constant and does not change where the maximum is)

lik.Breslow = function(b) b / ( (2*b + 3)^2 )

b=seq(1,3,.1)        ; plot(b, lik.Breslow(b))
b=seq(1.25,1.75,.01) ; plot(b, lik.Breslow(b))

# ----------------------------------------
# "Efron" approximation to all orderings
# ----------------------------------------

In general, with a = sum of the HR's in the riskset, and b and c = HR's
of the 2 who died (here a = 2*exp(beta) + 3, and the 2 HR's are
exp(beta) and 1), the all-orderings likelihood

L = bc/[a(a-b)] + bc/[a(a-c)]

can be approximated by

L = 2bc/[a(a - (b+c)/2)]

# back to b as shorthand for exp(beta)
lik.Efron = function(b) 2*b / ( (2*b + 3)*(2*b + 3 - (b+1)/2) )

b=seq(1,3,.1)      ; plot(b, lik.Efron(b))
b=seq(1.5,1.7,.01) ; plot(b, lik.Efron(b))

=============== SAS proc phreg v9. =====================

Ties-Handling Option in MODEL statement

TIES=method
specifies how to handle ties in the failure time. The TIES= option can
take the following values:

BRESLOW
uses the approximate likelihood of Breslow (1974). This is the default value.

DISCRETE
replaces the proportional hazards model by the discrete logistic model

EFRON
uses the approximate likelihood of Efron (1977).

EXACT
computes the exact conditional probability under the proportional hazards
assumption that all tied event times occur before censored times of the
same value or before larger values.
This is equivalent to summing all terms of the marginal likelihood for beta
that are consistent with the observed data (Kalbfleisch and Prentice 1980;
DeLong, Guirguis, and So 1994).

The EXACT method may take a considerable amount of computer resources. If
ties are not extensive, the EFRON and BRESLOW methods provide satisfactory
approximations to the EXACT method for the continuous time-scale model. In
general, Efron's approximation gives results that are much closer to the
EXACT method results than Breslow's approximation does. If the time scale
is genuinely discrete, you should use the DISCRETE method. The DISCRETE
method is also required in the analysis of case-control studies when there
is more than one case in a matched set. If there are no ties, all four
methods result in the same likelihood and yield identical estimates. The
default, TIES=BRESLOW, is the most efficient method when there are no ties.

data a;
  input x t event;
  lines;
1 4 1
0 4 1
1 5 0
0 5 0
0 5 0
;

proc phreg; model t*event(0)=x / RISKLIMITS TIES = BRESLOW;
proc phreg; model t*event(0)=x / RISKLIMITS TIES = DISCRETE;
proc phreg; model t*event(0)=x / RISKLIMITS TIES = EFRON;
proc phreg; model t*event(0)=x / RISKLIMITS TIES = EXACT;
RUN;

=============== Stata v 8 =====================

clear
input x t event
1 4 1
0 4 1
1 5 0
0 5 0
0 5 0
end

stset t, failure(event=1)

* breslow, efron, exactm, and exactp each specify a method for handling tied failures
* in the calculation of the model (and residuals). breslow is the default. Note that
* efron and the exact methods require substantially more computer time than the default
* breslow option. exactm and exactp may not be specified
* with robust or cluster(), or with tvc().

stcox x, breslow
stcox x, efron
stcox x, exactm
stcox x, exactp

S U M M A R Y

                             HR via SAS       via Stata        via coxph (R)

All Orderings (1)            1.64 exact       1.64 exactm*     not available
Breslow                      1.50 breslow     1.50 breslow     1.50 breslow
Efron                        1.58 efron       1.58 efron       1.58 efron
Truly tied (discrete) (2)    1.73 discrete    1.73 exactp**    1.73 exact

Notes: (1)  all orderings in time (times could be ordered with more decimal places)
       (*)  exact Marginal lik.
       (**) exact partial lik.

================ jh march 20, 2007
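
A quick numerical check of the HR's in the summary, as a sketch only: it maximizes each of the four hand-derived likelihoods over b = exp(beta) using base R's optimize(). The helper name hr.hat and the search interval c(0.5, 5) are assumptions made for this illustration; they are not part of the survival package.

```r
# The four likelihoods for the 5-person example, with b = exp(beta)
lik.exact         <- function(b) b / (b*b + 6*b + 3)
lik.all.orderings <- function(b) (1/(2*b+3))*(b/(2*b+2)) + (b/(2*b+3))*(1/(b+3))
lik.Breslow       <- function(b) b / ((2*b + 3)^2)
lik.Efron         <- function(b) 2*b / ((2*b + 3)*(2*b + 3 - (b+1)/2))

# hypothetical helper: value of b that maximizes a likelihood
# (search interval 0.5 to 5 is an assumption wide enough for this example)
hr.hat <- function(lik) optimize(lik, interval = c(0.5, 5), maximum = TRUE)$maximum

hr <- c(all.orderings = hr.hat(lik.all.orderings),
        Breslow       = hr.hat(lik.Breslow),
        Efron         = hr.hat(lik.Efron),
        exact         = hr.hat(lik.exact))

print(round(hr, 2))
```

Setting the derivative of each log-likelihood to zero gives closed forms for three of these: b = 3/2 for Breslow, b = sqrt(5/2) for Efron, and b = sqrt(3) for the exact (discrete) likelihood, so the numerical maxima can be compared against those values.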