Data-Analysis Exercise

Continuous-Time Survival Analysis

(ALDA, Chapters 13, 14 & 15, pp. 468-606)

 

 

Purpose of the Exercise

This data-analytic exercise supplements our presentation on continuous-time survival analysis.  Its immediate purpose is to introduce the analysis of continuous-time event data with the SAS procedures LIFETEST and PHREG.

 

General Resources for Conducting the Exercise

As before, while we provide self-contained instructions below for conducting the exercise, we recommend that you keep the following resources close at hand:

·         ALDA itself, and the handouts from our presentation.

·         The UCLA Academic Technology Services website that supports ALDA, at http://www.ats.ucla.edu/stat/examples/alda/ .

Datasets

The datasets used in this exercise were formatted by UCLA Academic Technology Services and are available for download from their website, as text and as system files.  For our current purposes, the datasets have already been downloaded onto your workstation and stored in a directory on the hard drive.  You will need to know the path to this directory in order to conduct the analyses below.

Data-Analysis Exercise

Please proceed sequentially through the requested tasks:

1.      You will begin by using the continuous-time horn honking data featured in ALDA Table 13.1, on p. 471 ff., and in our presentation.  Please have this exhibit open.

2.      The files containing these data have the prefix honking.  They appear in the data directory several times with different suffixes denoting different versions of the file -- as raw text and as system files appropriate for analyses with different software packages.

a.       Examine the raw text dataset, called honking.txt, by opening Windows Explorer and clicking on the filename in the appropriate data directory on your hard drive.  Windows Notepad will display the file contents.  Make sure that the first four lines of the dataset match the following:

 

"ID","SECONDS","CENSOR"
1,2.88,0
2,4.63,1
3,2.36,1

 


b.      Notice that the raw text data-file contains comma-delimited text, with the first line containing variable names, in quotes: ID, SECONDS, CENSOR.  By consulting ALDA, match these names with the variable descriptions on p. 471, noting the coding of the CENSOR variable for cases #2 and #3 (their corresponding censoring status is indicated by an asterisk in the ALDA table).  Make sure that you can line up the names of the variables with their values for the first and second cases, and that you understand why they take on the values that they do.
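Once SAS is running, the comma-delimited layout described above can also be read directly with a DATA step.  The following is only a minimal sketch and is not part of the original exercise: the dataset name honking_raw is hypothetical, and the ???? in the path stands for your own data directory.

```sas
/* Sketch: read the comma-delimited raw file into a temporary SAS dataset. */
/* honking_raw is a hypothetical name; replace ???? with your directory.   */
data honking_raw;
  /* dsd treats commas as delimiters; firstobs=2 skips the header line */
  infile 'c:\????\honking.txt' dsd firstobs=2;
  input id seconds censor;
run;

/* List the first three cases to compare with the lines shown above. */
proc print data=honking_raw(obs=3);
run;
```

The exercise itself uses the ready-made system file, so this step is optional; it is simply one way to confirm that the variable names and values line up as expected.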

3.      Boot up PC-SAS.

4.      Begin your analyses of the honking dataset.

a.       Read the honking system file (honking.sas7bdat) into SAS and obtain summary statistics (Kaplan-Meier estimates of the survival probability, related plots and ancillary statistics) using the following SAS code (replacing the data file name with a name appropriate for your system):

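/* Kaplan-Meier estimates, with survivor (s) and negative log-survivor (ls) plots */
proc lifetest data="c:/????/honking.sas7bdat" method=km plots=(s,ls);
  time seconds*censor(1);   /* censor=1 marks a censored event time */
run;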

 


b.      The SAS LIFETEST procedure generates sample summaries and plots of the continuous-time survival data, including most of the statistics presented in Chapter 13 of ALDA.

i.      In the “proc” sentence, two options are exercised.

1.      The “method” option selects the Kaplan-Meier (Product-Limit) method for estimating survivor probabilities (other methods are also available, including those described in Chapter 13 of ALDA).

2.      The “plots” option is used to request plots of the estimated survivor function and the negative log-survivor function.

ii.      The “time” command is required.  It names the variable that contains the continuous event times (SECONDS) and the variable that contains the censoring information (CENSOR), and gives, in parentheses, the value of that variable that indicates that censoring has occurred (1).
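Because the value in parentheses identifies the censored cases rather than the events, the same analysis can be written with an indicator coded the other way round.  The sketch below illustrates this under stated assumptions: the dataset honking_ev and the variable EVENT are hypothetical names that do not appear in the UCLA files, and ???? again stands for your data directory.

```sas
/* Sketch: an equivalent analysis using a hypothetical EVENT indicator. */
data honking_ev;
  set 'c:\????\honking.sas7bdat';   /* ???? is your data directory */
  event = 1 - censor;               /* 1 = event observed, 0 = censored */
run;

proc lifetest data=honking_ev method=km;
  time seconds*event(0);            /* 0 now marks the censored cases */
run;
```

Both versions should produce the same Kaplan-Meier estimates; only the value named in parentheses changes.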

c.       Execute your program, obtain output, and print it out for your records.  You will notice that two kinds of output are created:

i.      Regular SAS output which lists the event times, the KM estimates of the survival probabilities, and ancillary statistics.  You should compare these to the statistics listed in Table 13.3 on p. 484 in ALDA.  Notice particularly what happens in the SAS output when ties occur, and contrast this with the presentation in the table.

ii.      Two pages of SAS/GRAPH output, in a separate GRAPH window, containing “publication quality” plots of the estimated survivor and negative log-survivor functions.  You will need to print these plots separately from the sample statistics listing.  Compare the plot of the estimated survivor function to Figure 13.2 on p. 485 of ALDA and the estimated negative log-survivor function to Figure 13.4 on p. 493 of ALDA.
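When making these comparisons, it may help to keep the form of the two estimated functions in view.  The notation below is chosen here for illustration and may differ from ALDA's: t_1 < t_2 < ... are the distinct observed event times, n_i is the number of cases still at risk just before t_i, and d_i is the number of events observed at t_i.

```latex
\hat{S}(t_j) = \prod_{i=1}^{j} \left( 1 - \frac{d_i}{n_i} \right),
\qquad
\hat{H}(t_j) = -\log \hat{S}(t_j)
```

Under this formulation, tied event times contribute a single factor with d_i greater than 1, and censored cases affect the estimate only by reducing n_i at later event times.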

5.      Now, let’s switch to the re-arrest data on the 194 former inmates who were tracked after their release from a medium-security prison.  These data are featured in ALDA Figure 14.1 on p. 505 ff.  Please have this exhibit open during your analyses.

6.       The files containing these data have the prefix rearrest.  They appear in the data directory several times with different suffixes denoting different versions of the file -- as raw text and as system files appropriate for analyses with different software packages.

a.       Examine the raw text dataset, called rearrest.txt, by opening Windows Explorer and clicking on the filename in the appropriate data directory on your hard drive.  Windows Notepad will display the file contents.  Make sure that the first four lines of the dataset match the following:

"id","months","censor","personal","property","cage"
1,.06570841889117043,0,1,1,-1.675197757186858
2,.13141683778234087,0,0,1,-10.48286373939083
3,.2299794661190965,0,1,1,-4.426737798254621

 


b.      Notice that the raw text data-file contains comma-delimited text, with the first line containing variable names, in quotes:  ID, MONTHS, CENSOR, PERSONAL, PROPERTY, CAGE.  By consulting ALDA, match these names with the variable descriptions on p. 504 ff., noting the coding of the CENSOR variable for case #8.  Make sure that you can line up the names of the variables with their values for the first and second cases, and that you understand why they take on the values that they do.
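A quick descriptive pass in SAS can support this check of the variables.  The following is a minimal sketch rather than part of the original exercise; it assumes, from the lines shown above, that CENSOR, PERSONAL and PROPERTY are dichotomous and that MONTHS and CAGE are continuous, and ???? stands for your data directory.

```sas
/* Sketch: quick checks on the rearrest variables before modeling. */
proc freq data='c:\????\rearrest';     /* ???? is your data directory */
  tables censor personal property;    /* counts for the apparently 0/1 variables */
run;

proc means data='c:\????\rearrest' n mean min max;
  var months cage;                    /* apparently continuous variables */
run;
```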

7.      Boot up PC-SAS, if it is not already running.

8.      Begin your analyses of the rearrest dataset.

a.       Read the rearrest system file (rearrest.sas7bdat) into SAS and conduct a Cox regression analysis, fitting Model A of Table 14.1 on p. 525 of ALDA, using the following SAS code (replacing the data file name with a name appropriate for your system):

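/* Kaplan-Meier estimates and plots, separately by levels of PERSONAL */
proc lifetest data='c:\????\rearrest' method=km plots=(s,ls);
  time months*censor(1);
  strata personal;
run;
/* Cox regression of time to re-arrest on PERSONAL (Model A) */
proc phreg data='c:\????\rearrest';
  model months*censor(1)=personal/ties=efron;
run;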

 


b.      Here, two SAS procedures are implemented consecutively: LIFETEST and PHREG (misnamed as “Proportional Hazards Regression”).

 

i.      We used LIFETEST with the additional “strata” command to produce plots of the KM-estimates of the survivor function, by levels of predictor PERSONAL.

ii.      We used PHREG to fit the indicated Cox regression model.

1.      As usual, the “model” sentence contains the names of the time variable, MONTHS, the censoring information variable CENSOR(1), and the predictor, PERSONAL.  It also indicates that we want to use the “Efron” method to deal with ties.
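For orientation, the model requested by this “model” sentence can be written out as follows.  This is a sketch in notation chosen here (i indexes individuals, j indexes event times, and h_0 is the unspecified baseline hazard function); compare it against the model specification given in Chapter 14 of ALDA.

```latex
\log h(t_{ij}) = \log h_0(t_j) + \beta_1 \, \mathrm{PERSONAL}_i
\quad \Longleftrightarrow \quad
h(t_{ij}) = h_0(t_j) \, e^{\beta_1 \mathrm{PERSONAL}_i}
```

The single regression parameter, beta_1, is what PHREG estimates; the baseline hazard function is left unspecified.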

 

c.       Execute your program, obtain output, and print it out for your records.  You will notice again that several kinds of output are created:

i.      You get two pages of SAS/GRAPH output, in a separate GRAPH window, from the LIFETEST procedure, containing “publication quality” plots of the estimated survivor and negative log-survivor functions, by levels of the predictor PERSONAL.  Print these plots separately from the other output.  Compare the plots to the top two panels of Figure 14.1 on p. 505 of ALDA.

ii.      You get regular output from both LIFETEST (extensive summary statistics, by levels of PERSONAL) and PHREG (the Cox regression output).

d.      Examine the Cox regression output, comparing it to fitted Model A of Table 14.1 on p. 525:

i.      Identify the regression parameter estimate and standard error associated with predictor PERSONAL.  They are contained in the very last line of the output, under “Analysis of Maximum Likelihood Estimates” (an interesting misnomer!).

ii.      Identify the -2 Log Likelihood statistic, under “Model Fit Statistics/With Covariates.”

iii.      Identify the information statistics (AIC and SBC), also under “Model Fit Statistics/With Covariates.”
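When interpreting the parameter estimate, it is often convenient to antilog it: exp(estimate) is the estimated hazard ratio associated with a one-unit difference in PERSONAL.  As a hedged sketch that goes slightly beyond the original code, the RISKLIMITS option on the “model” sentence asks PHREG to add confidence limits for the hazard ratio to the same table (???? again stands for your data directory).

```sas
/* Sketch: Model A again, adding confidence limits for the hazard ratio. */
proc phreg data='c:\????\rearrest';   /* ???? is your data directory */
  model months*censor(1)=personal / ties=efron risklimits;
run;
```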

e.       Edit your SAS code to fit Model D from Table 14.1 on p. 525, by adding the additional predictors in the appropriate place on the “model” line in PHREG.
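One possible form of the edited code is sketched below.  It assumes that Model D contains all three predictors available in the dataset (PERSONAL, PROPERTY and CAGE); please confirm this against Table 14.1 before relying on it.  As before, ???? stands for your data directory.

```sas
/* Sketch: Cox regression with three predictors (assumed to be Model D). */
proc phreg data='c:\????\rearrest';   /* ???? is your data directory */
  model months*censor(1)=personal property cage / ties=efron;
run;
```

Additional predictors are simply listed, separated by spaces, after the equals sign and before the slash that introduces the options.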

9.      If you have any time left, take a look at the programs and output for ALDA Chapters 13, 14 and 15 on the UCLA support site.  There you will find examples of SAS code for outputting a variety of interesting fitted plots and diagnostics, and for fitting the extensions of the basic Cox model that are described in Chapter 15 of ALDA, including models with time-varying predictors.