forked from Black-Dog-Institute/Data-Files
SAS CODE_ Survival Analysis.sas
/*Two SAS procedures are useful for survival analysis: PROC LIFETEST and PROC PHREG.*/
/**/
/*The LIFETEST procedure computes life tables using the Kaplan-Meier method (for individual follow-up times) by default; */
/*it can also perform a logrank test to compare survival curves between groups of subjects.*/
/**/
/*The PHREG procedure fits the Cox proportional hazards regression model to survival data; */
/*this allows testing of, and controlling for, the effect of continuous covariates.*/
/**/
/*Creating the status variable and survival time in SAS*/
/**/
/*The method differs slightly, depending on whether the date of the outcome event of interest is to be derived from the same */
/*file as the starting date (e.g. readmissions) or a separate file (e.g. linked file of deaths). */
/**/
/*The steps are:*/
/*1. Identify those patients with the diagnosis or procedure of interest. Create an index sequence variable starting from 1 */
/*for the index record with negative values for preceding records (if any). Create a working file containing all the records of these patients */
/*(including records before the index record).*/
/**/
/*2. Create an end date variable. First, assign a default date corresponding to the end of the study period to a fin_date variable */
/*for every record in the file. Then, for any patient whose survival time was censored before that date, whether by being lost to follow-up */
/*(e.g. migrating), by dying (e.g. if the outcome event is readmission), or by dying of another cause (e.g. if the outcome event is cause-specific mortality), */
/*change the value of fin_date to that person’s date of censoring. At this stage every record has a fin_date, which will be the final fin_date */
/*if the outcome event did not occur.*/
/**/
/*3. For patients who did experience the outcome event, create an event file containing only the person project number (ppn) and the outcome date. */
/*For deaths this will be an existing separate deaths file, while for other outcomes it is likely to be generated by identifying the first record for */
/*each patient that mentions the outcome event and outputting it to a separate file with a new variable name.*/
/**/
/*4. Merge the event file with original working data file, so that the outcome date is loaded onto every relevant record in the working file, */
/*matching by ppn and keeping only those records for patients in the original working data file (i.e. not those for any other people who experienced */
/*the outcome event, such as patients who died of the same cause but were not admitted to hospital during the study period).*/
/**/
/*5. Perform logic checks and delete cases that fail, e.g. where date of death precedes date of separation, or separation mode is death but */
/*there is no matching death date.*/
/**/
/*6. Create a status flag. The status flag is set to be 1 if an outcome event has occurred before the end of the study period and 0 otherwise. */
/*At the same time, set fin_date to be the event date for those with an event. Ignore events that occurred after the end of the study period, */
/*i.e. post-censoring events.*/
/**/
/*7. Calculate the time variable, survtime, as the difference (in days) between the starting date and fin_date. If survtime is 0 (e.g. */
/*the patient dies on the same day as admission), set survtime to 1, as is done for length of stay (LOS); otherwise these patients will not be counted in the survival analysis.*/
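/**/
/*A minimal sketch of the default end date from step 2, together with steps 3 and 4, may help here. The names are assumptions, */
/*not taken from the original data: the working file is called working, the separate deaths file is called deaths (holding ppn */
/*and death_date), and the study period is assumed to end on 31 December 2010. Earlier censoring dates (the rest of step 2) */
/*and the logic checks in step 5 are not shown.*/
PROC SORT DATA=working;
    BY ppn;
RUN;
/*Event file: one record per person, containing only ppn and the outcome date*/
PROC SORT DATA=deaths (KEEP=ppn death_date) OUT=events NODUPKEY;
    BY ppn;
RUN;
DATA merged;
    MERGE working (IN=inwork) events;
    BY ppn;
    /*Keep only records for patients in the working file*/
    IF inwork;
    /*Default end date: the assumed end of the study period*/
    fin_date = '31DEC2010'D;
    FORMAT fin_date death_date DATE9.;
RUN;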
/**/
/*We can create both the status variable and the survival time variable in a single DATA step, corresponding to steps 6 and 7 above:*/
/*Set up data for survival analysis*/
DATA survival;
    SET merged;
    /*Make status=0 for censored cases and post-censoring deaths*/
    IF death_date = . OR death_date > fin_date THEN status = 0;
    /*For subjects who died, fin_date is the date of death and the status is set to 1*/
    ELSE DO;
        fin_date = death_date;
        status = 1;
    END;
    /*For all cases, create the survival time, and change 0 to 1*/
    survtime = fin_date - episode_start_date;
    IF survtime = 0 THEN survtime = 1;
RUN;
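/**/
/*The TIMELIST values in the first LIFETEST procedure below are in years, whereas the DATA step above leaves survtime in days. */
/*One possible conversion is sketched here. The divisor of 365.25 days per year is an assumption, not part of the original code. */
/*Run this step once only, since it overwrites survtime.*/
DATA survival;
    SET survival;
    /*Convert survival time from days to years (assumed 365.25 days per year)*/
    survtime = survtime / 365.25;
RUN;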
/*Kaplan-Meier method*/
PROC LIFETEST DATA=survival PLOTS=S(NOCENSOR)
TIMELIST=(0 to 5 by 0.5);
TIME survtime*status(0);
WHERE indexseq=1;
RUN;
/*Format that groups the primary diagnosis codes into two categories*/
PROC FORMAT;
    VALUE $ICD
        'X60'-'X64'='Poisoning by drugs'
        'X65'-'X84'='Other intentional self-harm';
RUN;
/*Logrank test*/
PROC LIFETEST DATA=survival PLOTS=S(NOCENSOR) NOTABLE;
TIME survtime*status(0);
STRATA diagnosis_codeP / NODETAIL;
FORMAT diagnosis_codeP $ICD.;
WHERE indexseq=1;
RUN;
/*Notice the use of the NOCENSOR survival-option above for the survival curves; this prevents SAS from showing each censored survival time with a + */
/*on the graph. While showing the censored points can be useful, it is not helpful for large datasets.*/
/**/
/*The first LIFETEST procedure above produces the survival curve and life table, and the TIMELIST option prints the survival estimates for times from 0 to 5 years*/
/*in steps of 0.5 years (assuming survtime has been converted from days to years).*/
/**/
/*The second LIFETEST procedure above uses the STRATA statement to perform a logrank test of the effect of the primary diagnosis, diagnosis_codeP, which has been */
/*divided into two groups using the $ICD. format (see above). We suppress the printing of the life table this time, using the NOTABLE option. We also suppress some */
/*unnecessary output using the NODETAIL option of the STRATA statement. This time the PLOTS option will produce two survival curves, one for poisoning by drugs and */
/*one for all other methods of intentional self-harm.*/
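/**/
/*Cox regression*/
/*The PHREG procedure mentioned at the top of this file could be run on the same data along the following lines. This is a sketch only: */
/*the continuous covariate age is hypothetical and is assumed to exist in the survival dataset. It is not part of the original code.*/
PROC PHREG DATA=survival;
    /*Treat the formatted diagnosis group as a categorical predictor*/
    CLASS diagnosis_codeP;
    MODEL survtime*status(0) = age diagnosis_codeP;
    FORMAT diagnosis_codeP $ICD.;
    WHERE indexseq=1;
RUN;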