Biostatistics: Exercise 5

Survival Analysis

The data sets are available for download in a .zip file.

  1. Data set Leucemia gives information on acute myelogenous leukemia survival, recording survival time, censoring status, and whether maintenance chemotherapy was given. Produce Kaplan-Meier plots of survival for the different treatment regimes. Is there a significant difference in survival?

  2. Data set Ovarian gives information on ovarian cancer survival (futime: survival or censoring time; fustat: censoring status), with age (in years), presence of residual disease (resid.ds), treatment group (rx), and ECOG performance status (where 1 is better) as possible risk factors. Fit a Cox proportional hazards model and plot the predicted survival curve.

  3. Data set Melanom contains data on the survival of patients after an operation for malignant melanoma, collected at Odense University Hospital by K. T. Drzewiecki. The data provide the survival status (whether the patient died from malignant melanoma, survived, or died from some other cause), the survival time in days, the presence of ulceration, the thickness of the tumor (in 1/100 mm) and the gender of the patient. Analyze the survival of the patients and how it depends on the explanatory variables.
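For exercise 1 (data set Leucemia), the Kaplan-Meier curves and the log-rank test are the standard tools. In R these are `survfit` and `survdiff` from the survival package. The following is a minimal Python sketch of the same two computations, assuming each group is given as a list of times and a list of 0/1 event indicators (1 = death observed, 0 = censored). The function names and the toy data in the usage note are hypothetical and are not the Leucemia data.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, S(time)) at each event time.

    events[i] is 1 if subject i was observed to die at times[i] and 0 if the
    observation was censored there.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # all subjects sharing this time (deaths and censorings)
        tied = [e for tt, e in data[i:] if tt == t]
        deaths = sum(tied)
        if deaths > 0:
            # subjects censored at t are still counted as at risk at t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value), 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)
        d = sum(e for tt, e, _ in pooled if tt == t)
        d_a = sum(e for tt, e, g in pooled if tt == t and g == 0)
        # observed minus expected deaths in group A at time t
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            # hypergeometric variance of the group-A death count
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; no informative event times")
    chisq = obs_minus_exp ** 2 / variance
    # chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value
```

For example, `kaplan_meier([1, 2, 2, 3], [1, 0, 1, 1])` steps down to 0.75, 0.5 and 0 at times 1, 2 and 3. Plotting each group's curve as a step function gives the Kaplan-Meier plot, and `logrank_test` then answers whether the curves differ significantly.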
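For exercise 3 (data set Melanom), the status variable has three outcomes. A common choice is a cause-specific analysis, where death from melanoma is the event and patients who are alive or who died from another cause are censored at their last follow-up. The Python sketch below shows that recoding and then tabulates crude melanoma death rates per 100 person-years by a grouping variable such as ulceration. The status labels and field names are hypothetical, so check the actual coding in the data set first. Crude rates also assume a roughly constant hazard; they are only a first look before Kaplan-Meier curves and a Cox model.

```python
from collections import defaultdict

# Hypothetical status label; check the actual coding in the Melanom data set.
MELANOMA_DEATH = "melanoma"


def melanoma_event(status):
    """Cause-specific event indicator: 1 for death from melanoma, 0 otherwise.

    Patients who are alive, or who died from another cause, are treated as
    censored at their last follow-up time.
    """
    return 1 if status == MELANOMA_DEATH else 0


def crude_rates(records, group_key):
    """Melanoma deaths, person-years and deaths per 100 person-years by group.

    Each record is a dict with 'time' (days), 'status' and the grouping field.
    """
    deaths = defaultdict(int)
    person_years = defaultdict(float)
    for rec in records:
        g = rec[group_key]
        deaths[g] += melanoma_event(rec["status"])
        person_years[g] += rec["time"] / 365.25  # days to years
    return {
        g: (deaths[g], person_years[g], 100 * deaths[g] / person_years[g])
        for g in person_years
    }
```

The same 0/1 indicator produced by `melanoma_event` is what a Kaplan-Meier or Cox fit needs as its event variable. Thickness, which is typically right-skewed, may be worth entering on a log scale, but that is a modeling choice to check against the data.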

Problems for written assignment

Please send the file with your results, explanations and conclusions to The preferred format is PDF. The deadline for submission is September 30, 2007.

  1. Data set WarpBreaks provides information on the number of breaks in yarn during weaving. It gives the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn, together with the type of wool and the level of tension applied.

    • Inspect the data graphically. Try producing boxplots of the number of breaks for each of the 6 combinations of the factors wool and tension.

    • Is there a significant association between the two factors?

  2. Data set Cancer gives results from a case-control study of (o)esophageal cancer in Ille-et-Vilaine, France, providing information on age group and alcohol and tobacco consumption as possible risk factors. Try fitting an appropriate binomial model.

  3. Data set JobSatisfaction provides information on the job satisfaction of 715 blue-collar workers, selected from Danish industry in 1968. It includes information on the quality of management (bad, good), the supervisor’s job satisfaction (low, high) and the worker’s own job satisfaction (low, high).

    • Explore the relation between the three variables in the data set.

    • Use mosaicplot to analyze the data set graphically.
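For assignment problem 1 (data set WarpBreaks), R's `boxplot` draws its box hinges from Tukey's five-number summary, which R exposes as `fivenum`. The Python sketch below reproduces that definition and computes the summary for each wool and tension cell. Comparing the six summaries is the numerical counterpart of the requested boxplots. Records are assumed to be `(wool, tension, breaks)` tuples, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def fivenum(values):
    """Tukey five-number summary (min, lower hinge, median, upper hinge, max),
    following the definition used by R's fivenum()."""
    xs = sorted(values)
    n = len(xs)
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]  # 1-based, may end in .5
    # average the two neighbouring order statistics (equal if position is whole)
    return [0.5 * (xs[math.floor(d) - 1] + xs[math.ceil(d) - 1]) for d in positions]


def boxplot_stats(records):
    """Five-number summary of breaks for each (wool, tension) cell."""
    cells = defaultdict(list)
    for wool, tension, breaks in records:
        cells[(wool, tension)].append(breaks)
    return {cell: fivenum(vals) for cell, vals in sorted(cells.items())}
```

For instance, `fivenum([1, 2, 3, 4])` gives `[1.0, 1.5, 2.5, 3.5, 4.0]`. Whisker and outlier rules are not included in this sketch; `boxplot` adds them on top of the hinges.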
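For assignment problem 2 (data set Cancer), the data come as counts of cases and controls per covariate pattern, so a logistic (binomial) model with a grouped response fits naturally. In R this is `glm(cbind(cases, controls) ~ ..., family = binomial)`. The Python sketch below fits logit P(case) = b0 + b1 x by Newton-Raphson for a single numeric covariate x, which could be, for example, a hypothetical numeric coding of the alcohol group. In a case-control design only the slope, a log odds ratio, is interpretable. The intercept reflects the sampling ratio of cases to controls, not the population risk.

```python
import math


def fit_grouped_logistic(x, cases, controls, tol=1e-10, max_iter=50):
    """Newton-Raphson fit of logit P(case) = b0 + b1 * x for grouped binomial data.

    x[k] is the covariate value of group k; cases[k] and controls[k] are counts.
    Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xk, yk, ck in zip(x, cases, controls):
            nk = yk + ck
            pk = 1.0 / (1.0 + math.exp(-(b0 + b1 * xk)))
            resid = yk - nk * pk       # observed minus expected cases
            wk = nk * pk * (1.0 - pk)  # binomial variance, the IRLS weight
            u0 += resid
            u1 += xk * resid
            i00 += wk
            i01 += wk * xk
            i11 += wk * xk * xk
        # solve the 2x2 system (information matrix) * step = score
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        b0 += step0
        b1 += step1
        if max(abs(step0), abs(step1)) < tol:
            return b0, b1
    raise RuntimeError("Newton-Raphson did not converge")
```

As a sanity check, with a single 0/1 exposure the fitted slope equals the log of the cross-product odds ratio of the 2x2 table of exposure by case status. The full problem, with age, alcohol and tobacco groups as factors, needs the matrix form of the same iteration, which is what `glm` does.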
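For assignment problem 3 (data set JobSatisfaction), the three binary variables form a 2x2x2 contingency table. A natural first question is whether the three variables are mutually independent. Under that hypothesis each expected count is the product of the three one-way margins divided by n², and Pearson's X² has 8 - 1 - 3 = 4 degrees of freedom. The Python sketch below computes this test. The index order of the nested table is a hypothetical convention. Rejecting mutual independence only starts the analysis: conditional-independence log-linear models (for example via R's `loglin` or a Poisson `glm`) and `mosaicplot` show where the association lies.

```python
import math
from itertools import product


def mutual_independence_test(table):
    """Pearson chi-square test of mutual independence for a 2x2x2 table.

    table[i][j][k] is the count for management level i, supervisor
    satisfaction j and worker satisfaction k (hypothetical index order).
    Returns (X2, df, p_value).
    """
    idx = list(product(range(2), repeat=3))
    n = sum(table[i][j][k] for i, j, k in idx)
    # one-way margins of the three variables
    a = [sum(table[i][j][k] for j in range(2) for k in range(2)) for i in range(2)]
    b = [sum(table[i][j][k] for i in range(2) for k in range(2)) for j in range(2)]
    c = [sum(table[i][j][k] for i in range(2) for j in range(2)) for k in range(2)]
    x2 = 0.0
    for i, j, k in idx:
        expected = a[i] * b[j] * c[k] / n ** 2
        x2 += (table[i][j][k] - expected) ** 2 / expected
    df = 8 - 1 - 3  # cells - 1 - (one free parameter per binary margin)
    # chi-square survival function for 4 df: exp(-x/2) * (1 + x/2)
    p_value = math.exp(-x2 / 2) * (1 + x2 / 2)
    return x2, df, p_value
```

The closed-form p-value is specific to 4 degrees of freedom. For other table shapes a general chi-square distribution function is needed.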