Biostatistics: Exercise 3

Generalized linear models

The data sets are available for download in a .zip file.

  1. Under simple Mendelian inheritance, the genotype frequencies for a diallelic marker system should follow the proportions p² : 2pq : q², where p and q = 1 - p are the allele frequencies (Hardy-Weinberg equilibrium).
    • Construct a simple χ² goodness-of-fit test for the null hypothesis of Hardy-Weinberg equilibrium, using both methods suggested in the lecture:
      1. determine the ML estimate of p, use it to obtain the expected counts, and calculate the χ² statistic;
      2. minimize the χ² statistic with respect to p.

      Please note that you have to correct the degrees of freedom for the number of parameters estimated.

    • In a sample of schizophrenic patients, observed genotype counts for the Dopamine 3 receptor polymorphism were

      Genotype   A1A1   A1A2   A2A2
      Count      45     35     15

      Is there evidence for deviation from Hardy-Weinberg equilibrium in the underlying population?
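
      A minimal Python sketch of how the two approaches might be set up for the counts above. The helper names (hwe_expected, chi2_stat, golden_section_min) are illustrative and not taken from the lecture, and the golden-section search is just one convenient way to minimize the statistic over p. The p-value uses the fact that, for one degree of freedom, the upper tail of the χ² distribution equals erfc(sqrt(x/2)).

```python
import math

observed = [45, 35, 15]          # A1A1, A1A2, A2A2
n = sum(observed)

def hwe_expected(p, n):
    """Expected genotype counts under Hardy-Weinberg equilibrium."""
    q = 1.0 - p
    return [n * p * p, 2.0 * n * p * q, n * q * q]

def chi2_stat(observed, expected):
    """Pearson chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def golden_section_min(f, lo, hi, tol=1e-10):
    """Minimize a unimodal function f on [lo, hi] (assumed search method)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# Method 1: ML estimate of p by allele counting, then the chi-square statistic.
p_ml = (2 * observed[0] + observed[1]) / (2.0 * n)
chi2_ml = chi2_stat(observed, hwe_expected(p_ml, n))

# Method 2: choose p to minimize the chi-square statistic directly.
p_min = golden_section_min(
    lambda p: chi2_stat(observed, hwe_expected(p, n)), 0.01, 0.99)
chi2_min = chi2_stat(observed, hwe_expected(p_min, n))

# 3 categories - 1 (counts sum to n) - 1 (estimated parameter p)
df = len(observed) - 1 - 1
pval_ml = chi2_sf_1df(chi2_ml)
pval_min = chi2_sf_1df(chi2_min)

print("ML:      p =", p_ml, " chi2 =", chi2_ml, " p-value =", pval_ml)
print("min chi2: p =", p_min, " chi2 =", chi2_min, " p-value =", pval_min)
```

      By construction the minimized statistic cannot exceed the one based on the ML estimate, so comparing the two values is a quick sanity check on the implementation.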

  2. Data set tetrahymena contains data about the growth of Tetrahymena cells: the diameter (μm) and concentration (counts/ml) of the cells, and whether glucose was added to the growth medium. Find an appropriate model for the cell diameter as a function of the other variables.
  3. Data set menarche contains information about the age at menarche of girls in Warsaw, collected in 1965. Analyze the proportions of girls who have reached menarche using both logit and probit links.
  4. Data set coronary provides data about the association between the risk of coronary attack, age and smoking. For each combination of age group and smoking (yes/no), the number of deaths and the number of person-years at risk are given. How does the death rate depend on age and smoking?
  5. Data set malaria contains a random sample of 100 children, aged 3-15 years, from a village in Ghana. The children were followed for a period of 8 months. At the beginning of the study, values of a particular antibody were assessed. Based on observations during the study period, the children were categorized into two groups: individuals with and without symptoms of malaria. How does the probability of getting malaria depend on the other variables?
  6. On the Greek island of Kalythos the male inhabitants suffer from a congenital eye disease, the effects of which become more marked with increasing age. Samples of islander males of various ages were tested for blindness and the results recorded.

    Age:          20   35   45   55   70
    No. tested:   50   50   50   50   50
    No. blind:    6    17   26   37   44

    Using a logit or probit model, estimate the LD50 (here, the age at which the probability of blindness is p = 1/2) together with its standard error. Compare the logit and probit estimates.
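
    A rough Python sketch of one way to approach this, assuming a binomial model with linear predictor b0 + b1 * age fitted by Fisher scoring, so that LD50 = -b0/b1 and its standard error follows from the delta method. The function names (fit_binomial_glm, ld50_with_se) are hypothetical helpers written for this illustration, and the covariance matrix is the inverse expected information.

```python
import math

ages = [20, 35, 45, 55, 70]
tested = [50, 50, 50, 50, 50]
blind = [6, 17, 26, 37, 44]

def logit_inv(eta):
    """Inverse logit link: returns (mu, d mu / d eta)."""
    mu = 1.0 / (1.0 + math.exp(-eta))
    return mu, mu * (1.0 - mu)

def probit_inv(eta):
    """Inverse probit link: returns (mu, d mu / d eta)."""
    mu = 0.5 * math.erfc(-eta / math.sqrt(2.0))
    dens = math.exp(-0.5 * eta * eta) / math.sqrt(2.0 * math.pi)
    return mu, dens

def fit_binomial_glm(x, n, y, inv_link, max_iter=100):
    """Fisher scoring for a two-parameter binomial GLM.

    Returns (b0, b1) and the inverse expected information from the last
    iteration, which serves as the estimated covariance at convergence.
    """
    b0, b1 = 0.0, 0.0
    cov = None
    for _ in range(max_iter):
        s0 = s1 = i00 = i01 = i11 = 0.0
        for xi, ni, yi in zip(x, n, y):
            mu, dmu = inv_link(b0 + b1 * xi)
            mu = min(max(mu, 1e-10), 1.0 - 1e-10)   # guard against 0 and 1
            var = mu * (1.0 - mu)
            r = (yi - ni * mu) * dmu / var          # score contribution
            w = ni * dmu * dmu / var                # information weight
            s0 += r
            s1 += r * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
        step0 = cov[0][0] * s0 + cov[0][1] * s1
        step1 = cov[1][0] * s0 + cov[1][1] * s1
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return (b0, b1), cov

def ld50_with_se(beta, cov):
    """LD50 = -b0/b1 with a delta-method standard error."""
    b0, b1 = beta
    g0, g1 = -1.0 / b1, b0 / (b1 * b1)              # gradient of -b0/b1
    var = g0 * g0 * cov[0][0] + 2.0 * g0 * g1 * cov[0][1] + g1 * g1 * cov[1][1]
    return -b0 / b1, math.sqrt(var)

results = {}
for name, link in (("logit", logit_inv), ("probit", probit_inv)):
    beta, cov = fit_binomial_glm(ages, tested, blind, link)
    results[name] = ld50_with_se(beta, cov)
    print(name, "LD50 =", results[name][0], "SE =", results[name][1])
```

    The coefficients of the two models live on different scales, but the LD50 is a ratio of coefficients, so the two estimates can be compared directly.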