Unit of study

PUBH5215: Analysis of Linked Health Data

Throughout our lives, information about our health and the care we receive is recorded and stored across various health-related databases, e.g., hospital admissions, ambulance service, cancer registry. Data linkage is a process that brings together information from different databases about the same individual, family, place or event. This process creates a chronological sequence of health events, or an individual 'health story', that can be combined into a much larger story about the health of people. This information can be used for research or to improve health services. This unit is suitable for health services researchers, policy makers, clinical practitioners, biostatisticians and data managers. We explain how data linkage is conducted and illustrate how it can be used for research, highlighting its advantages as well as its dangers and pitfalls. We describe how to design linked data studies, outline the data management steps required before analysing the data, and discuss some of the methods and issues of analysing linked data. Students will have access to data from a real data linkage and will gain hands-on experience to develop their programming skills for handling large, complex datasets.
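As a rough illustration of the 'health story' idea, the sketch below groups events from three sources by person and orders them by date. It is written in Python for illustration only (the unit itself assumes SAS or R). The records, the field names (`person_id`, `event_date`, `event`) and the function `build_health_stories` are all hypothetical, and the sketch assumes a shared person identifier has already been assigned. Real linkages often have no such identifier and instead match on personal details such as name and date of birth.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy records, invented for illustration only.
# Assumption: every source already carries the same person identifier.
hospital_admissions = [
    {"person_id": "P01", "event_date": date(2020, 3, 14), "event": "hospital admission"},
    {"person_id": "P02", "event_date": date(2021, 7, 2), "event": "hospital admission"},
]
ambulance_callouts = [
    {"person_id": "P01", "event_date": date(2020, 3, 13), "event": "ambulance call-out"},
]
cancer_registry = [
    {"person_id": "P01", "event_date": date(2019, 11, 5), "event": "cancer registration"},
]


def build_health_stories(*sources):
    """Group events from all sources by person, then sort each person's events by date."""
    stories = defaultdict(list)
    for source in sources:
        for record in source:
            stories[record["person_id"]].append((record["event_date"], record["event"]))
    # Sorting (date, event) tuples orders each story chronologically.
    return {pid: sorted(events) for pid, events in stories.items()}


stories = build_health_stories(hospital_admissions, ambulance_callouts, cancer_registry)
for person_id, events in sorted(stories.items()):
    print(person_id, [f"{d.isoformat()} {name}" for d, name in events])
```

Grouping on an exact key is the simplest case; it says nothing about how uncertain or missed matches would be handled.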

Code PUBH5215
Academic unit Public Health
Credit points 6
(PUBH5010 or BSTA5011 or CEPI5100) and (PUBH5211 or PUBH5217 or BSTA5004)
Assumed knowledge:
The unit assumes introductory-level programming skills in SAS or R, introductory-level knowledge of epidemiology, e.g., PUBH5010 or CEPI5100, and introductory-level knowledge of biostatistics or statistics, e.g., PUBH5018

At the completion of this unit, you should be able to:

  • LO1. understand the theory of data linkage methods and features of comprehensive data linkage systems, sufficient to know the sources and limitations of linked health data sets
  • LO2. apply epidemiological principles to the design of studies using linked data
  • LO3. construct numerators and denominators for the analysis of disease trends and health care utilisation and outcomes
  • LO4. assess the accuracy and reliability of data sources
  • LO5. check data linkages and assure the quality of the study process, e.g., consistency of definitions, missing data
  • LO6. list the issues to be considered when analysing large linked data files
  • LO7. write syntax to prepare linked data files for analysis, derive exposure and outcome variables, relate numerators and denominators, and produce results from statistical procedures.
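To make the idea of relating numerators and denominators (LO3, LO7) concrete, here is a minimal sketch in Python, again for illustration only since the unit assumes SAS or R. The admission records, the population counts and the function `admission_rates` are hypothetical. The numerator is the number of admissions in each year, the denominator is an assumed population for that year, and the result is a crude rate per 1,000 people.

```python
from collections import Counter

# Hypothetical admission records: one (person_id, year) pair per admission.
admissions = [("P01", 2020), ("P02", 2020), ("P01", 2021), ("P03", 2021), ("P04", 2021)]

# Hypothetical population counts by year (the denominators).
population = {2020: 1000, 2021: 1250}


def admission_rates(admissions, population, per=1000):
    """Return the crude admission rate per `per` people for each year in `population`."""
    # Numerator: count of admissions in each year (Counter returns 0 for absent years).
    numerators = Counter(year for _, year in admissions)
    return {year: numerators[year] * per / denom for year, denom in population.items()}


rates = admission_rates(admissions, population)
for year in sorted(rates):
    print(year, rates[year])
```

This counts admissions rather than people, so one person admitted twice in a year contributes twice; whether that is appropriate depends on the study question.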

Unit outlines

Unit outlines will be available 1 week before the first day of teaching for the relevant session.