Introductory analysis of linked data

Understand the theory and analysis of linked health datasets
This fully online short course introduces health services researchers, clinical practitioners and managers to linked data analysis at an introductory to intermediate level.

Gain an understanding of the theory and skills needed to analyse linked health data. The modular structure of the course provides participants with a theoretical grounding on each theme, followed by a hands-on practical exercise in our computer lab each day, using de-identified linked NSW data files.

Enrolment is currently open for the November 2021 session - please refer below for study dates.

Course details

  • Introduction to data linkage and its history
  • Description of CHeReL and how record linkage works
  • Quality of data linkage
  • Ethics, data security, applying to CHeReL for data
  • Types of population health databases
  • ICD coding
  • Overview of linked data studies
  • Constructing study populations
  • SAS commands for arrays, merging datasets, tagging records, creating sequence variables
  • Measures of health care utilisation; health care episodes
  • Prevalent pool effect
  • Inter-hospital transfers
  • Data quality I: Preparing data for analysis
  • Data quality II: Accuracy and reliability of data sources
  • Measures of health care outcomes: treatment outcomes and adverse events
  • Introduction to survival analysis and Cox regression
  • Available covariates: sociodemographic, illness severity, comorbidity
  • Methods of risk adjustment

On completion of this short course participants will be able to:

  • understand the theory of data linkage methods and features of comprehensive data linkage systems, sufficient to know the sources and limitations of linked health data sets, and in particular those for NSW;
  • apply epidemiological principles to the design of studies using linked data;
  • construct numerators and denominators for the analysis of disease trends and health care utilisation and outcomes;
  • assess the accuracy and reliability of data sources;
  • check data linkages and assure the quality of the study process, e.g. consistency of definitions, missing data;
  • list the issues to be considered when analysing large linked data files;
  • write syntax to prepare linked data files for analysis, derive exposure and outcome variables, relate numerators and denominators and produce results from statistical procedures.

The course is suitable for people with no previous experience in the analysis of linked health data. However, it does assume familiarity with introductory statistical and epidemiological methods, as taught, for example, in a Master of Public Health degree course.

The computing component of the unit also assumes a basic familiarity with computing syntax used in SAS and methods of basic statistical analysis of fixed-format data files. Participants must have this assumed knowledge.

Associate Professor Timothy Dobbins
National Drug and Alcohol Research Centre, University of NSW

Associate Professor Patrick Kelly
Sydney School of Public Health, University of Sydney

Ms Katie Irvine
Centre for Health Record Linkage (CHeReL)

Victoria Pye
Centre for Health Record Linkage (CHeReL)

Ms Sanja Lujic
Centre for Big Data Research in Health, University of NSW

Miss Filippa Pretty
Health Information Manager, University of Sydney

Dr Deborah Randall
Women and Babies Research, Kolling Institute

Dr Erin Cvejic
Sydney School of Public Health, University of Sydney

Associate Professor Siranda Torvaldsen
Women and Babies Research, Kolling Institute

Associate Professor Heather Gidding
Women and Babies Research, Kolling Institute

Dr Ibinabo Ibiebele
Women and Babies Research, Kolling Institute

Important: This short course is a variant of the unit of study Introductory Analysis of Linked Data (PUBH5215). It enables you to complete the unit without formal university enrolment. You will receive a certificate of completion; however, you will not receive credit points towards a University of Sydney degree.

To receive credit points and an academic transcript, please see the Medicine Postgraduate Non Award.

Key information
Course fees

Individuals/Small groups (up to 4 students)
$2970 incl. GST per student

Large groups (5+ students)
$2640 incl. GST per student


The workshop will consist of pre-recorded lectures and live sessions. 

Attendance at live sessions is compulsory.

Live sessions can be attended in person or via videoconferencing (Zoom).  

No formal assessments or examinations are required.


Week 1: 15th to 17th November 2021

Week 2: 22nd to 23rd November 2021

Application deadline
Friday 22nd October 2021



  • Education Support Office, Level 3, Edward Ford Building (A27), The University of Sydney, NSW 2006 Australia