Unit of study

BSTA5018: Machine Learning for Biostatistics (MLB)

2025 unit information

Recent years have brought rapid growth in the amount and complexity of health data captured. Data collected in imaging, genomics, health registries, wearables, and other applications call for new statistical techniques in both predictive and descriptive learning. Machine learning algorithms for classification and prediction complement classical statistical tools in the analysis of these data. This unit will cover modern machine learning methods that are particularly useful for large and complex health data. Topics include: linear regression and K-nearest neighbours; classification; bootstrapping and cross-validation resampling methods; model selection and regularisation; non-linear approaches including splines and generalised additive models; and tree-based methods. The statistical software R will be used throughout the unit.

Unit details and rules

Managing faculty or University school: Medicine and Health

Study level Postgraduate
Academic unit Public Health
Credit points 6
Prerequisites: (PUBH5010 or BSTA5011 or CEPI5100) and (BSTA5007 or BSTA5210 or BSTA5211 or PUBH5217)
Corequisites: None
Prohibitions: None
Assumed knowledge: None

At the completion of this unit, you should be able to:

  • LO1. Recognise situations where machine learning methods can offer advantages over traditional statistical modelling approaches to data analyses in health applications
  • LO2. Recognise and explain the differences between the goals of description and prediction
  • LO3. Determine and implement appropriate machine learning approaches for description and prediction in real-world health applications
  • LO4. Measure and explain the uncertainty of the results of analyses using machine learning approaches
  • LO5. Interpret the results of analyses using machine learning in light of the assumptions required, the quality of input data, and the sensitivity to the specific technique implemented
  • LO6. Critically appraise published papers concerning machine learning applications for classification or prediction in health
  • LO7. Effectively communicate results of analyses in language suitable for a clinical or epidemiological journal

Unit availability

This section lists the sessions, attendance modes and locations in which the unit is available. Each availability has its own unit outline, which gives you information about the unit, including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Session | MoA | Location | Outline
Semester 2 2024 | Online | Camperdown/Darlington, Sydney |
Semester 2 2025 | Online | Camperdown/Darlington, Sydney | Outline unavailable
Semester 2 Early 2020 | Online | Camperdown/Darlington, Sydney | Outline unavailable
Semester 2 2021 | Online | Camperdown/Darlington, Sydney |
Semester 2 2022 | Online | Camperdown/Darlington, Sydney |
Semester 2 2023 | Online | Camperdown/Darlington, Sydney |

Find your current year census dates

Modes of attendance (MoA)

The mode of attendance (MoA) is the attendance mode shown for the unit when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.