Recent years have brought rapid growth in the amount and complexity of data in biostatistical applications. Data collected from imaging, genomics, health registries and wearables, among other sources, call for new statistical techniques in both predictive and descriptive learning. Machine learning algorithms for classification and prediction complement classical statistical tools in the analysis of these data. This unit covers several modern methods that are particularly useful for big and complex data. Topics include classification trees, random forests, model selection, the lasso, bootstrapping, cross-validation, generalised additive models and splines, among others. The statistical software R will be used throughout the unit.
The expected workload for this unit is 8-12 hours per week on average for 13 weeks, consisting of guided readings, discussion posts, independent study and completion of assessment tasks.
Two major assignments worth 40% each (equivalent to 2 x 2000 words) and two short assignments worth 10% each.
James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning with Applications in R. Springer, 2013. ISBN 978-1-4614-7138-7
If you have completed BSTA5007, you must take BSTA5018 and BSTA5008 at the same time.
BSTA5007 or PUBH5217