
STAT3888: Statistical Machine Learning

Data Science is an emerging and inherently interdisciplinary field. A key set of skills in this area falls under the umbrella of Statistical Machine Learning methods. This unit gives you the opportunity to bring together the concepts and skills you have learnt in a Statistics or Data Science major and apply them in a joint project with NUTM3888 Metabolic Cybernetics, in which Statistics and Data Science students form teams with Nutrition students to solve a real-world problem using Statistical Machine Learning methods. The unit covers a wide range of cutting-edge supervised and unsupervised learning methods, including principal component analysis, multivariate tests, discriminant analysis, Gaussian graphical models, log-linear models, classification trees, k-nearest neighbours, k-means clustering, hierarchical clustering, and logistic regression. In this unit, you will continue to build and explore your disciplinary knowledge while meeting and collaborating with others through project-based learning: identifying and solving problems, analysing data, and communicating your findings to a diverse audience. Employers value all of these skills highly. The unit also develops your ability to work in an interdisciplinary team, which is essential for both professional and research pathways.

Details

Academic unit Mathematics and Statistics Academic Operations
Unit code STAT3888
Unit name Statistical Machine Learning
Session, year Semester 2, 2022
Attendance mode Normal day
Location Remote
Credit points 6

Enrolment rules

Prohibitions STAT3914 or STAT3014
Prerequisites STAT2X11 and (DATA2X02 or STAT2X12)
Corequisites None
Assumed knowledge STAT3012 or STAT3912 or STAT3022 or STAT3922

Available to study abroad and exchange students Yes

Teaching staff and contact details

Coordinator John Ormerod, john.ormerod@sydney.edu.au
Type Description Weight Due Length
Assignment Disciplinary Assignment
Exploratory data analysis of data set used in Major project.
10% Week 04
Due date: 26 Aug 2022 at 23:59

Closing date: 02 Sep 2022
4 weeks
Outcomes assessed: LO1 LO3 LO4 LO7 LO8
Presentation Project pitch
Present the group's pitch for the major project question.
10% Week 06
Due date: 08 Sep 2022 at 02:00

Closing date: 08 Sep 2022
5 min + 2 min Q&A
Outcomes assessed: LO1 LO2 LO3 LO6 LO7 LO8
Tutorial quiz Disciplinary quiz
Quiz on lecture material
20% Week 09
Due date: 04 Oct 2022 at 14:00

Closing date: 04 Oct 2022
1 hour
Outcomes assessed: LO4 LO5 LO8
Presentation group assignment Major project - presentation
Group presentation of results (slides submitted at given due date)
15% Week 12
Due date: 27 Oct 2022 at 13:00

Closing date: 27 Oct 2022
5 min + 2 min Q&A
Outcomes assessed: LO1 LO2 LO3 LO4 LO5 LO6 LO7 LO8
Assignment group assignment Major project - Manuscript
Statistical analysis of nutrition data set
35% Week 13
Due date: 04 Nov 2022 at 23:59

Closing date: 11 Nov 2022
4000 words
Outcomes assessed: LO1 LO2 LO3 LO4 LO5 LO6 LO7 LO8
Assignment Major project - reflection/viva/minutes
Used to assess individual contributions
10% Week 13
Due date: 06 Nov 2022 at 23:59

Closing date: 06 Nov 2022
500 words/5 min
Outcomes assessed: LO1 LO2 LO3 LO4 LO5 LO6 LO7 LO8
  • Examination: This exam will test the learning outcomes attained in lectures, and tutorials/computer labs. University-approved non-programmable calculators may be used.
  • Computer lab reports: There are 2 computer lab reports, which must be submitted electronically in Turnitin, via the Learning Management System (Canvas) website, by the deadline. Note that a submission will not be marked if it is illegible, sideways or upside down. It is your responsibility to check your submission receipt (which will be automatically emailed to you) to ensure that your report has been submitted correctly.
  • Major project: The major project is broken up into several assessment items: major report, multimedia item, presentation, meeting minutes, peer-to-peer review, group work attendance, and short reflection.

Detailed information for each assessment can be found on Canvas.

Assessment criteria

The University awards common result grades, set out in the Coursework Policy 2014 (Schedule 1).

As a general guide, a high distinction indicates work of an exceptional standard, a distinction a very high standard, a credit a good standard, and a pass an acceptable standard.

For more information see sydney.edu.au/students/guide-to-grades.

Late submission

In accordance with University policy, these penalties apply when written work is submitted after 11:59pm on the due date:

  • Deduction of 5% of the maximum mark for each calendar day after the due date.
  • After ten calendar days late, a mark of zero will be awarded.
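The late-penalty rule above can be expressed as a short calculation. The sketch below uses a hypothetical helper, `late_adjusted_mark`, which is not a University tool. It assumes that lateness is counted in whole calendar days, that work more than ten days late receives zero, and that the adjusted mark is never negative; the official policy governs any edge cases (for example, part days).

```python
def late_adjusted_mark(raw_mark: float, max_mark: float, days_late: int) -> float:
    """Hypothetical helper applying the late-submission rule (a sketch).

    Assumptions (not taken from official policy text):
      * days_late is a whole number of calendar days after the due date;
      * more than ten days late gives a mark of zero;
      * the adjusted mark cannot fall below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 10:
        # Beyond ten calendar days late, the mark is zero.
        return 0.0
    # 5% of the maximum mark is deducted for each calendar day late.
    penalty = max_mark * days_late * 5 / 100
    return max(0.0, raw_mark - penalty)


# A submission marked 70/100 that is 3 days late loses 3 x 5 = 15 marks.
print(late_adjusted_mark(70, 100, 3))  # 55.0
```

Note that the deduction is a fixed share of the maximum mark rather than of the mark awarded, so a late submission with a low raw mark can reach zero before the ten-day cutoff.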

Special consideration

If you experience short-term circumstances beyond your control, such as illness, injury or misadventure, or if you have essential commitments that affect your preparation or performance in an assessment, you may be eligible for special consideration or special arrangements.

Academic integrity

The Current Student website provides information on academic honesty, academic dishonesty, and the resources available to all students.

The University expects students and staff to act ethically and honestly and will treat all allegations of academic dishonesty or plagiarism seriously.

We use similarity detection software to detect potential instances of plagiarism or other forms of academic dishonesty. If such matches indicate evidence of plagiarism or other forms of dishonesty, your teacher is required to report your work for further investigation.

WK Topic Learning activity Learning outcomes
Week 01 Introduction, administration and motivation Lecture (1 hr)  
Data cleaning Lecture (1 hr)  
Unsupervised learning - Introduction to clustering Lecture (1 hr)  
Introduction to the project Workshop (1 hr) LO2 LO3 LO6
Week 02 Unsupervised learning - K-means Lecture (1 hr)  
Unsupervised learning - Model based clustering Lecture (1 hr)  
Unsupervised learning - Hierarchical clustering Lecture (1 hr)  
Tutorial/Lab - Clustering Tutorial (1 hr)  
Workshop on cultural competency Workshop (2 hr) LO6
Week 03 Unsupervised learning - PCA background Lecture (1 hr)  
Unsupervised learning - Principal component analysis Lecture (1 hr)  
Unsupervised learning - Dimension reduction Lecture (1 hr)  
Group meeting - Guidance on choosing a research topic Workshop (2 hr)  
Tutorial/Lab - Dimension reduction Tutorial (1 hr)  
Week 04 Supervised learning - Introduction to supervised learning Lecture (1 hr)  
Supervised learning - Logistic regression Lecture (1 hr)  
Supervised learning - Penalized Logistic regression Lecture (1 hr)  
Group work - Group formation Workshop (2 hr)  
Tutorial/Lab - Logistic regression Workshop (1 hr)  
Week 05 Supervised learning - Discrimination analysis Lecture (1 hr)  
Supervised learning - Regression and classification trees Lecture (1 hr)  
Supervised learning - Random forests Lecture (1 hr)  
Tutorial/Lab - Discriminant analysis and classification trees Tutorial (1 hr)  
Group work - Choosing a research topic for the group Workshop (2 hr)  
Week 06 Neural networks Lecture (1 hr) LO8
Neural networks Lecture (1 hr) LO8
Neural networks Tutorial (1 hr) LO8
Group work - Finalising a research topic Workshop (2 hr)  
Week 07 Support vector machines Lecture (1 hr) LO8
Support vector machines Lecture (1 hr) LO8
Support vector machines Tutorial (1 hr) LO8
Assessment: Project pitch Workshop (2 hr)  
Week 08 How to write a manuscript Workshop (2 hr)  
Week 09 Quiz (worth 20%) Lecture (1 hr)  
Group work Workshop (2 hr)  
Week 10 Group work Workshop (2 hr)  
Week 11 Group work Workshop (2 hr)  
Week 12 Final presentation Workshop (2 hr)  

Study commitment

Typically, there is a minimum expectation of 1.5-2 hours of student effort per week per credit point for units of study offered over a full semester. For a 6 credit point unit, this equates to roughly 120-150 hours of student effort in total.

Learning outcomes are what students know, understand and are able to do on completion of a unit of study. They are aligned with the University’s graduate qualities and are assessed as part of the curriculum.

At the completion of this unit, you should be able to:

  • LO1. apply disciplinary knowledge in statistics and data science to solve problems in an interdisciplinary context (nutrition)
  • LO2. find, define, and delimit authentic problems in order to address them
  • LO3. create an investigation strategy, explore solutions, discuss approaches, and predict outcomes
  • LO4. apply, formulate, interpret, and compare statistical machine learning methods including (wherever relevant) evaluation of model appropriateness
  • LO5. demonstrate integrity, confidence, personal resilience, and the capacity to manage challenges, both individually and in teams
  • LO6. collaborate with diverse groups across cultural and disciplinary boundaries to develop solution(s) to the project problems
  • LO7. communicate project outcomes effectively to a broad audience
  • LO8. identify appropriate machine learning methods for a particular problem, and judge the appropriateness of model evaluation procedures.

Graduate qualities

The graduate qualities are the qualities and skills that all University of Sydney graduates must demonstrate on successful completion of an award course. They have been designed to equip you, as a future Sydney graduate, for the contemporary world.

GQ1 Depth of disciplinary expertise

Deep disciplinary expertise is the ability to integrate and rigorously apply knowledge, understanding and skills of a recognised discipline defined by scholarly activity, as well as familiarity with evolving practice of the discipline.

GQ2 Critical thinking and problem solving

Critical thinking and problem solving are the questioning of ideas, evidence and assumptions in order to propose and evaluate hypotheses or alternative arguments before formulating a conclusion or a solution to an identified problem.

GQ3 Oral and written communication

Effective communication, in both oral and written form, is the clear exchange of meaning in a manner that is appropriate to audience and context.

GQ4 Information and digital literacy

Information and digital literacy is the ability to locate, interpret, evaluate, manage, adapt, integrate, create and convey information using appropriate resources, tools and strategies.

GQ5 Inventiveness

Generating novel ideas and solutions.

GQ6 Cultural competence

Cultural Competence is the ability to actively, ethically, respectfully, and successfully engage across and between cultures. In the Australian context, this includes and celebrates Aboriginal and Torres Strait Islander cultures, knowledge systems, and a mature understanding of contemporary issues.

GQ7 Interdisciplinary effectiveness

Interdisciplinary effectiveness is the integration and synthesis of multiple viewpoints and practices, working effectively across disciplinary boundaries.

GQ8 Integrated professional, ethical, and personal identity

An integrated professional, ethical and personal identity is understanding the interaction between one’s personal and professional selves in an ethical context.

GQ9 Influence

Engaging others in a process, idea or vision.

Outcome map

Learning outcomes Graduate qualities
GQ1 GQ2 GQ3 GQ4 GQ5 GQ6 GQ7 GQ8 GQ9
No changes have been made since this unit was last offered.

Where to go for help: For help with statistics, you can post a question on the Ed forum (anonymously to other students, if you prefer), ask your tutor during a tutorial, computer lab (or workshop, for STAT3888), consult the lecturer in their consultation time (see above), or email john.ormerod@sydney.edu.au. For administrative questions, first check carefully whether the answer is on this information sheet or on the STAT3888 webpage; if not, ask on the Ed forum or (if the question is specific to your situation) ask at the Student Services Office (Carslaw 520) or email STAT3888@sydney.edu.au or STAT3914@sydney.edu.au as appropriate. Make sure that every email you send includes your name and SID, because anonymous emails will be ignored. If your email contains questions whose answers would benefit other students, you may be asked to post them on the Ed forum so that they can be answered there.

Disclaimer

The University reserves the right to amend units of study or no longer offer certain units, including where there are low enrolment numbers.

To help you understand common terms that we use at the University, we offer an online glossary.