Unit of study

DATA1901: Foundations of Data Science (Adv)

DATA1901 is an advanced-level unit (matching DATA1001) that is foundational to the new major in Data Science. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making: it is essential in every industry and undergirds all research that relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem-solving skills in a team setting. Taught interactively with embedded technology and masterclasses, DATA1901 develops critical thinking and the skills to solve problems with data at an advanced level. By completing this unit you will have an excellent foundation for pursuing data science, whether directly through the data science major or indirectly in whatever field you major in. The advanced unit covers the same overall concepts as the regular unit, but the material is discussed in a manner that offers a greater level of challenge and academic rigour.

Code DATA1901
Academic unit Mathematics and Statistics Academic Operations
Credit points 6
MATH1005 or MATH1905 or ECMT1010 or ENVX1001 or ENVX1002 or BUSS1020 or DATA1001 or MATH1115 or MATH1015 or STAT1021
Assumed knowledge: An ATAR of 95 or more

At the completion of this unit, you should be able to:

  • LO1. assess the importance of statistics in a data-rich world, including current challenges such as ethics, privacy and big data
  • LO2. analyse the study design behind a dataset, seeking additional evidence from literature, and evaluate how the study design affects context-specific outcomes
  • LO3. design, produce, interpret and compare graphical and numerical summaries of data from multiple sources in R, including the use of interactive tools
  • LO4. apply the Normal approximation to data, with consideration of measurement error
  • LO5. model the relationship between two variables using linear regression, and explain linear regression in terms of projection
  • LO6. use the box model to describe chance and chance variability, including sample surveys and the central limit theorem
  • LO7. formulate an appropriate hypothesis and perform a range of hypothesis tests for a given problem using real multivariate data
  • LO8. interpret the p-value, conscious of the various pitfalls associated with testing
  • LO9. critique the use of statistics in media and research papers in a wide variety of data contexts, with attention to confounding and bias
  • LO10. perform data analysis in a team, on data requiring multiple preprocessing steps, and communicate the findings via oral and written reproducible reports, with extensive interrogation