# Data Science

## Data Science major

A major in Data Science requires 48 credit points from this table including:

(i) 6 credit points of 1000-level core units

(ii) 6 credit points of 1000-level units according to the following rules*:

(a) 6 credit points of selective units OR

(b) 3 credit points of statistics units and 3 credit points of computation units OR

(c) 3 credit points of advanced statistics units and 3 credit points of mathematics units OR

(d) 3 credit points of advanced statistics units and 3 credit points of linear algebra units for students in the Mathematical Sciences program^

(iii) 12 credit points of 2000-level core units

(iv) 6 credit points of 2000-level selective units

(v) 6 credit points of 3000-level core interdisciplinary project units

(vi) 6 credit points of 3000-level methodology units

(vii) 6 credit points of 3000-level methodology or application or interdisciplinary project selective units

*Students not enrolled in the BSc may substitute ECMT1010 or BUSS1020

^If elective space allows, students may substitute DATA1001/1901 for the advanced statistics unit

## Data Science minor

A minor in Data Science requires 36 credit points from this table including:

(i) 6 credit points of 1000-level core units

(ii) 6 credit points of 1000-level units according to the following rules*:

(a) 6 credit points of selective units OR

(b) 3 credit points of statistics units and 3 credit points of computation units OR

(c) 3 credit points of advanced statistics units and 3 credit points of calculus and linear algebra units

(iii) 12 credit points of 2000-level core units

(iv) 6 credit points of 2000-level selective units

(v) 6 credit points of 3000-level methodology units
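The credit-point totals stated for the major (48) and the minor (36) follow from summing the listed requirements. The sketch below checks that arithmetic; the dictionary keys and the function name are illustrative labels, not official handbook terms.

```python
# Credit points per requirement, transcribed from the major and minor rules above.
# The key names are shorthand chosen for this sketch.
MAJOR_REQUIREMENTS = {
    "(i) 1000-level core": 6,
    "(ii) 1000-level rule (a)-(d)": 6,
    "(iii) 2000-level core": 12,
    "(iv) 2000-level selective": 6,
    "(v) 3000-level core interdisciplinary project": 6,
    "(vi) 3000-level methodology": 6,
    "(vii) 3000-level selective": 6,
}

MINOR_REQUIREMENTS = {
    "(i) 1000-level core": 6,
    "(ii) 1000-level rule (a)-(c)": 6,
    "(iii) 2000-level core": 12,
    "(iv) 2000-level selective": 6,
    "(v) 3000-level methodology": 6,
}


def total_credit_points(requirements):
    """Sum the credit points across all listed requirements."""
    return sum(requirements.values())


print(total_credit_points(MAJOR_REQUIREMENTS))  # 48
print(total_credit_points(MINOR_REQUIREMENTS))  # 36
```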

### Units of study

The units of study are listed below.

#### 1000-level units of study

###### Core

**DATA1002 Informatics: Data and Computation**

Credit points: 6 Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: INFO1903 OR DATA1902 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarization, creation and application of a predictive model.
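As a rough illustration of the ingestion, cleaning and summarisation steps named above, the following Python sketch reads a small CSV, drops unusable values and reports a mean. The data, column names and function name are invented for this example and are not taken from the unit's materials.

```python
import csv
import io
import statistics

# Invented sample data with typical defects: a missing value,
# stray whitespace, and a non-numeric placeholder.
RAW_CSV = """site,temperature_c
A,21.0
B,
C, 19.0
D,n/a
E,23.0
"""


def clean_temperatures(text):
    """Ingest CSV text and keep only rows whose temperature parses as a number."""
    reader = csv.DictReader(io.StringIO(text))
    values = []
    for row in reader:
        raw = (row["temperature_c"] or "").strip()
        try:
            values.append(float(raw))
        except ValueError:
            # Skip missing or non-numeric entries rather than guessing a value.
            continue
    return values


temps = clean_temperatures(RAW_CSV)
print(temps)                   # [21.0, 19.0, 23.0]
print(statistics.mean(temps))  # 21.0
```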

**DATA1902 Informatics: Data and Computation (Advanced)**

Credit points: 6 Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: INFO1903 OR DATA1002 Assumed knowledge: This unit is intended for students with an ATAR at least sufficient for entry to the BSc/BAdvStudies(Advanced) stream, or for those who gained Distinction results or better in some unit in Data Science, Mathematics, or Computer Science. Students with a portfolio of high-quality relevant prior work can also be admitted. Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarization, creation and application of a predictive model. This unit includes the content of DATA1002, along with additional topics that are more sophisticated, suited for students with high academic achievement.

###### Selective

**DATA1001 Foundations of Data Science**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1,Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: DATA1901 or MATH1005 or MATH1905 or MATH1015 or MATH1115 or ENVX1001 or ENVX1002 or ECMT1010 or BUSS1020 or STAT1021 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

DATA1001 is a foundational unit in the Data Science major. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making, essential in every industry and undergirds all research that relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem solving skills in a team setting. Taught interactively with embedded technology, DATA1001 develops critical thinking and skills to problem-solve with data. It is the prerequisite for DATA2002.

Textbooks

All learning material will be on Canvas. In addition, the optional textbook is Statistics by Freedman, Pisani and Purves (2007)

**DATA1901 Foundations of Data Science (Adv)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1,Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1005 or MATH1905 or ECMT1010 or ENVX1001 or ENVX1002 or BUSS1020 or DATA1001 or MATH1115 or MATH1015 or STAT1021 Assumed knowledge: An ATAR of 95 or more Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

DATA1901 is an advanced level unit (matching DATA1001) that is foundational to the new major in Data Science. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making, essential in every industry and undergirds all research that relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem solving skills in a team setting. Taught interactively with embedded technology and masterclasses, DATA1901 develops critical thinking and skills to problem-solve with data at an advanced level. By completing this unit you will have an excellent foundation for pursuing data science, whether directly through the data science major, or indirectly in whatever field you major in. The advanced unit has the same overall concepts as the regular unit but material is discussed in a manner that offers a greater level of challenge and academic rigour.

Textbooks

All learning materials will be on Canvas. In addition, the optional textbook is Statistics by Freedman, Pisani, and Purves (2007).

**ENVX1002 Introduction to Statistical Methods**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: ENVX1001 or MATH1005 or MATH1905 or MATH1015 or MATH1115 or DATA1001 or DATA1901 or BUSS1020 or STAT1021 or ECMT1010 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Available as a degree core unit only in the Agriculture, Animal and Veterinary Bioscience, Food and Agribusiness, and Taronga Wildlife Conservation streams

This is an introductory data science unit for students in the agricultural, life and environmental sciences. It provides the foundation for statistics and data science skills that are needed for a career in science and for further study in applied statistics and data science. The unit focuses on developing critical and statistical thinking skills for all students. It has 4 modules: exploring data, modelling data, sampling data and making decisions with data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem-solving skills in a team setting. Taught interactively with embedded technology, ENVX1002 develops critical thinking and skills to problem-solve with data.

Textbooks

No textbooks are recommended but useful reference books are: Mead R, Curnow RN, Hasted AM (2002) 'Statistical methods in agriculture and experimental biology.' (Chapman and Hall: Boca Raton). Quinn GP, Keough MJ (2002) Experimental design and data analysis for Biologists. (Cambridge University Press)

###### Statistics

**MATH1005 Statistical Thinking with Data**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Intensive January,Semester 1,Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1015 or MATH1905 or STAT1021 or ECMT1010 or ENVX1001 or ENVX1002 or BUSS1020 or DATA1001 or DATA1901 Assumed knowledge: HSC Mathematics Advanced or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

In a data-rich world, global citizens need to problem solve with data, and evidence-based decision-making is essential in every field of research and work. This unit equips you with the foundational statistical thinking to become a critical consumer of data. You will learn to think analytically about data and to evaluate the validity and accuracy of any conclusions drawn. Focusing on statistical literacy, the unit covers foundational statistical concepts, including the design of experiments, exploratory data analysis, sampling and tests of significance.

Textbooks

Statistics, (4th Edition), Freedman Pisani Purves (2007)

###### Computation

**MATH1115 Interrogating Data**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Intensive January,Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: MATH1005 or MATH1015 Prohibitions: STAT1021 or ENVX1001 or ENVX1002 or BUSS1020 or ECMT1010 or DATA1001 or DATA1901 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Block mode

In a data-rich world, global citizens need to problem solve with data, and evidence-based decision-making is essential in every field of research and work. This unit equips you with foundational statistical thinking to interrogate data. Focusing on statistical literacy, the unit covers foundational statistical concepts such as visualising data, the linear regression model, and testing significance using the t and chi-square tests. Based on a flipped learning approach, you will experience most of your learning in weekly collaborative 2-hour labs, supplemented by readings and lectures. Working in teams, you will explore three real data stories across different domains, with associated literature. The combination of MATH1005 and MATH1115 is equivalent to DATA1001, allowing you to pathway to the Data Science, Statistics, or Quantitative Life Sciences majors.

Textbooks

All learning materials will be on Canvas. In addition, the optional textbook is Statistics by Freedman, Pisani and Purves (2007).

###### Advanced Statistics

**MATH1905 Statistical Thinking with Data (Advanced)**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1005 or MATH1015 or STAT1021 or ECMT1010 or ENVX1001 or ENVX1002 or BUSS1020 or DATA1001 or DATA1901 Assumed knowledge: HSC Mathematics Extension 2 or 90 or above in HSC Mathematics Extension 1 or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit is designed to provide a thorough preparation for further study in mathematics and statistics. It is a core unit of study providing three of the twelve credit points required by the Faculty of Science as well as a foundations requirement in the Faculty of Engineering. This Advanced level unit of study parallels the normal unit MATH1005 but goes more deeply into the subject matter and requires more mathematical sophistication.

Textbooks

Statistics (4th Edition), Freedman, Pisani, and Purves (2007)

###### Mathematics

**MATH1021 Calculus Of One Variable**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Intensive January,Semester 1,Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1901 or MATH1906 or ENVX1001 or MATH1001 or MATH1921 or MATH1931 Assumed knowledge: HSC Mathematics Extension 1 or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates differential calculus and integral calculus of one variable and the diverse applications of this theory. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include complex numbers, functions of a single variable, limits and continuity, differentiation, optimisation, Taylor polynomials, Taylor's Theorem, Taylor series, Riemann sums, and Riemann integrals.

Students are strongly recommended to complete MATH1021 or MATH1921 Calculus Of One Variable (Advanced) before commencing MATH1023 Multivariable Calculus and Modelling or MATH1923 Multivariable Calculus and Modelling (Adv).

Textbooks

Calculus of One Variable (Course Notes for MATH1021)

**MATH1921 Calculus Of One Variable (Advanced)**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1001 or MATH1906 or ENVX1001 or MATH1901 or MATH1021 or MATH1931 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates differential calculus and integral calculus of one variable and the diverse applications of this theory. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include complex numbers, functions of a single variable, limits and continuity, differentiation, optimisation, Taylor polynomials, Taylor's Theorem, Taylor series, Riemann sums, and Riemann integrals. Additional theoretical topics included in this advanced unit include the Intermediate Value Theorem, Rolle's Theorem, and the Mean Value Theorem. Students are strongly recommended to complete MATH1021 Calculus Of One Variable or MATH1921 Calculus Of One Variable (Advanced) before commencing MATH1023 Multivariable Calculus and Modelling or MATH1923 Multivariable Calculus and Modelling (Adv).

**MATH1931 Calculus Of One Variable (SSP)**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1001 or MATH1901 or ENVX1001 or MATH1906 or MATH1021 or MATH1921 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Note: Enrolment is by invitation only

The Mathematics Special Studies Program is for students with exceptional mathematical aptitude, and requires outstanding performance in past mathematical studies. Students will cover the material of MATH1921 Calculus of One Variable (Adv), and attend a weekly seminar covering special topics not available elsewhere in the Mathematics and Statistics program.

**MATH1023 Multivariable Calculus and Modelling**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Intensive January,Semester 1,Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1013 or MATH1903 or MATH1907 or MATH1003 or MATH1923 or MATH1933 Assumed knowledge: Knowledge of complex numbers and methods of differential and integral calculus including integration by partial fractions and integration by parts as for example in MATH1021 or MATH1921 or MATH1931 or HSC Mathematics Extension 2 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates multivariable differential calculus and modelling. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include mathematical modelling, first order differential equations, second order differential equations, systems of linear equations, visualisation in 2 and 3 dimensions, partial derivatives, directional derivatives, the gradient vector, and optimisation for functions of more than one variable.

Students are strongly recommended to complete MATH1021 or MATH1921 Calculus Of One Variable (Advanced) before commencing MATH1023 Multivariable Calculus and Modelling or MATH1923 Multivariable Calculus and Modelling (Adv).

Textbooks

Multivariable Calculus and Modelling (Course Notes for MATH1023)

**MATH1923 Multivariable Calculus and Modelling (Adv)**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1003 or MATH1013 or MATH1907 or MATH1903 or MATH1023 or MATH1933 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Calculus is a discipline of mathematics that finds profound applications in science, engineering, and economics. This unit investigates multivariable differential calculus and modelling. Emphasis is given both to the theoretical and foundational aspects of the subject, as well as developing the valuable skill of applying the mathematical theory to solve practical problems. Topics covered in this unit of study include mathematical modelling, first order differential equations, second order differential equations, systems of linear equations, visualisation in 2 and 3 dimensions, partial derivatives, directional derivatives, the gradient vector, and optimisation for functions of more than one variable. Additional topics covered in this advanced unit of study include the use of diagonalisation of matrices to study systems of linear equations and optimisation problems, limits of functions of two or more variables, and the derivative of a function of two or more variables. Students are strongly recommended to complete MATH1021 or MATH1921 Calculus Of One Variable (Advanced) before commencing MATH1023 Multivariable Calculus and Modelling or MATH1923 Multivariable Calculus and Modelling (Adv).

**MATH1933 Multivariable Calculus and Modelling (SSP)**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1003 or MATH1903 or MATH1013 or MATH1907 or MATH1023 or MATH1923 Assumed knowledge: (HSC Mathematics Extension 2) OR (Band E4 in HSC Mathematics Extension 1) or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

Note: Enrolment is by invitation only.

The Mathematics Special Studies Program is for students with exceptional mathematical aptitude, and requires outstanding performance in past mathematical studies. Students will cover the material of MATH1923 Multivariable Calculus and Modelling (Adv), and attend a weekly seminar covering special topics not available elsewhere in the Mathematics and Statistics program.

**MATH1002 Linear Algebra**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Intensive January,Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1012 or MATH1014 or MATH1902 Assumed knowledge: HSC Mathematics or MATH1111. Students who have not completed HSC Mathematics (or equivalent) are strongly advised to take the Mathematics Bridging Course (offered in February) Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

MATH1002 is designed to provide a thorough preparation for further study in mathematics and statistics. It is a core unit of study providing three of the twelve credit points required by the Faculty of Science as well as a foundation requirement in the Faculty of Engineering.

This unit of study introduces vectors and vector algebra, linear algebra including solutions of linear systems, matrices, determinants, eigenvalues and eigenvectors.

Textbooks

Linear Algebra: A Modern Introduction, (4th edition), David Poole

**MATH1902 Linear Algebra (Advanced)**

Credit points: 3 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: MATH1002 or MATH1014 Assumed knowledge: (HSC Mathematics Extension 2) OR (90 or above in HSC Mathematics Extension 1) or equivalent Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit is designed to provide a thorough preparation for further study in mathematics and statistics. It is a core unit of study providing three of the twelve credit points required by the Faculty of Science as well as a foundations requirement in the Faculty of Engineering. It parallels the normal unit MATH1002 but goes more deeply into the subject matter and requires more mathematical sophistication.

#### 2000-level units of study

###### Core

**DATA2001 Data Science, Big Data and Data Variety**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1910 OR INFO1903 OR INFO1103 or ENGG1810 Prohibitions: DATA2901 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry.

Students will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image, and text. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will also introduce students to the concept of declarative data processing with SQL, and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and social sciences, and be taught basic explorative data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved with analysing large data volumes, such as the idea to partition and distribute data and computation among multiple computers for processing of 'Big Data'.
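To give a flavour of declarative data processing with SQL from Python, the sketch below loads a few rows into an in-memory SQLite database and asks which locations have the most recorded accidents. The table, columns, sample rows and function name are hypothetical and chosen only for illustration; they are not part of the unit's data sets.

```python
import sqlite3

# Hypothetical sample rows: (suburb, year) for recorded pedestrian accidents.
ACCIDENTS = [
    ("Suburb A", 2021),
    ("Suburb B", 2021),
    ("Suburb A", 2022),
    ("Suburb C", 2022),
    ("Suburb B", 2022),
    ("Suburb A", 2023),
]


def accident_hot_spots(rows, top_n=2):
    """Return the top_n suburbs by accident count using a declarative SQL query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE accident (suburb TEXT, year INTEGER)")
        conn.executemany("INSERT INTO accident VALUES (?, ?)", rows)
        # The query states *what* is wanted (counts per suburb, largest first);
        # the database engine decides *how* to compute it.
        query = """
            SELECT suburb, COUNT(*) AS n
            FROM accident
            GROUP BY suburb
            ORDER BY n DESC, suburb ASC
            LIMIT ?
        """
        return conn.execute(query, (top_n,)).fetchall()
    finally:
        conn.close()


print(accident_hot_spots(ACCIDENTS))  # [('Suburb A', 3), ('Suburb B', 2)]
```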

**DATA2901 Big Data and Data Diversity (Advanced)**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: 75% or above from (DATA1002 OR DATA1902 OR INFO1110 OR INFO1903 OR INFO1103 or ENGG1810) Prohibitions: DATA2001 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry. Students will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image, and text. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will also introduce students to the concept of declarative data processing with SQL, and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and social sciences, and be taught basic explorative data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved with analysing large data volumes, such as the idea to partition and distribute data and computation among multiple computers for processing of 'Big Data'. This unit is an alternative to DATA2001, providing coverage of some additional, more sophisticated topics, suited for students with high academic achievement.

**DATA2002 Data Analytics: Learning from Data**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: DATA1X01 or ENVX1002 or [MATH1X05 and MATH1XXX (excluding MATH1X05)] or BUSS1020 or ECMT1010 Prohibitions: STAT2012 or STAT2912 or DATA2902 Assumed knowledge: Successful completion of a first-year or second-year unit in statistics or data science including a substantial coding component. The content from STAT2X11 will help but is not considered essential. Students who are not comfortable using the R software for statistical analysis should familiarise themselves before attempting the unit, e.g. taking OLET1632: Shark Bites and Other Data Stories Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Technological advances in science, business and engineering have given rise to a proliferation of data from all aspects of our life. Understanding the information presented in these data is critical as it enables informed decision making in many areas including market intelligence and science. DATA2002 is an intermediate unit in statistics and data sciences, focusing on learning data analytic skills for a wide range of problems and data. In this unit, you will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects as well as reinforce your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skills to analyse various types of data in order to answer a scientific question. From this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.

**DATA2902 Data Analytics: Learning from Data (Adv)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: A mark of 65 or above in (DATA1X01 or ENVX1002 or [MATH1X05 and MATH1XXX (excluding MATH1X05)] or BUSS1020 or ECMT1010) Prohibitions: STAT2012 or STAT2912 or DATA2002 Assumed knowledge: Successful completion of a first-year or second-year unit in statistics or data science including a substantial coding component. The content from STAT2X11 will help but is not considered essential. Students who are not comfortable using the R software for statistical analysis should familiarise themselves before attempting the unit, e.g. taking OLET1632: Shark Bites and Other Data Stories Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Technological advances in science, business, and engineering have given rise to a proliferation of data from all aspects of our life. Understanding the information presented in these data is critical as it enables informed decision making in many areas including market intelligence and science. DATA2902 is an intermediate unit in statistics and data science, focusing on learning advanced data analytic skills for a wide range of problems and data. In this unit, you will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects as well as reinforce your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skills to analyse various types of data in order to answer a scientific question. From this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.

###### Selective

**COMP2123 Data Structures and Algorithms**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: INFO1110 OR INFO1910 OR INFO1113 OR DATA1002 OR DATA1902 OR INFO1103 OR INFO1903 Prohibitions: INFO1105 OR INFO1905 OR COMP2823 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will teach some powerful ideas that are central to solving algorithmic problems in ways that are more efficient than naive approaches. In particular, students will learn how data collections can support efficient access, for example, how a dictionary or map can allow key-based lookup that does not slow down linearly as the collection grows in size. The data structures covered in this unit include lists, stacks, queues, priority queues, search trees, hash tables, and graphs. Students will also learn efficient techniques for classic tasks such as sorting a collection. The concept of asymptotic notation will be introduced, and used to describe the costs of various data access operations and algorithms.

**COMP2823 Data Structures and Algorithms (Adv)**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: Distinction level results in (INFO1110 OR INFO1910 OR INFO1113 OR DATA1002 OR DATA1902 OR INFO1103 OR INFO1903) Prohibitions: INFO1105 OR INFO1905 OR COMP2123 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will teach some powerful ideas that are central to solving algorithmic problems in ways that are more efficient than naive approaches. In particular, students will learn how data collections can support efficient access, for example, how a dictionary or map can allow key-based lookup that does not slow down linearly as the collection grows in size. The data structures covered in this unit include lists, stacks, queues, priority queues, search trees, hash tables, and graphs. Students will also learn efficient techniques for classic tasks such as sorting a collection. The concept of asymptotic notation will be introduced, and used to describe the costs of various data access operations and algorithms.

**COSC2002 Computational Modelling**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: COSC1003 or COSC1903 or COSC2902 Assumed knowledge: HSC Mathematics; DATA1002, or equivalent programming experience, ideally in Python Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will introduce a wide range of modelling and simulation techniques for tackling real-world problems using a computer. Data is often expensive to obtain, so by harnessing the enormous computational processing power now available to us we can answer "what if" questions based on data we already have. You will learn how to break a problem down into its key components, identifying necessary assumptions for the purposes of simulation. You will learn how to develop suitable metrics within computational models, to allow comparison of simulation data with real-world data. You will learn how to iteratively improve simulations as you validate them against real results, and you will gain experience in identifying the types of exploratory questions that computational modelling opens up. Programming will be in Python. You will learn how to generate probabilistic data, solve systems of differential equations numerically, and tackle complex adaptive systems using agent-based models. Dynamical systems ranging from traffic flow to social segregation will be considered. By doing this unit you will develop the skills to go behind your data, understand why the data you observe might be as it is, and test scenarios which might otherwise be inaccessible.

**COSC2902 Computational Modelling (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: 48 credit points of 1000 level units with an average of 65 Prohibitions: COSC1003 or COSC1903 or COSC2002 Assumed knowledge: HSC Mathematics; DATA1002, or equivalent programming experience, ideally in Python Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: Department permission required for enrolment

This unit will introduce a wide range of modelling and simulation techniques for tackling real-world problems using a computer. Data is often expensive to obtain, so by harnessing the enormous computational processing power now available to us we can answer "what if" questions based on data we already have. You will learn how to break a problem down into its key components, identifying necessary assumptions for the purposes of simulation. You will learn how to develop suitable metrics within computational models, to allow comparison of simulation data with real-world data. You will learn how to iteratively improve simulations as you validate them against real results, and you will gain experience in identifying the types of exploratory questions that computational modelling opens up. Programming will be in Python. You will learn how to generate probabilistic data, solve systems of differential equations numerically, and tackle complex adaptive systems using agent-based models. Dynamical systems ranging from traffic flow to social segregation will be considered. By doing this unit you will develop the skills to go behind your data, understand why the data you observe might be as it is, and test scenarios which might otherwise be inaccessible. This is an advanced unit. It runs jointly with the associated mainstream unit; however, the lab work and assessment require a greater level of academic rigour. You will be required to engage in more challenging real-world computational modelling problems than the mainstream unit, and explore more deeply the reasons behind simulation results.

**GEGE2001 Genetics and Genomics**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: GENE2002 or MBLG2972 or GEGE2901 or MBLG2072 Assumed knowledge: Mendelian genetics; mechanisms of evolution; molecular and chromosomal bases of inheritance; and gene regulation and expression Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

The era of genomics has revolutionised our approach to biology. Recent breakthroughs in genetics and genomic technologies have led to improvements in human and animal health, in breeding and selection of economically important organisms and in the curation and care of wild species and complex ecosystems. In this unit, students will investigate/describe ways in which modern biology uses genetics and genomics to study life, from the unicellular through to complex multicellular organisms and their interactions in communities and ecosystems. This unit includes a solid foundation in classical Mendelian genetics and its extensions into quantitative and population genetics. It also examines how our ability to sequence whole genomes has changed our capacities and our understanding of biology. Links between DNA, phenotype and the performance of organisms and ecosystems will be highlighted. The unit will examine the profound insights that modern molecular techniques have enabled in the fields of developmental biology, gene regulation, population genetics and molecular evolution.

**GEGE2901 Genetics and Genomics (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: Annual average mark of at least 70 Prohibitions: GENE2002 or MBLG2072 or GEGE2001 or MBLG2972 Assumed knowledge: Mendelian genetics, mechanisms of evolution, molecular and chromosomal bases of inheritance, and gene regulation and expression Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

The era of genomics has revolutionised our approach to biology. Recent breakthroughs in genetics and genomic technologies have led to improvements in human and animal health, in breeding and selection of economically important organisms and in the curation and care of wild species and complex ecosystems. In this unit, students will investigate/describe ways in which modern biology uses genetics and genomics to study life, from the unicellular through to complex multicellular organisms and their interactions in communities and ecosystems. This unit includes a solid foundation in classical Mendelian genetics and its extensions into quantitative and population genetics. It also examines how our ability to sequence whole genomes has changed our capacities and our understanding of biology. Links between DNA, phenotype and the performance of organisms and ecosystems will be highlighted. The unit will examine the profound insights that modern molecular techniques have enabled in the fields of developmental biology, gene regulation, population genetics and molecular evolution. The Advanced mode of Genetics and Genomics will provide you with a challenge and a higher level of academic rigour. You will have the opportunity to plan a project that will develop your skills in contemporary genetics/molecular biology techniques and will provide you with a greater depth of disciplinary understanding. The Advanced mode will culminate in a written report and/or in an oral presentation where you will discuss a recent breakthrough that has been enabled by the use of modern genetics and genomics technologies. This is a unit for anyone wanting to better understand how genetics has shaped the earth and how it will shape the future.

**QBIO2001 Molecular Systems Biology**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Assumed knowledge: Basic concepts in metabolism; protein synthesis; gene regulation; quantitative and statistical skills Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Experimental approaches to the study of biological systems are shifting from hypothesis driven to hypothesis generating research. Large scale experiments at the molecular scale are producing enormous quantities of data ("Big Data") that need to be analysed to derive significant biological meaning. For example, monitoring the abundance of tens of thousands of proteins simultaneously promises ground-breaking discoveries. In this unit, you will develop specific analytical skills required to work with data obtained in the biological and medical sciences. The unit covers quantitative analysis of biological systems at the molecular scale including modelling and visualizing patterns using differential equations, experimental design and data types to understand disease aetiology. You will also use methods to model cellular systems including metabolism, gene regulation and signalling. The practical program will enable you to generate data analysis workflows, and gain a deep understanding of the statistical, informatics and modelling tools currently being used in the field. To leverage multiple types of expertise, the computer lab-based practical component of this unit will be predominantly a team-based collaborative learning environment. Upon completion of this unit, you will have gained skills to find meaningful solutions to difficult biological and disease-related problems with the potential to change our lives.

Textbooks

An Introduction to Systems Biology: Design Principles of Biological Circuits, Uri Alon (Chapman and Hall/CRC, 2007). Systems Biology, Edda Klipp, Wolfram Liebermeister, Christoph Wierling, Axel Kowald, Hans Lehrach, and Ralf Herwig (Wiley-Blackwell, 2009). Molecular Biology of the Cell, Alberts B et al (6th edition, Garland Science, 2015). Discovering Statistics Using R, Andy Field (SAGE Publications Ltd, 2012). Computational and Statistical Methods for Protein Quantitation by Mass Spectrometry, Martens L et al (Wiley, 2013).

**STAT2011 Probability and Estimation Theory**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: (MATH1X21 or MATH1931 or MATH1X01 or MATH1906 or MATH1011) and (DATA1X01 or MATH10X5 or MATH1905 or STAT1021 or ECMT1010 or BUSS1020) Prohibitions: STAT2911 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit provides an introduction to probability, the concept of random variables, special distributions (including the Binomial, Hypergeometric, Poisson, Normal, Geometric and Gamma) and statistical estimation. This unit will investigate univariate techniques in data analysis and the most common statistical distributions that are used to model patterns of variability. You will learn the method of moments and maximum likelihood techniques for fitting statistical distributions to data. The unit will have weekly computer classes where you will learn to use a statistical computing package to perform simulations and carry out computer intensive estimation techniques like the bootstrap method. By doing this unit you will develop your statistical modelling skills and it will prepare you to learn more complicated statistical models.

Textbooks

An Introduction to Mathematical Statistics and Its Applications (5th edition), Chapters 1-5, Larsen and Marx (2012)

**STAT2911 Probability and Statistical Models (Adv)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: (MATH1X21 or MATH1931 or MATH1X01 or MATH1906 or MATH1011) and a mark of 65 or greater in (DATA1X01 or MATH10X5 or MATH1905 or STAT1021 or ECMT1010 or BUSS1020) Prohibitions: STAT2011 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit is essentially an advanced version of STAT2011, with an emphasis on the mathematical techniques used to manipulate random variables and probability models. Common distributions including the Poisson, normal, beta and gamma families as well as the bivariate normal are introduced. Moment generating functions and convolution methods are used to understand the behaviour of sums of random variables. The method of moments and maximum likelihood techniques for fitting statistical distributions to data will be explored. The notions of conditional expectation and prediction will be covered as will be distributions related to the normal: chi^2, t and F. The unit has weekly computer classes where you will learn to use a statistical computing package to perform simulations and carry out computer intensive estimation techniques like the bootstrap method.

Textbooks

Mathematical Statistics and Data Analysis (3rd edition), J A Rice

#### 3000-level units of study

###### Core interdisciplinary project

**DATA3888 Data Science Capstone**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: DATA2001 or DATA2901 or DATA2002 or DATA2902 or STAT2912 or STAT2012 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

In our ever-changing world, we are facing a new data-driven era where the capability to efficiently combine and analyse large data collections is essential for informed decision making in business and government, and for scientific research. Data science is an emerging interdisciplinary field with its focus on high performance computation and quantitative expression of the confidence in conclusions, and the clear communication of those conclusions in different disciplinary contexts. This unit is our capstone project that presents the opportunity to create a public data product that can illustrate the concepts and skills you have learnt in this discipline. In this unit, you will have an opportunity to explore deeper disciplinary knowledge, while also meeting and collaborating through project-based learning. The capstone project in this unit will allow you to identify and place the data-driven problem into an analytical framework, solve the problem through computational means, interpret the results and communicate your findings to a diverse audience. All such skills are highly valued by employers. This unit will foster the ability to work in an interdisciplinary team and to translate problems between two or more disciplines, which is essential for both professional and research pathways in the future.

###### Methodology

**COMP3027 Algorithm Design**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: COMP2123 OR COMP2823 OR INFO1105 OR INFO1905 Prohibitions: COMP2007 OR COMP2907 OR COMP3927 Assumed knowledge: MATH1004 OR MATH1904 OR MATH1064 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit provides an introduction to the design techniques that are used to find efficient algorithmic solutions for given problems. The techniques covered include greedy, divide-and-conquer, dynamic programming, and adjusting flows in networks. Students will extend their skills in algorithm analysis. The unit also provides an introduction to the concepts of computational complexity and reductions between problems.

**COMP3927 Algorithm Design (Adv)**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: Distinction level results in (COMP2123 OR COMP2823 OR INFO1105 OR INFO1905) Prohibitions: COMP2007 OR COMP2907 OR COMP3027 Assumed knowledge: MATH1004 OR MATH1904 OR MATH1064 Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit provides an introduction to the design techniques that are used to find efficient algorithmic solutions for given problems. The techniques covered include greedy, divide-and-conquer, dynamic programming, and adjusting flows in networks. Students will extend their skills in algorithm analysis. The unit also provides an introduction to the concepts of computational complexity and reductions between problems.

**COMP3308 Introduction to Artificial Intelligence**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prohibitions: COMP3608 Assumed knowledge: Algorithms. Programming skills (e.g. Java, Python, C, C++, Matlab) Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

Artificial Intelligence (AI) is all about programming computers to perform tasks normally associated with intelligent behaviour. Classical AI programs have played games, proved theorems, discovered patterns in data, planned complex assembly sequences and so on. This unit of study will introduce representations, techniques and architectures used to build intelligent systems. It will explore selected topics such as heuristic search, game playing, machine learning, neural networks and probabilistic reasoning. Students who complete it will have an understanding of some of the fundamental methods and algorithms of AI, and an appreciation of how they can be applied to interesting problems. The unit will involve a practical component in which some simple problems are solved using AI techniques.

**COMP3608 Introduction to Artificial Intelligence (Adv)**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: Distinction-level results in at least one 2000 level COMP or MATH or SOFT unit Prohibitions: COMP3308 Assumed knowledge: Algorithms. Programming skills (e.g. Java, Python, C, C++, Matlab) Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

Note: COMP3308 and COMP3608 share the same lectures, but have different tutorials and assessment (the same type but more challenging).

An advanced alternative to COMP3308; covers material at an advanced and challenging level.

**DATA3404 Scalable Data Management**

Credit points: 6 Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: DATA2001 OR DATA2901 OR ISYS2120 OR INFO2120 OR INFO2820 Prohibitions: INFO3504 OR INFO3404 Assumed knowledge: This unit of study assumes that students have previous knowledge of database structures and of SQL. The prerequisite material is covered in DATA2001 or ISYS2120. Familiarity with a programming language (e.g. Java or C) is also expected Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit of study provides a comprehensive overview of the internal mechanisms of data science platforms and of the systems that manage large data collections. These skills are needed for successful performance tuning and to understand the scalability challenges faced when processing Big Data. This unit builds upon the second-year unit DATA2001 'Data Science: Big Data and Data Diversity' and correspondingly assumes a sound understanding of SQL and data analysis tasks.

The first part of this subject focuses on mechanisms for large-scale data management. It provides a deep understanding of the internal components of a data management platform. Topics include: physical data organization and disk-based index structures, query processing and optimisation, and database tuning.

The second part focuses on the large-scale management of big data in a distributed architecture. Topics include: distributed and replicated databases, information retrieval, data stream processing, and web-scale data processing.

The unit will be of interest to students seeking an introduction to data management tuning, disk-based data structures and algorithms, and information retrieval. It will be valuable to those pursuing such careers as Software Engineers, Data Engineers, Database Administrators, and Big Data Platform specialists.

**DATA3406 Human-in-the-Loop Data Analytics**

Credit points: 6 Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: (DATA2001 OR DATA2901) AND (DATA2002 OR DATA2902) Assumed knowledge: Basic statistics, database management, and programming Assessment: Refer to the assessment table in the unit outline. Mode of delivery: Normal (lecture/lab/tutorial) day

This unit focuses on methods and techniques to take into consideration the human elements in data science. Humans can act as both sources of data and its interpreters, introducing a range of complexities with regards to analysis. How do we account for the unreliability in data collected from humans? What can be done to address the subjects' concerns about their data? How can we create visualisations that facilitate understanding of the main findings? What are the limitations of any predictions? The ability to consider human factors is essential in any loop that involves people gathering, storing, or interpreting data for decision making.

On completion of this unit, students will be able to identify and analyse the human factors in the data analytics loop, and will be able to derive solutions for the challenges that arise.

**STAT3021 Stochastic Processes**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 Prohibitions: STAT3911 or STAT3011 or STAT3921 or STAT4021 Assumed knowledge: Students are expected to have a thorough knowledge of basic probability and integral calculus Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

A stochastic process is a mathematical model of time-dependent random phenomena and is employed in numerous fields of application, including economics, finance, insurance, physics, biology, chemistry and computer science. This unit will establish basic properties of discrete-time Markov chains including random walks and branching processes. This unit will derive key results of Poisson processes and simple continuous-time Markov chains. This unit will investigate simple queuing theory. This unit will also introduce basic concepts of Brownian motion and martingales. Throughout the unit, various illustrative examples are provided in modelling and analysing problems of practical interest. By completing this unit, you will develop an essential basis for further studies in stochastic analysis, stochastic differential equations, stochastic control, financial mathematics and statistical inference.

**STAT3921 Stochastic Processes (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 Prohibitions: STAT3011 or STAT3911 or STAT3021 or STAT3003 or STAT3903 or STAT3005 or STAT3905 or STAT4021 Assumed knowledge: Students are expected to have a thorough knowledge of basic probability and integral calculus and to have achieved at credit level or above Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

A stochastic process is a mathematical model of time-dependent random phenomena and is employed in numerous fields of application, including economics, finance, insurance, physics, biology, chemistry and computer science. This unit will establish basic properties of discrete-time Markov chains including random walks and branching processes. This unit will derive key results of Poisson processes and simple continuous-time Markov chains. This unit will investigate simple queuing theory. This unit will also introduce basic concepts of Brownian motion and martingales. Throughout the unit, various illustrative examples are provided in modelling and analysing problems of practical interest. By completing this unit, you will develop a solid mathematical foundation of stochastic processes for further studies in advanced areas such as stochastic analysis, stochastic differential equations, stochastic control, financial mathematics and statistical inference. Students who undertake STAT3921/4021 will be expected to have a deeper, more sophisticated understanding of the theory and to be able to work with more complicated applications than students who complete the regular STAT3021 unit.

**STAT3022 Applied Linear Models**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 and (DATA2X02 or STAT2X12) Prohibitions: STAT3912 or STAT3012 or STAT3922 or STAT4022 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Linear models are core to a wide range of real-world data analyses, for example in agriculture, health, sport and business. This unit provides an in-depth exploration of various linear models outlining when they can be applied, and how to assess if they are appropriate. The unit will introduce the fundamental concepts of analysis of data from both observational studies and experimental designs using classical linear methods, together with concepts of collection of data and design of experiments. You will consider linear models and robust regression methods with diagnostics for checking appropriateness of models and strategies for performing feature selection. You will learn to design and analyse experiments considering notions of replication, randomisation and ideas of factorial designs. You will apply, construct and interpret multi-way ANOVA models and make inferences, including post-hoc tests and making corrections for multiple comparisons. Throughout the unit you will use the R statistical package to perform analyses and generate statistical graphics. By completing this unit you will learn how to generate, interpret, visualise and critique linear models.

**STAT3922 Applied Linear Models (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 and [a mark of 65 or greater in (STAT2X12 or DATA2X02)] Prohibitions: STAT3912 or STAT3012 or STAT3022 or STAT4022 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will introduce the fundamental concepts of analysis of data from both observational studies and experimental designs using classical linear methods, together with concepts of collection of data and design of experiments. You will first consider linear models and regression methods with diagnostics for checking appropriateness of models, looking briefly at robust regression methods. Then you will consider the design and analysis of experiments considering notions of replication, randomisation and ideas of factorial designs. Throughout the unit you will use the R statistical package to perform analyses and produce graphical displays. This unit is essentially an Advanced version of STAT3012, with additional emphasis on the mathematical techniques underlying applied linear models together with proofs of distribution theory based on vector space methods.

**STAT3023 Statistical Inference**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 Prohibitions: STAT3913 or STAT3013 or STAT3923 Assumed knowledge: DATA2X02 or STAT2X12 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

In today's data-rich world more and more people from diverse fields are needing to perform statistical analyses and indeed more and more tools for doing so are becoming available; it is relatively easy to point and click and obtain some statistical analysis of your data. But how do you know if any particular analysis is indeed appropriate? Is there another procedure or workflow which would be more suitable? Is there such a thing as the best possible approach in a given situation? All of these questions (and more) are addressed in this unit. You will study the foundational core of modern statistical inference, including classical and cutting-edge theory and methods of mathematical statistics with a particular focus on various notions of optimality. The first part of the unit covers various aspects of distribution theory which are necessary for the second part which deals with optimal procedures in estimation and testing. The framework of statistical decision theory is used to unify many of the concepts. You will apply the methods learnt to real-world problems in laboratory sessions. By completing this unit you will develop the necessary skills to confidently choose the best statistical analysis to use in many situations.

**STAT3923 Statistical Inference (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 and a mark of 65 or greater in (DATA2X02 or STAT2X12) Prohibitions: STAT3913 or STAT3013 or STAT3023 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

In today's data-rich world more and more people from diverse fields are needing to perform statistical analyses and indeed more and more tools for doing so are becoming available; it is relatively easy to point and click and obtain some statistical analysis of your data. But how do you know if any particular analysis is indeed appropriate? Is there another procedure or workflow which would be more suitable? Is there such a thing as a best possible approach in a given situation? All of these questions (and more) are addressed in this unit. You will study the foundational core of modern statistical inference, including classical and cutting-edge theory and methods of mathematical statistics with a particular focus on various notions of optimality. The first part of the unit covers various aspects of distribution theory which are necessary for the second part which deals with optimal procedures in estimation and testing. The framework of statistical decision theory is used to unify many of the concepts. You will rigorously prove key results and apply these to real-world problems in laboratory sessions. By completing this unit you will develop the necessary skills to confidently choose the best statistical analysis to use in many situations.

**STAT3925 Time Series (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 and (MATH1X03 or MATH1907 or MATH1X23 or MATH1933) Prohibitions: STAT4025 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit will study basic concepts and methods of time series analysis applicable to real-world problems in numerous fields, including economics, finance, insurance, physics, ecology, chemistry, computer science and engineering. This unit will investigate the basic methods of modelling and analysing time series data (i.e. data containing a serial dependence structure). This can be achieved through learning standard time series procedures on identification of components, autocorrelations, partial autocorrelations and their sampling properties. After setting up these basics, students will learn the theory of stationary univariate time series models including ARMA, ARIMA and SARIMA and their properties. Then the identification, estimation, diagnostic model checking, decision making and forecasting methods based on these models will be developed with applications. The spectral theory of time series, estimation of spectra using the periodogram and consistent estimation of spectra using lag-windows will be studied in detail. Further, the methods of analysing long memory time series and heteroscedastic time series models including ARCH, GARCH, ACD, SCD and SV models from financial econometrics and the analysis of vector ARIMA models will be developed with applications. By completing this unit, students will develop the essential basis for further studies, such as financial econometrics and financial time series. The skills gained through this unit of study will form a strong foundation to work in the financial industry or in a related research organisation.

**STAT3926 Statistical Consulting (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: At least 12cp from STAT2X11 or STAT2X12 or DATA2X02 or STAT3XXX Prohibitions: STAT4026 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

In our ever-changing world, we are facing a new data-driven era where the capability to efficiently combine and analyse large data collections is essential for informed decision making in business and government, and for scientific research. Statistical and data analytics consulting provides an important framework for many individuals to seek assistance with statistics and data-driven problems. This unit of study will provide students with an opportunity to gain real-life experience in statistical consulting or work with collaborative (interdisciplinary) research. In this unit, you will have an opportunity to gain practical experience in a consultation setting with real clients. You will also apply your statistical knowledge in a diverse collection of consulting projects while learning project and time management skills. In this unit you will need to identify and place the client's problem into an analytical framework, provide a solution within a given time frame and communicate your findings back to the client. All such skills are highly valued by employers. This unit will foster the expertise needed to work in a statistical consulting firm or data analytics team, which will be essential for data-driven professional and research pathways in the future.

###### Application

**ENVX3001 Environmental GIS**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: 6cp from (ENVI1003 or AGEN1002) or 6cp from GEOS1XXX or 6cp from BIOL1XXX or GEOS2X11 Prohibitions: GEOS3X14 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

The critical role of geospatial science in major disturbance events, such as bushfires, coastal erosion and mapping spatio-temporal trends in NSW COVID-19 cases, has placed GIScience at the forefront of policy agendas, information sharing and community engagement. The disruptive nature of this field is clear and demand for expertise in GIS, Earth observation, spatial data analytics and location intelligence has grown. However, this expertise needs to be backed by an understanding of the science, conceptual principles and ethics that underpin these enabling technologies. We will incorporate the transformative potential of GIScience-driven technologies demonstrated by disturbance events. The unit content will expose you to a breadth of analytical capabilities within GIS, various applications to complex environmental and coastal issues and ethical considerations in using and disseminating geographical information and knowledge. The fundamentals of GIS, spatial modelling and Earth observation will be introduced in the context of environmental and coastal management. You will build on these foundational concepts through problem-based learning in which GIS methods will be applied to address issues relating to fire and biodiversity, acid sulphate soils, coastal processes and water security. This unit is co-taught with GEOS3014/3914. GIScience, spatial reasoning and Earth observation in the context of environmental and coastal science and management are core to the learning objectives of both units.

Textbooks

Burrough, P.A. and McDonnell, R.A. 1998. Principles of Geographic Information Systems. Oxford University Press: Oxford.

**ENVX3002 Statistics in the Natural Sciences**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: ENVX2001 or STAT2X12 or BIOL2X22 or DATA2X02 or QBIO2001 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This unit of study is designed to introduce students to the analysis of data they may face in their future careers, in particular data that are not well behaved. The data may be non-normal, there may be missing observations, they may be correlated in space and time or too numerous to analyse with standard models. The unit is presented in an applied context with an emphasis on correctly analysing authentic datasets, and interpreting the output. It begins with the analysis and design of experiments based on the general linear model. In the second part, students will learn about the generalisation of the general linear model to accommodate non-normal data with a particular emphasis on the binomial and Poisson distributions. In the third part linear mixed models will be introduced which provide the means to analyse datasets that do not meet the assumptions of independent and equal errors, for example data that are correlated in space and time. The unit ends with an introduction to machine learning and predictive modelling. A key feature of the unit is using R to develop coding skills that have become essential in science for processing and analysing datasets of ever increasing size.

**AMED3002 Interrogating Biomedical and Health Data**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 1 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Assumed knowledge: Exploratory data analysis, sampling, simple linear regression, t-tests, confidence intervals and chi-squared goodness of fit tests, familiarity with basic coding, basic linear algebra Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Biotechnological advances have given rise to an explosion of original and shared public data relevant to human health. These data, including the monitoring of expression levels for thousands of genes and proteins simultaneously, together with multiple databases on biological systems, now promise exciting, ground-breaking discoveries in complex diseases. Critical to these discoveries will be our ability to unravel and extract information from these data. In this unit, you will develop analytical skills required to work with data obtained in the medical and diagnostic sciences. You will explore clinical data using powerful, state-of-the-art methods and tools. Using real data sets, you will be guided in the application of modern data science techniques to interrogate, analyse and represent the data, both graphically and numerically. By analysing your own real data, as well as that from large public resources, you will learn and apply the methods needed to find information on the relationship between genes and disease. Leveraging expertise from multiple sources by working in team-based collaborative learning environments, you will develop knowledge and skills that will enable you to play an active role in finding meaningful solutions to difficult problems, creating an important impact on our lives.

**GEGE3004 Applied Genomics**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: 6cp of (GEGE2X01 or QBIO2XXX or DATA2X01 or GENE2XXX or MBLG2X72 or ENVX2001 or DATA2X02) Prohibitions: ANSC3107 Assumed knowledge: Genetics at 2000 level, Biology at 1000 level, algebra Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Note: This unit must be taken by all students in the Genetics and Genomics major.

The average mammalian genome is 3 billion nucleotides long and some other organisms have genomes that are even larger. Working with DNA at the nucleotide level on an organismal scale is impossible without the assistance of high performance computing. This unit will investigate strategies to manipulate genomic data on a whole organism scale. You will learn how scientists use high performance computing and web-based resources to compare and assemble genomes, map genes that cause specific phenotypes, and uncover mutations that cause phenotypic changes in organisms that influence health, external characteristics, production and disease. By doing this unit you will develop skills in the analysis of big data, gain familiarity with high performance computing worktop environments and learn to use bioinformatics tools that are commonly applied in research.

**BCMB3004 Beyond The Genome**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: 12 credit points from (AMED3001 or BCHM2X71 or BCHM2X72 or BCHM3XXX or BCMB2X01 or BCMB2X02 or BCMB3XXX or BIOL2X29 or BMED2401 or BMED2405 or GEGE2X01 or MBLG2X01 or MEDS2002 or MEDS2003 or PCOL2X21 or QBIO2001) Prohibitions: BCHM3X92 or BCMB3904 Assumed knowledge: Biochemistry, genetics, cell and/or molecular biology concepts at 2000-level units Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

The sequencing of the human genome was a landmark achievement in science and medicine, marking the 'Age of Genomics'. Now we can access the blueprints for life, but need to uncover how those blueprints work, allowing organisms to respond to internal and external environmental changes, and how we can utilise this plethora of DNA sequence information to improve human and planetary health. This unit will investigate the function of the genome by examining the proteome, metabolome and beyond. You will investigate links between the central dogma of molecular biology and the complexities of living genomes - from modifications that massively increase diversity to the dynamic metabolome. You will explore fundamental cellular processes and discover how they are shaped by the proteome via gene expression, post-translational modification and protein complex formation. These processes will be examined in the context of human health and cardiovascular and metabolic disorders (e.g. type 2 diabetes) to demonstrate how global approaches can define, diagnose and help develop treatments for disease. You will practise methods employed in the post-genome era, including the 'Multi-omics' approaches that provide a global view of living systems, and discover how they are applied to solve problems in biology, biomedicine and agriculture. By the end of the unit students will understand why global 'omics approaches are needed in the post-genome era and know how best to apply such tools to given biological and biomedical problems.

**BCMB3904 Beyond The Genome (Advanced)**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: An average mark of 75 or above in 12 credit points from (AMED3001 or BCHM2X71 or BCHM2X72 or BCHM3XXX or BCMB2X01 or BCMB2X02 or BCMB3XXX or BIOL2X29 or BMED2401 or BMED2405 or GEGE2X01 or MBLG2X01 or MEDS2002 or MEDS2003 or PCOL2X21 or QBIO2001) Prohibitions: BCHM3X92 or BCMB3004 Assumed knowledge: Biochemistry, genetics, cell and/or molecular biology concepts at 2000-level units Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

The sequencing of the human genome was a landmark achievement in science and medicine, marking the 'Age of Genomics'. Now we can access the blueprints for life, but need to uncover how those blueprints work, allowing organisms to respond to internal and external environmental changes, and how we can utilise this plethora of DNA sequence information to improve human and planetary health. This unit will investigate the function of the genome by examining the proteome, metabolome and beyond. You will investigate links between the central dogma of molecular biology and the complexities of living genomes - from modifications that massively increase diversity to the dynamic metabolome. You will explore fundamental cellular processes and discover how they are shaped by the proteome via gene expression, post-translational modification and protein complex formation. These processes will be examined in the context of human health and cardiovascular and metabolic disorders (e.g. type 2 diabetes) to demonstrate how global approaches can define, diagnose and help develop treatments for disease. You will practise methods employed in the post-genome era, including the 'Multi-omics' approaches that provide a global view of living systems, and discover how they are applied to solve problems in biology, biomedicine and agriculture. Beyond the Genome (Advanced) has the same overall structure as BCMB3004 but focuses on a more advanced level of practical work, data analysis and interpretation, using cutting-edge technologies. By the end of the unit students will understand why global 'omics approaches are needed in the post-genome era and know how best to apply such tools to given biological and biomedical problems.

###### Selective Interdisciplinary Project

**SCPU3001 Science Interdisciplinary Project**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Intensive February, Intensive July, Semester 1, Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: 96 credit points Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

This interdisciplinary unit provides students with the opportunity to address complex problems identified by industry, community, and government organisations, and gain valuable experience in working across disciplinary boundaries. In collaboration with a major industry partner and an academic lead, students integrate their academic skills and knowledge by working in teams with students from a range of disciplinary backgrounds. This experience allows students to research, analyse and present solutions to a real-world problem, and to build on their interpersonal and transferable skills by engaging with and learning from industry experts and presenting their ideas and solutions to the industry partner.

**STAT3888 Statistical Machine Learning**

Credit points: 6 Teacher/Coordinator: Refer to the unit of study outline https://www.sydney.edu.au/units Session: Semester 2 Classes: Refer to the unit of study outline https://www.sydney.edu.au/units Prerequisites: STAT2X11 and (DATA2X02 or STAT2X12) Prohibitions: STAT3914 or STAT3014 Assumed knowledge: STAT3012 or STAT3912 or STAT3022 or STAT3922 Assessment: Refer to the unit of study outline https://www.sydney.edu.au/units Mode of delivery: Normal (lecture/lab/tutorial) day

Data Science is an emerging and inherently interdisciplinary field. A key set of skills in this area falls under the umbrella of Statistical Machine Learning methods. This unit presents the opportunity to bring together the concepts and skills you have learnt from a Statistics or Data Science major, and apply them to a joint project with NUTM3888 Metabolic Cybernetics where Statistics and Data Science students will form teams with Nutrition students to solve a real-world problem using Statistical Machine Learning methods. The unit will cover a wide breadth of cutting-edge supervised and unsupervised learning methods, including principal component analysis, multivariate tests, discriminant analysis, Gaussian graphical models, log-linear models, classification trees, k-nearest neighbours, k-means clustering, hierarchical clustering, and logistic regression. In this unit, you will continue to understand and explore disciplinary knowledge, while also meeting and collaborating through project-based learning; identifying and solving problems, analysing data and communicating your findings to a diverse audience. All such skills are highly valued by employers. This unit will foster the ability to work in an interdisciplinary team, which is essential for both professional and research pathways in the future.