University of Sydney Handbooks - 2020 Archive

Page archived at: Tue, 27 Oct 2020

Data Science

DATA – Data Science unit of study descriptions

DATA1001 Foundations of Data Science

Credit points: 6 Teacher/Coordinator: Prof Qiying Wang Session: Semester 1, Semester 2 Classes: 3x1-hr lectures; 1x2-hr lab/wk Prohibitions: DATA1901 or MATH1005 or MATH1905 or MATH1015 or MATH1115 or ENVX1001 or ENVX1002 or ECMT1010 or BUSS1020 or STAT1021 Assessment: R quizzes (10%); 3 x projects (30%); final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Science
DATA1001 is a foundational unit in the Data Science major. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making; it is essential in every industry and underpins all research that relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem-solving skills in a team setting. Taught interactively with embedded technology, DATA1001 develops critical thinking and the skills to solve problems with data. It is a prerequisite for DATA2002.
Textbooks
Statistics (4th Edition), Freedman, Pisani and Purves (2007)
DATA1002 Informatics: Data and Computation

Credit points: 6 Teacher/Coordinator: Prof Alan Fekete Session: Semester 2 Classes: Lectures, Laboratories, Project Work - own time Prohibitions: INFO1903 OR DATA1902 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering
This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and will be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarisation, and the creation and application of a predictive model.
DATA1902 Informatics: Data and Computation (Advanced)

Credit points: 6 Teacher/Coordinator: Prof Alan Fekete Session: Semester 2 Classes: lectures, laboratories Prohibitions: INFO1903 OR DATA1002 Assumed knowledge: This unit is intended for students with an ATAR at least sufficient for entry to the BSc/BAdvStudies (Advanced) stream, or for those who have gained a Distinction or better in a unit in Data Science, Mathematics, or Computer Science. Students with a portfolio of high-quality relevant prior work can also be admitted. Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering
Note: Department permission required for enrolment
This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and will be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarisation, and the creation and application of a predictive model. This unit includes the content of DATA1002, along with additional, more sophisticated topics suited to students with high academic achievement.
DATA2001 Data Science: Big Data and Data Diversity

Credit points: 6 Teacher/Coordinator: A/Prof Uwe Roehm Session: Semester 1 Classes: Lectures, Laboratories, Project Work - own time Prerequisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1910 OR INFO1903 OR INFO1103 Prohibitions: DATA2901 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering
This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry.
Students will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image and text data. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will introduce students to the concept of declarative data processing with SQL and to the analysis of data in relational databases. Students will be given data sets from, e.g., social media, transport, health and the social sciences, and will be taught basic exploratory data analysis and mining techniques in the context of small use cases. The course will also give students an understanding of the challenges involved in analysing large data volumes, such as the need to partition and distribute data and computation among multiple computers when processing 'Big Data'.
DATA2002 Data Analytics: Learning from Data

Credit points: 6 Teacher/Coordinator: A/Prof Jennifer Chan Session: Semester 2 Classes: Lecture 3 hrs/week + workshop 2 hrs/week Prerequisites: [DATA1001 or ENVX1001 or ENVX1002] or [MATH10X5 and MATH1115] or [MATH10X5 and STAT2X11] or [MATH1905 and MATH1XXX (except MATH1XX5)] or [BUSS1020 or ECMT1010 or STAT1021] Prohibitions: STAT2012 or STAT2912 or DATA2902 Assumed knowledge: Basic linear algebra and some coding, for example MATH1014 or MATH1002 or MATH1902, and DATA1001 or DATA1901 Assessment: Model reports (15%), online quizzes (15%), group work assignment and presentation (20%) and final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Science
Technological advances in science, business and engineering have given rise to a proliferation of data from all aspects of our lives. Understanding the information presented in these data is critical, as it enables informed decision making in many areas, including market intelligence and science. DATA2002 is an intermediate unit in statistics and data science, focusing on learning data analytic skills for a wide range of problems and data. How should the Australian government measure and report employment and unemployment? Can we tell the difference between decaffeinated and regular coffee? In this unit, you will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, as well as reinforcing your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skills to analyse various types of data in order to answer a scientific question. Through this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.
DATA2901 Big Data and Data Diversity (Advanced)

Credit points: 6 Teacher/Coordinator: A/Prof Uwe Roehm Session: Semester 1 Classes: lectures, laboratories Prerequisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1903 OR INFO1103. Students need Distinction or better in one of the prerequisite units. Prohibitions: DATA2001 Assessment: through semester assessment (60%), final exam (40%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering
This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry. Students will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image and text data. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will introduce students to the concept of declarative data processing with SQL and to the analysis of data in relational databases. Students will be given data sets from, e.g., social media, transport, health and the social sciences, and will be taught basic exploratory data analysis and mining techniques in the context of small use cases. The course will also give students an understanding of the challenges involved in analysing large data volumes, such as the need to partition and distribute data and computation among multiple computers when processing 'Big Data'. This unit is an alternative to DATA2001, providing coverage of some additional, more sophisticated topics suited to students with high academic achievement.
DATA3404 Data Science Platforms

Credit points: 6 Teacher/Coordinator: A/Prof Uwe Roehm Session: Semester 1 Classes: lectures, tutorials Prerequisites: DATA2001 OR DATA2901 OR ISYS2120 OR INFO2120 OR INFO2820 Prohibitions: INFO3504 OR INFO3404 Assumed knowledge: This unit of study assumes that students have previous knowledge of database structures and of SQL. The prerequisite material is covered in DATA2001 or ISYS2120. Familiarity with a programming language (e.g. Java or C) is also expected. Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering
This unit of study provides a comprehensive overview of the internal mechanisms of data science platforms and of the systems that manage large data collections. These skills are needed for successful performance tuning and to understand the scalability challenges faced when processing Big Data. This unit builds upon the second-year unit DATA2001 'Data Science: Big Data and Data Diversity' and correspondingly assumes a sound understanding of SQL and data analysis tasks.
The first part of this subject focuses on mechanisms for large-scale data management. It provides a deep understanding of the internal components of a data management platform. Topics include: physical data organization and disk-based index structures, query processing and optimisation, and database tuning.
The second part focuses on the large-scale management of big data in a distributed architecture. Topics include: distributed and replicated databases, information retrieval, data stream processing, and web-scale data processing.
The unit will be of interest to students seeking an introduction to data management tuning, disk-based data structures and algorithms, and information retrieval. It will be valuable to those pursuing careers as software engineers, data engineers, database administrators and Big Data platform specialists.
DATA3406 Human-in-the-Loop Data Analytics

Credit points: 6 Teacher/Coordinator: Prof Judith Kay Session: Semester 2 Classes: lectures, laboratories, project work Prerequisites: DATA2001 and DATA2002 Assumed knowledge: Basic statistics, database management, and programming. Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering
This unit focuses on methods and techniques for taking into consideration the human elements in data science. Humans can act as both sources of data and its interpreters, introducing a range of complexities with regard to analysis. How do we account for the unreliability in data collected from humans? What can be done to address the subjects' concerns about their data? How can we create visualisations that facilitate understanding of the main findings? What are the limitations of any predictions? The ability to consider human factors is essential in any loop that involves people gathering, storing, or interpreting data for decision making.
On completion of this unit, students will be able to identify and analyse the human factors in the data analytics loop, and will be able to derive solutions for the challenges that arise.
DATA5207 Data Analysis in the Social Sciences

Credit points: 6 Teacher/Coordinator: Shaun Ratcliff Session: Intensive December, Semester 1 Classes: lectures, laboratories Assumed knowledge: COMP5310 Assessment: through semester assessment (100%) Mode of delivery: Normal (lecture/lab/tutorial) day, Normal (lecture/lab/tutorial) evening Faculty: Engineering
Note: Department permission required for enrolment in the following sessions: Intensive December
Data science is a new, rapidly expanding field. There is an unprecedented demand from technology companies, financial services, government and not-for-profits for graduates who can effectively analyse data. This subject will help students gain a critical understanding of the strengths and weaknesses of quantitative research, and acquire practical skills using different methods and tools to answer relevant social science questions.
This subject offers a nuanced combination of data science methodology and real-world applications, bringing an awareness of how to solve actual social problems to the Master of Data Science. We cover topics including elections, criminology, economics and the media. You will clean, process, model and make meaningful visualisations using data from these fields, and test hypotheses to draw inferences about the social world.
Techniques covered range from descriptive statistics, linear and logistic regression, the analysis of data from randomised experiments and model selection for prediction and classification tasks, to the analysis of unstructured text as data and multilevel and geospatial modelling, all using the open-source program R. In doing this, we will not only build on the skills you have already mastered in this degree but also explore different ways to use them once you graduate.