University of Sydney Handbooks - 2018 Archive

Page archived at: Fri, 21 Sep 2018 05:39:46 +0000

Data Science unit of study descriptions

DATA1001 Foundations of Data Science

Credit points: 6 Teacher/Coordinator: Dr Di Warren Session: Semester 1, Semester 2 Classes: lecture 3 hrs/week; computer tutorial 2 hrs/week Prohibitions: MATH1005 or MATH1905 or MATH1015 or MATH1115 or ENVX1001 or ENVX1002 or ECMT1010 or BUSS1020 or STAT1021 Assessment: assignments, quizzes, presentation, exam Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Science
DATA1001 is a foundational unit in the Data Science major. The unit focuses on developing critical and statistical thinking skills for all students. Does mobile phone usage increase the incidence of brain tumours? What is the public's attitude to shark baiting following a fatal attack? Statistics is the science of decision making; it is essential in every industry and undergirds all research that relies on data. Students will use problems and data from the physical, health, life and social sciences to develop adaptive problem-solving skills in a team setting. Taught interactively with embedded technology, DATA1001 develops critical thinking and the skills to solve problems with data. It is a prerequisite for DATA2002.
Textbooks
Statistics, Fourth Edition, by Freedman, Pisani and Purves
DATA1002 Informatics: Data and Computation

Credit points: 6 Session: Semester 2 Classes: Lectures, Laboratories, Project Work - own time Prohibitions: INFO1903 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering and Information Technologies
This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and will be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarisation, and the creation and application of a predictive model.
DATA2001 Data Science: Big Data and Data Diversity

Credit points: 6 Session: Semester 1 Classes: Lectures, Laboratories, Project Work - own time Prerequisites: DATA1002 OR INFO1110 OR INFO1903 OR INFO1103 Assessment: through semester assessment (50%), final exam (50%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering and Information Technologies
This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry.
Students will learn how to ingest, combine and summarise data from a variety of data models that are typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image and text data. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will introduce students to the concept of declarative data processing with SQL and to the analysis of data in relational databases. Students will be given data sets from, e.g., social media, transport, health and social sciences, and will be taught basic exploratory data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved in analysing large data volumes, such as the idea of partitioning and distributing data and computation among multiple computers for processing 'Big Data'.
DATA2002 Data Analytics: Learning from Data

Credit points: 6 Teacher/Coordinator: Jean Yang Session: Semester 2 Classes: lecture 3 hrs/week; computer tutorial 2 hrs/week Prerequisites: [DATA1001 or ENVX1001 or ENVX1002] or [MATH10X5 and MATH1115] or [MATH10X5 and STAT2011] or [MATH1905 and MATH1XXX (except MATH1XX5)] or [BUSS1020 or ECMT1010 or STAT1021] Prohibitions: STAT2012 or STAT2912 Assumed knowledge: (Basic Linear Algebra and some coding) or QBUS1040 Assessment: written assignment, presentation, exams Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Science
Technological advances in science, business and engineering have given rise to a proliferation of data from all aspects of our lives. Understanding the information presented in these data is critical, as it enables informed decision making in many areas, including market intelligence and science. DATA2002 is an intermediate course in statistics and data science, focusing on learning data analytic skills for a wide range of problems and data. How should the Australian government measure and report employment and unemployment? Can we tell the difference between decaffeinated and regular coffee? In this course, you will learn how to ingest, combine and summarise data from a variety of data models that are typically encountered in data science projects, and you will reinforce your programming skills through experience with a statistical programming language. You will also be exposed to the concept of statistical machine learning and develop the skills to analyse various types of data in order to answer a scientific question. From this unit, you will develop knowledge and skills that will enable you to embrace data analytic challenges stemming from everyday problems.
DATA3404 Data Science Platforms

Credit points: 6 Session: Semester 1 Classes: lectures, tutorials Prerequisites: DATA2001 OR ISYS2120 OR INFO2120 OR INFO2820 Prohibitions: INFO3504 OR INFO3404 Assumed knowledge: This unit of study assumes that students have previous knowledge of database structures and of SQL. The prerequisite material is covered in DATA2001 or ISYS2120. Familiarity with a programming language (e.g. Java or C) is also expected. Assessment: through semester assessment (40%), final exam (60%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering and Information Technologies
This unit of study provides a comprehensive overview of the internal mechanisms of data science platforms and of systems that manage large data collections. These skills are needed for successful performance tuning and to understand the scalability challenges faced when processing Big Data. This unit builds upon the second-year unit DATA2001 'Data Science: Big Data and Data Diversity' and correspondingly assumes a sound understanding of SQL and data analysis tasks.
The first part of this subject focuses on mechanisms for large-scale data management. It provides a deep understanding of the internal components of a data management platform. Topics include: physical data organization and disk-based index structures, query processing and optimisation, and database tuning.
The second part focuses on the large-scale management of big data in a distributed architecture. Topics include: distributed and replicated databases, information retrieval, data stream processing, and web-scale data processing.
The unit will be of interest to students seeking an introduction to data management tuning, disk-based data structures and algorithms, and information retrieval. It will be valuable to those pursuing such careers as Software Engineers, Data Engineers, Database Administrators, and Big Data Platform specialists.
DATA5207 Data Analysis in the Social Sciences

Credit points: 6 Session: Semester 1 Classes: lectures, laboratories Assumed knowledge: COMP5310 Assessment: through semester assessment (100%) Mode of delivery: Normal (lecture/lab/tutorial) day Faculty: Engineering and Information Technologies
Data science is a new, rapidly expanding field. There is an unprecedented demand from technology companies, financial services, government and not-for-profits for graduates who can effectively analyse data. This subject will help students gain a critical understanding of the strengths and weaknesses of quantitative research, and acquire practical skills using different methods and tools to answer relevant social science questions.
This subject combines real-world applications with data science methodology, bringing an awareness of how to solve actual social problems to the Master of Data Science. We cover topics including elections, criminology, economics and the media. You will clean, process and model data from these fields, produce meaningful visualisations, and test hypotheses to draw inferences about the social world.
Techniques covered range from descriptive statistics, linear and logistic regression, the analysis of data from randomised experiments, and model selection for prediction and classification tasks to the analysis of unstructured text as data and multilevel and geospatial modelling, all using the open-source program R. In doing this, we will not only build on the skills you have already mastered through this degree but also explore different ways to use them once you graduate.