During 2021 we will continue to support students who need to study remotely due to the ongoing impacts of COVID-19 and travel restrictions. Make sure you check the location code when selecting a unit outline or choosing your units of study in Sydney Student. Find out more about what these codes mean. Both remote and on-campus locations have the same learning activities and assessments, however teaching staff may vary. More information about face-to-face teaching and assessment arrangements for each unit will be provided on Canvas.
This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry. Students will learn how to ingest, combine and summarise data from a variety of data models which are typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image, text. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will also introduce students to the concept of declarative data processing with SQL, and to analyse data in relational databases. Students will be given data sets from, eg. , social media, transport, health and social sciences, and be taught basic explorative data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved with analysing large data volumes, such as the idea to partition and distribute data and computation among multiple computers for processing of 'Big Data'. This unit is an alternative to DATA2001, providing coverage of some additional, more sophisticated topics, suited for students with high academic achievement.
|Academic unit||Computer Science|
|75% or above from (DATA1002 OR DATA1902 OR INFO1110 OR INFO1903 OR INFO1103)|
At the completion of this unit, you should be able to:
Unit outlines will be available 2 weeks before the first day of teaching for the relevant session.