


DATA2901: Big Data and Data Diversity (Advanced)

2024 unit information

This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are the hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry. Students will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image and text data. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will introduce students to the concept of declarative data processing with SQL and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and the social sciences, and will be taught basic exploratory data analysis and data mining techniques in the context of small use cases. The course will also give students an understanding of the challenges involved in analysing large data volumes, such as the need to partition and distribute data and computation among multiple computers when processing 'Big Data'. This unit is an alternative to DATA2001, providing coverage of some additional, more sophisticated topics, and is suited to students with high academic achievement.

Unit details and rules

Managing faculty or University school:

Engineering

Details

Study level	Undergraduate
Academic unit	Computer Science
Credit points	6

Enrolment rules

Prerequisites:	75% or above from (DATA1002 or DATA1902 or INFO1110 or INFO1910 or INFO1903 or INFO1103 or ENGG1810)
Corequisites:	None
Prohibitions:	DATA2001
Assumed knowledge:	None

Learning outcomes

At the completion of this unit, you should be able to:

LO1. use appropriate Python libraries to automate data science activities on diverse kinds of data
LO2. ingest, combine and summarise data from a variety of data models
LO3. demonstrate experience with handling datasets of diverse kinds of data, including relational, semi-structured, time series, geo-location, image and text data, including experience combining data of different types
LO4. understand and produce declarative queries to extract appropriate information from data sets, including competence in use of SQL
LO5. understand the main challenges of analysing 'big data': data volume, variety, velocity and veracity
LO6. understand the impact of data volume on data processing, and be aware of approaches to address this, such as indexing, compression, data partitioning and distributed processing frameworks (Hadoop)
LO7. demonstrate awareness of privacy issues when working with data
LO8. know and work with several sophisticated topics related to data scale and diversity.

Unit availability

This section lists the session, attendance modes and locations the unit is available in. There is a unit outline for each of the unit availabilities, which gives you information about the unit including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Current year

Session	MoA	Location	Outline
Semester 1 2024	Normal day	Camperdown/Darlington, Sydney	View

Previous years

Session	MoA	Location	Outline
Semester 1 2020	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2021	Normal day	Remote	View
Semester 1 2022	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2022	Normal day	Remote	View
Semester 1 2023	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2023	Normal day	Remote	View


Modes of attendance (MoA)

This refers to the Mode of attendance (MoA) for the unit as it appears when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.