Unit of study

OCMP5339: Data Engineering

2024 unit information

This unit of study covers the data engineering issues of building robust and scalable data processing pipelines. While data engineers may not be directly performing data analysis, they must have the technical knowledge and skillset to provide data analysts with appropriate data analytics architectures and to provide them with reliable and well-formed data that is ready to be analysed. Topics covered range from data ingestion from various sources including databases, text files and web services, to data cleaning and data transformation approaches, and the system architectures that allow the pipeline to run efficiently and automatically. Special consideration is given to building scalable data analysis solutions using a blend of Big Data processing techniques including data stream processing and distributed data processing platforms such as Apache Spark.

Unit details and rules

Managing faculty or University school: Computer Science

Code: OCMP5339
Academic unit: Computer Science
Credit points: 6
COMP5310 or OCMP5310
COMP5329 or COMP4329
Assumed knowledge:
Proficiency in programming, especially Python, and in database querying with SQL; basic Unix scripting

At the completion of this unit, you should be able to:

  • LO1. Use appropriate Python libraries to automate data engineering activities on diverse kinds of data
  • LO2. Use Unix command line to manage and automate data engineering activities
  • LO3. Ingest, combine and summarise data from a variety of data models
  • LO4. Demonstrate experience with handling datasets of diverse kinds of data, including relational, semi-structured, time series, geo-location, image, text
  • LO5. Understand the main challenges in data engineering: data volume, variety, velocity, veracity, robustness, security
  • LO6. Demonstrate awareness of ethical and privacy issues when working with data
  • LO7. Evaluate approaches that store and process data for correctness, efficiency and ease of use

Unit availability

This section lists the session, attendance modes and locations the unit is available in. There is a unit outline for each of the unit availabilities, which gives you information about the unit including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Session            MoA      Location         Outline
Semester 1b 2024   Online   Online Program   Outline unavailable
Semester 2b 2024   Online   Online Program   Outline unavailable
Semester 2b 2023   Online   Online Program

Modes of attendance (MoA)

This refers to the Mode of attendance (MoA) for the unit as it appears when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.