Unit of study

OCMP5339: Data Engineering

This unit of study covers the data engineering issues involved in building robust and scalable data processing pipelines. Although data engineers may not perform data analysis directly, they need the technical knowledge and skills to provide data analysts with appropriate analytics architectures and with reliable, well-formed data that is ready to be analysed. Topics range from data ingestion from various sources, including databases, text files and web services, through data cleaning and transformation approaches, to the system architectures that allow a pipeline to run efficiently and automatically. Special consideration is given to building scalable data analysis solutions using a blend of Big Data processing techniques, including data stream processing and distributed data processing platforms such as Apache Spark.

Code: OCMP5339
Academic unit: Computer Science
Credit points: 6
COMP5310 or OCMP5310
COMP5329 or COMP4329
Assumed knowledge: Proficiency in programming, especially Python, and in database querying with SQL; basic Unix scripting

The learning outcomes for this unit will be available two weeks before the first day of teaching.

Unit outlines

Unit outlines will be available one week before the first day of teaching for the relevant session.

There are no unit outlines available online for previous years.