Unit of study

COMP5310: Principles of Data Science

This unit focuses on understanding and applying relevant concepts, techniques, algorithms, and tools for the analysis, management and visualisation of data, with the goal of enabling the discovery of information and knowledge to guide effective decision making and to gain new insights from large data sets. To this end, the unit provides a broad introduction to data management, analysis, modelling and visualisation using the Python programming language. Topics include: development of custom software using the general-purpose Python scripting language; data collection, cleaning, pre-processing, and storage using various databases; exploratory data analysis to understand and profile complex data sets; mining unlabelled data to identify relationships, patterns, and trends; machine learning from labelled data to predict future outcomes; and communicating findings to varied audiences, including through effective data visualisations. Core data science content will be taught in the normal lecture and tutorial delivery mode. Python programming will be taught through an online learning platform in addition to the weekly face-to-face lectures and tutorials. The unit includes hands-on exercises covering the range of data science skills above.

Code: COMP5310
Academic unit: Computer Science
Credit points: 6
Assumed knowledge:
A good understanding of the relational data model and database technologies, as covered in ISYS2120 or COMP9120 (or an equivalent unit of study from another institution).

At the completion of this unit, you should be able to:

  • LO1. select statistical techniques appropriate for evaluation of a predictive model that is based on data analysis, and justify this choice
  • LO2. select statistical techniques appropriate for summarisation and analysis of a data set, and justify this choice
  • LO3. apply concepts and terms from social science to describe and analyse the role of a data analysis task in its organisational context
  • LO4. understand the role of data science in decision-making
  • LO5. understand the technical issues that are present in the stages of a data analysis task and the properties of different technologies and tools that can be used to deal with the issues
  • LO6. process large data sets using appropriate technologies
  • LO7. carry out (in guided stages) the whole design and implementation cycle for creating a pipeline to analyse a large heterogeneous dataset
  • LO8. seek details of how to use a method or tool in the data analytic process
  • LO9. communicate the results produced by an analysis pipeline, in oral and written form, including meaningful diagrams
  • LO10. communicate the process used to analyse a large data set, and justify the methods used.

Unit outlines

Unit outlines will be available two weeks before the first day of teaching for 1000-level and 5000-level units, or one week before the first day of teaching for all other units.

There are no unit outlines available online for previous years.