Unit of study

OLET5606: Data Wrangling

Data comes in many formats: it can be tall or wide, big or small, structured or unstructured. Regardless of where you get your data, it will almost always require some wrangling. Data wrangling is the conversion, alignment and preparation of data before use. This unit provides an overview of best practices in organising your research data from the point of discovery through to its use in scientific applications. You will learn the principles of data handling and how to maintain the rigour and integrity of your data throughout your research, including how to document data provenance, access major databases, and understand data licensing. After calculating summary statistics to help identify outliers and missing values, you will learn how to clean and wrangle your research data reproducibly in R, at a variety of scales, while preserving its provenance.

Code: OLET5606
Academic unit: Mathematics and Statistics Academic Operations
Credit points: 2
Assumed knowledge: Basic exploratory data analysis, basic coding in R

At the completion of this unit, you should be able to:

  • LO1. Describe the importance of data provenance, and major databases that can be used to mine data.
  • LO2. Define data licensing.
  • LO3. Calculate summary statistics to identify outliers and missing values.
  • LO4. Clean and wrangle data in a reproducible manner in R, at a variety of scales.
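
As a rough illustration of the kind of task LO3 describes, the following minimal base R sketch counts missing values and flags a possible outlier. The data frame, its column names, and the 1.5 × IQR rule are assumptions made for this example only; they are not taken from the unit's materials, and other outlier conventions may suit your data better.

```r
# Hypothetical example data: one missing value and one unusually large value.
measurements <- data.frame(
  sample_id = c("s1", "s2", "s3", "s4", "s5", "s6"),
  weight_g  = c(10.2, 9.8, NA, 10.5, 10.1, 42.0),
  stringsAsFactors = FALSE
)

# Count missing values in each column.
missing_counts <- colSums(is.na(measurements))

# Summary statistics for the numeric column; summary() also reports the NA count.
weight_summary <- summary(measurements$weight_g)

# Flag candidate outliers with the 1.5 * IQR rule (an assumed convention).
q <- unname(quantile(measurements$weight_g, c(0.25, 0.75), na.rm = TRUE))
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

# Missing values are never flagged as outliers; they are reported separately.
is_outlier <- !is.na(measurements$weight_g) &
  (measurements$weight_g < lower | measurements$weight_g > upper)

outlier_ids <- measurements$sample_id[is_outlier]

print(missing_counts)
print(weight_summary)
print(outlier_ids)
```

Keeping the missing-value count and the outlier flag as separate steps makes it easier to record, for provenance purposes, which rows were changed or removed and why.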