Workshops and training

Expand your data skills

We run a wide range of free introductory to advanced training courses spanning data science, statistics, bioinformatics, research computing, and research data management.

Many of our services are available free of charge to University researchers, research students, and affiliates. Workshops and training may also be considered for customers external to the University, although a fee-for-service arrangement may apply. Please contact us (sih.info@sydney.edu.au) for more information.

Topics

Statistics

Research Essentials: Analysing your data

In this workshop we provide a systematic workflow to apply to any research data analysis to make your quantitative work comprehensive, efficient and more suitable for top-tier journals.

We introduce you to the resources available from both the Sydney Informatics Hub and across the University that will support you in proceeding from hypothesis generation all the way through to publication. Our research workflow consists of a series of defined steps that will assist you in thinking about your data and preparing it for statistical analysis. Data analysis concepts will be covered in detail, including: how experimental design fits into hypothesis generation and your final publication; how to manage your analysis data; and Exploratory Data Analysis (EDA), an essential and often-overlooked stage of data analysis for determining the appropriate statistical methods to apply in your research. We will also show you some of the more advanced statistical analysis methods to give you an idea of what is possible.

Note that this workshop does not require knowledge of or use of specific statistical software. The analysis methods may be performed using a range of university-supported software options.

Open to	University of Sydney staff, students and research affiliates
Prerequisites	No previous knowledge of statistical methods is required.
Resources	Workshop notes
Duration	90 minutes

Experimental design

In this workshop we focus on the key aspects of experimental design that researchers and students may need to apply in their research. Higher degree research students and researchers engaging in new research are especially invited to attend. During the workshop there will be the opportunity to discuss your own research question and associated experimental design.

The workshop will include the following topics:

• your research question
• experimental validity
• randomisation and bias
• blinding and bias
• blocking and confounding
• fixed and random effects
• replication, experimental units

Open to	University of Sydney staff, students and research affiliates
Prerequisites	No previous knowledge is assumed.
Resources	Workshop notes
Duration	90 minutes

Power and sample size calculation

In this workshop we will show you how power and sample size calculations help you determine the number of subjects to include in your study, complete ethics and grant requirements, and ensure that you have thought thoroughly about your study design. This workshop covers the theory and concepts of power analysis and includes worked examples using web-based power calculators.

Open to	University of Sydney staff, students and research affiliates.
Prerequisites	Knowledge of basic statistics is recommended.
Resources	Workshop notes
Duration	90 minutes
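
As a rough illustration of the kind of calculation a power calculator performs, the sketch below estimates the per-group sample size for a two-sided comparison of two means using the standard normal approximation. It is not taken from the workshop notes: the function name and default values are assumptions chosen for illustration, and calculators based on the t distribution will typically report slightly larger numbers.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the standardised effect size (difference in means / SD).
    The function name and defaults are illustrative assumptions.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * (z_alpha + z_power) ** 2 / effect_size ** 2
    return math.ceil(n)  # round up to whole subjects


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(f"effect size {d}: {sample_size_per_group(d)} subjects per group")
```

Under these assumptions a medium standardised effect (0.5) needs about 63 subjects per group, and halving the effect size roughly quadruples the required sample.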

Linear Models 1: Linear regression, ANOVA, ANCOVA and repeated measures (a simple mixed model)

In this workshop we focus on practical data analysis by presenting statistical workflows applicable in any software for four of the most common univariate analyses: linear regression, ANOVA, ANCOVA, and repeated measures (a simple mixed model) – all assuming a normal (Gaussian) residual. These workflows can be easily extended to more complex models. The R code used to create output is also included.

This is one of our three workshops for researchers interested in statistical methods such as linear regression, ANOVA, ANCOVA, mixed models, logistic (binary) and Poisson (count) regression. Each one builds on the preceding workshop, and together they show how all these analyses can be performed using the same easy-to-understand Generalised Linear Mixed Model (GLMM) framework and workflow. They also show how these methods can be used to analyse experimental designs such as Control vs Treatment, Randomised Controlled Trials (RCTs), Before After Control Impact (BACI) analysis, repeated measures, plus many more. There is also a fourth complementary workshop called Statistical Model Building, which we recommend for those experienced with linear models or for those who have done at least the first two of our Linear Models workshops.

The material is organised around Statistical Workflows, applicable in any software, giving practical step-by-step instructions on how to do the analysis, including assumption testing, model interpretation, and presentation of results.

Open to	University of Sydney staff, students and research affiliates.
Prerequisites	Knowledge of basic statistics is recommended.
Resources	Workshop notes
Duration	90 minutes

Linear Models 2: Logistic and Poisson (count) regression – an introduction to Generalised Linear Models (GLM)

In this workshop we focus on practical data analysis applicable in any software for two of the more common GLMs: logistic regression for binary data (using a Binomial distribution), and Poisson regression for count data (using a Poisson distribution). The GLM framework is also described in detail. The R code used to create output is included.

Open to	University of Sydney staff, students and research affiliates.
Prerequisites	Knowledge of basic statistics is recommended.
Resources	Workshop notes
Duration	90 minutes

Linear Models 3: How to build interpretable models and analyse data to extract insightful & impactful patterns, and craft an engaging research story

Statistical analysis is more than just building the best predictive model; it should also enable you to make impactful discoveries that expand our knowledge. Constructing engaging narratives about your research is also invaluable as you look to connect with your field, the community and funding bodies. To do this you need to build interpretable models, test hypotheses, uncover insightful & impactful patterns, and present results in insightful, intuitive and memorable ways. In this workshop we explore tips and tricks to make your research do just that. Topics covered will be:

– Building impactful real-world recommendations and guidelines – i) why we need to understand both stated and model-derived importance, ii) how Quadrant Analysis uses both variable performance and importance to develop impactful real-world recommendations and guidelines.

– Reporting tricks that extract insightful & impactful patterns and craft engaging stories – i) establishing the importance of a predictor/risk factor, ii) confidence vs prediction intervals, iii) applying and correcting for multiple comparisons, iv) testing different hypotheses using different model parameterisations of the design matrix, v) interpreting categorical predictors (dummy vs effects coding and estimated marginal means), plus other reporting and interpretation tricks.

– Building interpretable models – it’s quite common for researchers to incorrectly use model parameters to establish a variable’s ‘impact’ or ‘importance’. We show how multi-collinearity prevents this interpretation, and how to assess and then fix it so parameters can be used to identify important predictors/risk factors and other insightful patterns.

– Mixed models – extend the Linear Models 1 introduction to: i) better explain how mixed models work, ii) use them to test population-wide hypotheses outside your sampled groups, and iii) use a random slope (with examples of the patterns it can explain and hypotheses it can test).

– Using data visualisation to report complex nonlinear models graphically and aid pattern extraction.

This is one of our three workshops for researchers interested in statistical methods such as linear regression, ANOVA, ANCOVA, mixed models, logistic (binary) and Poisson (count) regression. Each one builds on the preceding workshop, showing how all these analyses can be performed using the same easy-to-understand Generalised Linear Mixed Model (GLMM) framework and workflow, and how they can be used to analyse experimental designs such as Control vs Treatment, Randomised Controlled Trials (RCTs), Before After Control Impact (BACI) analysis, repeated measures, plus many more. There is also a fourth complementary workshop called Statistical Model Building, which we recommend for those experienced with linear models or for those who have done at least the first two of our Linear Models workshops.

Open to	University of Sydney staff, students and research affiliates.
Prerequisites	It is recommended that attendees are familiar with the linear model concepts explained in the Linear Models 1 and 2 workshops.
Resources	Workshop notes
Duration	90 minutes

Statistical Model Building – an introduction

In this workshop we will introduce you to the key aspects and strategies of statistical model building to help you answer your research question, and avoid common pitfalls, erroneous models and incorrect conclusions. Appropriate statistical model building will help you to gain knowledge, as opposed to simply getting the best prediction (although that can be a goal as well).

We will focus on concepts such as variable selection, multi-collinearity, interactions, selecting a model building strategy, comparing models and evaluating models. In general, these concepts are useful for any statistical model building. This workshop will provide generalised linear regression model examples. The focus will be on practical application of concepts, so mathematical descriptions will be kept to a minimum.

Open to	University of Sydney staff, students and research affiliates
Prerequisites	Prior experience with statistical modelling is assumed, as the basics of regression modelling will not be covered. Please consider attending Linear Models 1 and/or Linear Models 2 workshops to come up to speed beforehand. Note that this workshop does not require knowledge of or use of specific statistics software. The analysis methods may be performed using a wide range of commonly available software.
Resources	Workshop notes. Bring your own laptop.
Duration	90 minutes

Meta-analysis – an introduction

In this workshop we provide a theoretical and practical introduction to meta-analysis as part of a systematic review. We examine the process of performing a meta-analysis, focusing in particular on key statistical concepts such as heterogeneity and fixed- and random-effects modelling.

We will discuss the available choices of statistical software and show you worked examples using the metafor package in R. A basic knowledge of R software is desirable, but not necessary, since you are not expected to produce and run your own code during the workshop.

Open to	University of Sydney staff, students and research affiliates.
Prerequisites	Knowledge of basic statistics is recommended. Basic knowledge of R (programming language) is desirable but not required.
Resources	Workshop notes. Bring your own laptop. If you want to practise the example during the workshop you will need to have R and RStudio installed.
Duration	90 minutes

Introduction to Survival Analysis

Survival analysis is used when you want to analyse the time that elapses until a specified event occurs. It is commonly used in studies where subjects are followed until death occurs, hence the name.

In this workshop we will introduce some key concepts pertaining to survival analysis, including censoring of cases, the survival function, and the hazard ratio estimator. The Kaplan-Meier survival curve will be explained through a worked example, and the technique of Cox proportional hazards regression will be introduced using the same example dataset.

You will be provided with software code in SPSS and R to reproduce the analysis presented in the workshop.

Open to	University of Sydney staff, students and research affiliates.
Prerequisites	Knowledge of basic statistics is recommended.
Resources	Workshop notes. Bring your own laptop. If you want to practise the example during the workshop you will need to be able to run SPSS syntax or R code. This is optional.
Duration	90 minutes
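
To give a feel for how the Kaplan-Meier curve is built, the sketch below computes the product-limit estimate in plain Python. It is an illustration only, not the SPSS or R code supplied in the workshop: the function name and the toy data are hypothetical, and subjects censored at the same time as an event are treated as still at risk at that time, which is the usual convention.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event was observed.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every subject leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # remove events and censored cases from the risk set
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up times (months); 0 marks a censored subject.
    times = [2, 3, 3, 5, 8, 8, 9, 12]
    events = [1, 1, 0, 1, 0, 1, 1, 0]
    for t, s in kaplan_meier(times, events):
        print(f"t = {t:>2}: S(t) = {s:.3f}")
```

The curve only steps down at observed event times. Censored subjects never cause a drop, but they shrink the risk set, so later events produce larger steps.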

Design and Analysis of Surveys 1: Best Practice

In this workshop we present a range of practical tips and guidelines on how to design, field, and analyse the more commonly used surveys. The initial focus is on how to set up and field a study. A variety of different questions and scales, including some unorthodox and novel ones, will be presented to give an appreciation of what is possible. Some of the topics covered will be line vs discrete scales, the effect of colour, optimal discrete/Likert scales, etc.

We then present basic analyses of common question types and how to report them. We will discuss the pros and cons of common analyses (e.g. linear vs ordinal regression). The material is software agnostic and can be applied in any software.

Open to	University of Sydney staff, students and research affiliates
Prerequisites	No previous knowledge of statistical methods is required.
Resources	Workshop notes
Duration	90 minutes

Design and Analysis of Surveys 2: Advanced topics

In this workshop we build on the information from Surveys 1. We explore topics including questionnaire validation and index creation using methods such as Exploratory Factor Analysis (EFA), Confirmatory Factor Analysis (CFA) using Structural Equation Modelling (SEM), and Conjoint models such as Choice modelling.

The material is software agnostic and can be applied in any software.

Open to	University of Sydney staff, students and research affiliates
Prerequisites	No previous knowledge of statistical methods is required.
Resources	Workshop notes
Duration	90 minutes

Multivariate statistical analysis 1: Dimension reduction

In multivariate statistics we simultaneously model and estimate variability in more than one variable, often in order to examine the relationship between variables. In this workshop we examine the key aspects of moving from univariate to multivariate analysis, and the situations and scenarios where multivariate analysis is typically applied. We will focus on practical application of concepts through examples.

Open to	University of Sydney staff, students and research affiliates
Prerequisites	Prior experience with statistical modelling is assumed, as the basics of regression modelling will not be covered. Please consider attending Linear Models 1 and/or Linear Models 2 workshops to come up to speed beforehand. Note that this workshop does not require knowledge of or use of specific statistics software. The analysis methods may be performed using a wide range of commonly available software.
Resources	Workshop Notes
Duration	90 minutes

Programming

Research computing

Sydney Research Cloud: Onboarding to NCI Gadi HPC drop-in sessions

SIH is providing support and training for researchers onboarding to NCI Gadi HPC.

Next session: every Wednesday, 12-12:30pm AEST

Location: join virtually via Zoom

Open to: Staff, research students, and affiliates with a valid University of Sydney UniKey

Prerequisites: none

Resources: NCI for USyd user guide

Parallelisation with Python on HPC

This course is designed to help researchers move from developing and running Python locally to tailoring code for High Performance Computing (traditional and cloud) using specific libraries, functions and common implementations.

In this course you will:

• gain experience with best practices for structuring code and testing modular structure and workflows
• learn about the libraries, data structures, and functions used for Python multiprocessing
• explore commonly used code for solving common problems such as deep learning, parallel computing and multi-threaded applications
• utilise advanced libraries that offer better performance (in speed, ability to handle large data, or design)

Open to

Staff, research students, and affiliates with a valid University of Sydney UniKey.

Prerequisites

Competency with high performance computing environments, including submitting and running jobs and moving data between local and remote machines. Fundamental Python experience, with a basic grasp of functions, variables and syntax.

These prerequisites can be satisfied by attending the relevant courses regularly run on campus.

Resources

You must bring your own laptop. Contact us if you need to borrow one for the course. You must have a Python environment installed with the required modules. Please refer to the course notes for installation and versioning instructions.
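
For readers unfamiliar with Python multiprocessing, the sketch below shows the basic pattern of spreading independent tasks across worker processes with the standard library's `multiprocessing.Pool`. It is an illustration rather than course material: the `simulate` task, the seed values and the worker count are all hypothetical, and on an HPC system the number of workers would normally be matched to the cores requested from the scheduler.

```python
from multiprocessing import Pool


def simulate(seed):
    """Hypothetical CPU-bound task: a small deterministic pseudo-random walk."""
    state = seed
    total = 0
    for _ in range(100_000):
        state = (1103515245 * state + 12345) % 2**31  # linear congruential step
        total += state % 3 - 1                        # step of -1, 0 or +1
    return total


def run_serial(seeds):
    """Run every task one after another in the current process."""
    return [simulate(s) for s in seeds]


def run_parallel(seeds, workers=4):
    """Run the same tasks across worker processes; results keep input order."""
    with Pool(processes=workers) as pool:
        return pool.map(simulate, seeds)


if __name__ == "__main__":
    # The guard is needed so worker processes can import this module safely.
    seeds = list(range(8))
    assert run_parallel(seeds) == run_serial(seeds)
    print(run_parallel(seeds))
```

Because each task depends only on its own seed, the parallel and serial runs return identical results. Tasks that share state or exchange large amounts of data need more care.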

NCI Gadi training

Prerequisites: Varies. Assumed knowledge is basic Unix/Linux and HPC.

Researchers onboarding to Gadi HPC are strongly encouraged to attend NCI’s Getting started with Gadi seminar, run monthly.

NCI provides online training for topics including:

- Introduction to Unix shell

- Introduction to Python

- Introduction to parallel programming with Python

- Version control with Git

NCI also offers self-directed training through its online education platform, freely available to all NCI users.

See the NCI training calendar for dates and details.

Data science

Data Science Short Training

A series of short 1-1.5 hour online training sessions that showcase the tools and techniques we use internally at SIH:

• Publication-ready tables in R
• How fast is your R code: an introduction to code profiling and benchmarking
• Writing better, tidier R: keep calm and code functionally with purrr
• More in development

Introduction to machine learning in R

This is a two-day workshop series designed to provide an introduction to practical machine learning with R.

Day 1: Regression

Day 1 focuses on regression. We will provide an introduction to some basic principles of machine learning experimentation, describing how one selects a model to use and the concept of cross-validation. We will demonstrate how these apply to several classical machine learning approaches in R, including supervised (classification and regression, such as K-nearest neighbour and linear regression) and unsupervised (clustering, such as hierarchical and k-means clustering, and dimensionality reduction, such as principal component analysis) methods. We recommend attending both the regression and classification workshops.

Day 2: Classification and unsupervised learning

Day 2 focuses on classification and unsupervised learning approaches. We will build on the first day’s activities to discuss how cross-validation applies in the context of classification problems. We will then demonstrate how these apply to several classical machine learning approaches in R, including supervised (classification and regression, such as K-nearest neighbour and linear regression) and unsupervised (clustering, such as hierarchical and k-means clustering, and dimensionality reduction, such as principal component analysis) methods. We recommend attending both the regression and classification workshops.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Prerequisites	Attendees are expected to have some R background (at least at the level of the “Introductory R” Intersect courses, including the tidyverse suite of packages and the use of R as a data processing tool). It is assumed that attendees have not had previous training in ML, for example as part of an undergraduate semester-long course.
Resources	R, RStudio and installation of several key packages will be required.

Introduction to machine learning in Python

This is a two-day workshop series designed to provide an introduction to practical machine learning with Python.

Day 1: Regression

Day 1 focuses on regression. We will provide an introduction to some basic principles of machine learning experimentation, describing how one selects a model to use and the concept of cross-validation. We will demonstrate how these apply to several classical machine learning approaches in Python, including supervised (classification and regression, such as K-nearest neighbour and linear regression) and unsupervised (clustering, such as hierarchical and k-means clustering, and dimensionality reduction, such as principal component analysis) methods. We recommend attending both the regression and classification workshops, as the latter builds on the former.

Day 2: Classification and Unsupervised Learning

Day 2 focuses on classification and unsupervised learning approaches. We will build on the first day’s activities to discuss how cross-validation applies in the context of classification problems. We will then demonstrate how these apply to several classical machine learning approaches in Python, including supervised (classification and regression, such as K-nearest neighbour and linear regression) and unsupervised (clustering, such as hierarchical and k-means clustering, and dimensionality reduction, such as principal component analysis) methods. We recommend attending both the regression and classification workshops, as the latter builds on the former.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Prerequisites	Attendees are expected to have some Python background (at least at the level of the “Introductory python” Intersect courses). It is assumed that attendees have not had previous training in ML, for example as part of an undergraduate semester-long course.
Resources	Anaconda Python, Jupyter notebooks and the scikit-learn library will be used in this course.
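
As a taste of the cross-validation idea mentioned above, the sketch below splits a dataset into k folds and scores a deliberately simple baseline (predicting the training mean) on each held-out fold. It uses only the Python standard library and is not the scikit-learn API used in the course. The function names and toy data are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices out as evenly as possible over k folds.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def cross_val_mse(ys, k=5):
    """Average held-out mean squared error of a predict-the-training-mean baseline."""
    fold_errors = []
    for train_idx, test_idx in k_fold_indices(len(ys), k):
        mean_y = sum(ys[j] for j in train_idx) / len(train_idx)  # "fit" on training folds
        mse = sum((ys[j] - mean_y) ** 2 for j in test_idx) / len(test_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / len(fold_errors)


if __name__ == "__main__":
    ys = [2.1, 1.9, 3.4, 2.8, 3.0, 2.2, 2.7, 3.1, 1.8, 2.5]  # hypothetical outcomes
    print(f"5-fold CV MSE of the mean baseline: {cross_val_mse(ys):.3f}")
```

Each observation is used for testing exactly once and is never in the training set of its own fold, so the score estimates performance on unseen data.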

Data Carpentry geospatial data analysis in R

This two-day workshop follows the Data Carpentry R Geospatial curriculum, with additional details relating to working with geospatial data in Australia. It is designed to introduce learners comfortable with R to working with geospatial data, including raster and vector files. At the end of the workshop, learners will be able to load, manipulate and visualise these file types to make maps, and perform basic spatial calculations.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Prerequisites	Attendees are expected to have some R background (at least at the level of the “Introductory R” Intersect courses, including the tidyverse suite of packages and the use of R as a data processing tool).
Resources	The lessons closely follow the Data Carpentry curriculum, and also include some Australian-specific information in “Introduction to Geospatial Concepts”.

Generative AI for Research

GenAI 101 Fundamentals

This foundational workshop, recommended for all researchers, covers the basics of what generative AI is, how the Large Language Models that power it work, the benefits and limitations of the different available options (such as ChatGPT, Copilot, and local models), and the ethical and environmental costs of their training and use. The workshop then explains the University’s and other stakeholders' policies around the use of generative AI in research and the practical implications for everyday use.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Duration	2 hours

GenAI 201 Prompting

This workshop introduces how to prompt generative AI effectively to get the best results. It discusses how AI is able to know things when answering, what factors influence how much information to put in a prompt, and what prompt and context engineering are. It then goes through a range of strategies for prompting AI, how they work, what their strengths and weaknesses are, and how to incorporate them into your own research work.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Prerequisites	We highly recommend learners attend GenAI 101 Fundamentals prior to this course. GenAI 201, 202 and 203 are best taken in order, but mixing and matching also works.
Duration	2 hours

GenAI 202 Reading and Writing

This workshop for researchers focuses on how to use AI to engage with academic literature, and how it can help with various writing tasks. It covers different ways to use AI to search for papers and to understand what is in them, and how the relevant policies and guardrails apply. It also covers a range of different strategies for using AI to help with academic writing, their strengths and weaknesses, and the different kinds of tasks that they are useful and appropriate for.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Prerequisites	We highly recommend learners attend GenAI 101 Fundamentals prior to this course. GenAI 201, 202 and 203 are best taken in order, but mixing and matching also works.
Duration	2 hours

GenAI 203 Tools

This workshop covers what AI tools are currently available to help with research. It covers a range of different tools, from generic options available from major AI companies, to specialised academic AI services designed to help researchers in various ways, to highly specialised AIs that are being applied in various areas of research and academia. It discusses how these tools work, what the focus and strengths of each one are, and how the relevant policies and guardrails apply.

Open to	Staff, research students, and affiliates with a valid University of Sydney UniKey
Prerequisites	We highly recommend learners attend GenAI 101 Fundamentals prior to this course. GenAI 201, 202 and 203 are best taken in order, but mixing and matching also works.
Duration	2 hours

Bioinformatics

Australian BioCommons training cooperative

SIH is a member of the national bioinformatics training cooperative. Through the cooperative, you can attend and access free online webinars and workshops delivered by institutions around Australia.

You can find all bioinformatics training materials on our training materials site, as well as on the Australian BioCommons training registry and YouTube channel. Upcoming events are listed on the Australian BioCommons events calendar.

Research data management

Introduction to REDCap

Research Electronic Data Capture (REDCap) is a secure web-based database application maintained by the University. It is ideal for collecting and managing participant data and administering online surveys, with features supporting longitudinal data collection, complex team workflows and exports to a range of statistical analysis programs.

We will cover:

• how to build a simple data entry project
• how to choose the appropriate fields for your data collection
• how to invite collaborators to a project

Open to

University of Sydney staff, students and affiliates

This training session is designed to address the needs of the University’s research community. It includes information specific to the University’s research data systems and platforms.

Resources

Bring your own laptop

Related courses

This training covers basic functions of REDCap. Many additional features are covered in Surveys in REDCap.

Surveys in REDCap

In this training session, we will cover:

• how to set up a survey
• how to flow participants through surveys
• how to distribute surveys

Open to

University of Sydney staff, students and affiliates

This training session is designed to address the needs of the University’s research community. It includes information specific to the University’s research data systems and platforms.

Prerequisites

Experience in building REDCap projects using the basic functions

Resources

You must bring your own laptop.