Unit of study

# STAT5002: Introduction to Statistics

The aim of the unit is to introduce students to basic statistical concepts and methods for further studies. Particular attention will be paid to the development of methodologies related to statistical data analysis and data mining. A number of useful statistical models will be discussed and computer-oriented estimation procedures will be developed. Smoothing and nonparametric concepts for the analysis of large data sets will also be discussed. Students will use the R computing language to handle all relevant computational aspects of the course.

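| Field | Details |
| --- | --- |
| Code | STAT5002 |
| Academic unit | Mathematics and Statistics Academic Operations |
| Credit points | 6 |
| Prerequisites | None |
| Corequisites | None |
| Prohibitions | None |
| Assumed knowledge | HSC Mathematics |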

At the completion of this unit, you should be able to:

• LO1. use the R statistical computing environment to obtain numerical and graphical summaries of data and to perform various statistical calculations
• LO2. explain univariate and bivariate data by means of the five number summary, mean, variance and standard deviation, correlation coefficient, boxplot, histogram and scatterplot
• LO3. use methods derived from the three axioms of probability to calculate the probabilities of simple events
• LO4. understand the concept of a random variable and the meaning of the expected value and variance
• LO5. apply the Binomial distribution as a model for discrete data
• LO6. use the Normal distribution as a model for continuous data
• LO7. understand the central limit theorem
• LO8. understand the concept of hypothesis tests and p-values for finding evidence for or against simple null hypotheses, in particular using the binomial test for testing proportions and one- or two-sided z-, t- or sign tests for making inferences about the population mean
• LO9. use the Chi-squared test for simple goodness of fit problems
• LO10. understand the concept of a confidence interval
• LO11. understand the fundamental difference between frequentist inference and Bayesian inference
• LO12. find the least squares regression line as a way of describing a linear relationship in bivariate data
• LO13. use R to analyse multivariate data
• LO14. use the general F-test as the main tool to choose between two nested regression models
• LO15. assess model assumptions and detect outliers in regression models using standard diagnostic plots (box plot, scatterplot, Q-Q plot, Cook’s distance plot, leverage vs residual plot) and influence measures (leverage values, Cook’s distance)
• LO16. apply multiple linear regression and understand R² and the adjusted R²
• LO17. calculate and interpret confidence intervals for all parameters in linear regression
• LO18. perform model selection using the F-test, t-test, AIC or BIC, through full searches or step-wise procedures (backward, forward, stepwise)
• LO19. apply polynomial regression models
• LO20. apply logistic and non-parametric regression.

## Unit outlines

Unit outlines will be available 1 week before the first day of teaching for the relevant session.