
We are aiming for an incremental return to campus in accordance with guidelines provided by NSW Health and the Australian Government. Until this time, learning activities and assessments will be planned and scheduled for online delivery where possible, and unit-specific details about face-to-face teaching will be provided on Canvas as the opportunities for face-to-face learning become clear.

Unit of study

COMP5046: Natural Language Processing

This unit introduces computational linguistics and the statistical techniques and algorithms used to automatically process natural languages (such as English or Chinese). It will review the core statistics and information theory, and the basic linguistics, required to understand statistical natural language processing (NLP). Statistical NLP is used in a wide range of applications, including information retrieval and extraction; question answering; machine translation; and the classification and clustering of documents. This unit will explore the key challenges that natural language poses for computational modelling, and state-of-the-art approaches to the key NLP sub-tasks, including tokenisation, morphological analysis, word sense representation, part-of-speech tagging, named entity recognition and other information extraction tasks, text categorisation, phrase structure parsing and dependency parsing. You will implement many of these sub-tasks in labs and assignments. The unit will also investigate the annotation process that is central to creating training data for statistical NLP systems. You will annotate data as part of completing a real-world NLP task.

Code: COMP5046
Academic unit: Computer Science
Credit points: 6
Assumed knowledge: Knowledge of an object-oriented (OO) programming language

At the completion of this unit, you should be able to:

  • LO1. apply basic linguistic knowledge to identifying the structure of language
  • LO2. develop formal models to express natural language phenomena
  • LO3. develop machine learning and deep learning models for solving natural language tasks
  • LO4. evaluate the performance of natural language processing systems
  • LO5. implement and debug large NLP systems in a clean and structured manner
  • LO6. apply machine learning/deep learning methods and information theory principles to modelling language

Unit outlines

Unit outlines will be available two weeks before the first day of teaching for 1000-level and 5000-level units, or one week before the first day of teaching for all other units.

There are no unit outlines available online for previous years.