Unit of study

COMP4446: Natural Language Processing

2024 unit information

This unit introduces computational linguistics and the statistical techniques and algorithms used to automatically process natural languages (such as English or Chinese). It will review the core statistics and information theory, and the basic linguistics, required to understand statistical natural language processing (NLP). Statistical NLP is used in a wide range of applications, including information retrieval and extraction, question answering, machine translation, and document classification and clustering. This unit will explore the key challenges that natural language poses for computational modelling, and the state-of-the-art approaches to the key NLP sub-tasks, including tokenisation, morphological analysis, word sense representation, part-of-speech tagging, named entity recognition and other information extraction, text categorisation, phrase structure parsing and dependency parsing. You will implement many of these sub-tasks in labs and assignments. The unit will also investigate the annotation process that is central to creating training data for statistical NLP systems. You will annotate data as part of completing a real-world NLP task.

Unit details and rules

Managing faculty or University school:

Computer Science

Code: COMP4446
Academic unit: Computer Science
Credit points: 6
Enrolment in a thesis unit: INFO4001 or INFO4911 or INFO4991 or INFO4992 or AMME4111 or BMET4111 or CHNG4811 or CIVL4022 or ELEC4712 or COMP4103 or SOFT4103 or DATA4103 or ISYS4103
Assumed knowledge:
Knowledge of an OO programming language as covered in INFO1113

At the completion of this unit, you should be able to:

  • LO1. apply basic linguistic knowledge to identify the structure of language
  • LO2. develop formal models to express natural language phenomena
  • LO3. develop machine learning and deep learning methods for solving natural language tasks
  • LO4. evaluate the performance of natural language processing systems
  • LO5. implement and debug large NLP systems in a clean and structured manner
  • LO6. apply machine learning/deep learning methods and information theory principles to modelling language.

Unit availability

This section lists the session, attendance modes and locations the unit is available in. There is a unit outline for each of the unit availabilities, which gives you information about the unit including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Session | MoA | Location
Semester 1 2024 | Normal day | Camperdown/Darlington, Sydney
Semester 1 2023 | Normal evening | Camperdown/Darlington, Sydney
Semester 1 2023 | Normal evening | Remote

Modes of attendance (MoA)

This refers to the Mode of attendance (MoA) for the unit as it appears when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.