COMP4446: Natural Language Processing

2026 unit information

This unit introduces computational linguistics and the statistical techniques and algorithms used to automatically process natural languages (such as English or Chinese). It will review the core statistics and information theory, and the basic linguistics, required to understand statistical natural language processing (NLP). Statistical NLP is used in a wide range of applications, including information retrieval and extraction; question answering; machine translation; and classifying and clustering documents. This unit will explore the key challenges that natural language poses for computational modelling, and the state-of-the-art approaches to the key NLP sub-tasks, including tokenisation, morphological analysis, word sense representation, part-of-speech tagging, named entity recognition and other information extraction, text categorisation, phrase structure parsing and dependency parsing. You will implement many of these sub-tasks in labs and assignments. The unit will also investigate the annotation process that is central to creating training data for statistical NLP systems. You will annotate data as part of completing a real-world NLP task.

Unit details and rules

Managing faculty or University school:

Engineering

Details

Study level	Undergraduate
Academic unit	Computer Science
Credit points	6

Enrolment rules

Prerequisites:	DATA3888 or COMP3888 or COMP3988 or CSEC3888 or ISYS3888 or SOFT3888 or ENGG3112 or SCPU3001
Corequisites:	None
Prohibitions:	COMP5046
Assumed knowledge:	A major in a computer science area. Knowledge of an object-oriented (OO) programming language as covered in INFO1113.

Learning outcomes

At the completion of this unit, you should be able to:

LO1. apply basic linguistic knowledge to identify properties of text
LO2. understand the internal architecture of language models including the purpose of each component
LO3. implement and train machine learning based systems for solving natural language tasks
LO4. evaluate the performance of natural language processing systems
LO5. implement and debug a large NLP system in a collaborative manner
LO6. annotate data using appropriate quality control methods
LO7. identify ethical concerns in NLP systems and ways to mitigate those issues

Unit availability

This section lists the sessions, attendance modes and locations in which the unit is available. Each availability has its own unit outline, which gives you information about the unit, including assessment details and a schedule of weekly activities.

The outline is published 2 weeks before the first day of teaching. You can look at previous outlines for a guide to the details of a unit.

Current year

Session	MoA	Location	Outline
Semester 1 2026	Normal day	Camperdown/Darlington, Sydney	View

Previous years

Session	MoA	Location	Outline
Semester 1 2023	Normal evening	Camperdown/Darlington, Sydney	View
Semester 1 2023	Normal evening	Remote	View
Semester 1 2024	Normal day	Camperdown/Darlington, Sydney	View
Semester 1 2025	Normal day	Camperdown/Darlington, Sydney	View

Find your current year census dates

Modes of attendance (MoA)

This refers to the Mode of attendance (MoA) for the unit as it appears when you’re selecting your units in Sydney Student. Find more information about modes of attendance on our website.

Disclaimer

Important: the University of Sydney regularly reviews units of study and reserves the right to change the units of study available annually. To stay up to date on available study options, including unit of study details and availability, refer to the relevant handbook.

To help you understand common terms that we use at the University, we offer an online glossary.