CZ4045 Natural Language Processing

Course Summary

This course teaches how to understand natural language and how statistical and deep learning models can be built for text summarisation, abstraction, prediction and translation. Prof Aixin teaches the first half of the course and Prof Joty teaches the second. Both are well versed in the content, and the lessons are fairly interesting. Prof Joty works on his own NLP projects at Salesforce, so he brings that unique perspective to the classes. The content covered in this course includes:

  1. Regular Expressions
  2. Words and Transducers
  3. N-grams and Language Models
  4. Part-of-Speech Tagging
  5. Hidden Markov Models
  6. Parsing
  7. Word2Vec
  8. CNNs, RNNs and LSTMs
  9. Seq-2-Seq Models

Workload

Workload is fairly light for most of the semester. The module is more project-heavy: there is no final exam, only 2 quizzes. There are also tutorials, which the Prof goes through. In the second half, each tutorial has a guided version in case you are lost and need some help completing it.

Projects

There are 2 group projects for this module. The first is more of a research and analysis project, since the first half of the course mostly deals with understanding natural language as a whole. With 4-5 people per group, you can divide and conquer the project. Prof Aixin is particular about the format of the report, so do use Overleaf, as it provides the template he has specifically requested.

The second project involves building 2 NLP models: one for Text Generation and one for Named Entity Recognition. You can split this work too by pairing up within your group, with each pair building one of the models. Do keep track of what the other members of your group are doing so that you stay on top of all the topics you have learnt.

Things to take note of

As you will need to train language models for this project, be prepared to spend quite some time tuning the hyperparameters. This can be time-consuming, especially on Google Colab, where you will not have access to high-powered GPUs to speed up the computations. If you have your own gaming PC with at least an RTX 3060, you can save yourself some time.

Conclusion

Overall, this module is pretty fun, and if you're really interested in pursuing NLP in the future, it gives you a glimpse of what it is like to build and run NLP models. In my experience, you get more practical NLP experience in Neural Networks and Deep Learning, where you test your NLP skills on a real dataset and actually build Seq-2-Seq models. Still, this course definitely teaches you how to understand and interpret natural language.