CZ4034 Information Retrieval

Course Summary

This module mainly focusses on how textual data, or documents containing it, are processed, indexed, stored and retrieved efficiently so that a system can answer the user's query. It covers the following topics:

  1. Boolean Retrieval
    1. Term-Doc incidence matrix
    2. Inverted Index
    3. Query Processing
    4. Phrase Queries
  2. Tolerant Retrieval
    1. Tokenisation and Lemmatisation
    2. Stemming
    3. Wild-card Queries
    4. Edit Distance
  3. Ranked Retrieval
    1. TF-IDF
  4. Efficient Retrieval
    1. Cosine Score
    2. Static Quality Score
    3. Cluster Pruning
    4. Tiered Index
  5. Enhanced Retrieval
    1. F1 Measure
    2. Inter-Judge Agreement
    3. Rocchio Algorithm
  6. Classification
    1. Naive Bayes
    2. Chi-Square Feature Selection
    3. kNN
    4. SVM
  7. Clustering
    1. k-Means
    2. Hierarchical Clustering
  8. Web Search
    1. Search Engine Optimisation
    2. Web Crawling
    3. Anchor Text
    4. PageRank
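
To give a feel for the first topic in the list, here is a minimal sketch of an inverted index with a Boolean AND query. It is not taken from the course material: the function names (`build_inverted_index`, `intersect`, `boolean_and`), the toy documents and the whitespace tokenisation are all assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to a sorted postings list of doc IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenisation, no stemming or stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(p1, p2):
    """Merge two sorted postings lists, keeping doc IDs present in both."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def boolean_and(index, term_a, term_b):
    """Answer the query `term_a AND term_b`; unknown terms match nothing."""
    return intersect(index.get(term_a.lower(), []),
                     index.get(term_b.lower(), []))


# Hypothetical toy collection, keyed by doc ID.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
    4: "july new home sales rise",
}
index = build_inverted_index(docs)
print(boolean_and(index, "new", "rise"))
```

Keeping each postings list sorted is what lets `intersect` walk both lists in a single pass instead of comparing every pair of doc IDs.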

Workload

All the lessons and tutorials are pre-recorded and available as YouTube links, so you can work through the course at your own pace. You can still attend the physical lectures if you have questions. There is one midterm assessment, which is MCQ and open book, but the time limit is tight. You can technically work in groups to get better grades, but the question order is randomised, so collaboration is a bit harder.

Projects

There is one final project, where you have to source your own data, index it and serve it as a search engine for some practical use case. You have to create a website that shows off as many of the topics you have learnt as possible, such as spell-check for the query. You also have to build a classification model to classify the sentiment and polarity of your data. The project is given to you towards the end of the semester.
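
As a rough idea of what query spell-check can look like, here is a minimal sketch built on edit distance, one of the Tolerant Retrieval topics. It is only one possible approach and not the project's required method: the helper names (`edit_distance`, `suggest`) and the tiny vocabulary are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (insert, delete, substitute)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(word, vocabulary):
    """Return the vocabulary term closest to `word` by edit distance."""
    return min(vocabulary, key=lambda term: edit_distance(word, term))


# Hypothetical vocabulary; in a real project it would come from the index terms.
vocabulary = ["retrieval", "ranking", "index"]
print(suggest("retreival", vocabulary))
```

Scanning the whole vocabulary for every query is fine for a toy example; a larger index would need a way to narrow the candidates first.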

Things to take note of

Grouping for the project is randomised, so you will not be able to choose your teammates. The final exam draws on a very rigid set of question types, so for the calculation questions you can be mechanical: practise as many as you can until you understand the method, then execute during the paper. The explanation questions require a thorough understanding of the concepts.

Conclusion

This is a module where a decent grade (B/B+) is easy to get, but the upper tier is harder to reach, as you will really have to put up an excellent project and score well in the final exam.