SC4020 Data Analytics and Mining
Course Summary
This course teaches you how to store and process data using various techniques, so that you can draw insights from the data and make informed decisions. Prof Gao Cong teaches the first half and Prof Guosheng teaches the second half. The content covered includes:
- Map-Reduce
- Data Types, Processing, Storage and Warehousing
- Association Rules
- Finding Frequent Itemsets
- A-Priori and PCY Algorithms
- Pattern Growth
- Sequential Pattern Mining
- Classification
- Decision Tree
- Ensemble Classifier
- Association Rule Classifier
- Similarity Search
- Nearest Neighbour
- Locality-Sensitive Hashing
- Inverted File Index
- Product Quantisation
- Clustering
- K-means
- Hierarchical
- DBSCAN
- Mean Shift
- Kernel Density Estimation
- Link Analysis
- Graph Clustering
- Graph Neural Network
Workload
The workload is light throughout the course as there are no quizzes, only 2 assignments and 1 final exam. However, it may take some time to understand the content as this is a very dry, content-heavy course.
Projects
There are 2 group projects for this course. The first is to create an M2 Classifier and the second is a report in which you compare 2 algorithms from a given list of topics. You will then have to test the algorithms on a dataset to provide empirical evidence. The projects can be very time-consuming as you will have to read research papers and implement complex code on your own.
Things to take note of
Prof Gao Cong can be a bit tricky to understand at times, which makes the dry content harder to digest.
Conclusion
Only take this module if you are really interested in Data Engineering rather than Data Science. Some of the content, such as Map-Reduce, is outdated as there are better ways to parallelise data processing. The only interesting parts are Link Analysis and Graph Neural Networks, as that is where AI research is headed. However, they are covered only very briefly.